




At Derevo, we empower businesses and individuals by unlocking the value of data within organizations. With over 15 years of experience, we design end-to-end data and AI solutions: from integration into modern architectures to the implementation of intelligent models in key business processes.

**We're looking for your talent: Sr Data Engineer (Databricks)!! ✋**

**What will be your mission?**

You will play a key role in creating and implementing high-quality modern data architectures, driving analytical solutions based on Big Data technologies. You will design, maintain, and optimize parallel processing systems, applying best practices for storage and management in data warehouses, data lakes, and lakehouses.

You will be the passionate professional who collects, processes, cleans, and orchestrates large volumes of data, understanding structured and semi-structured models to effectively integrate and transform multiple sources. You will define the optimal strategy according to business objectives and technical requirements, turning complex problems into achievable solutions that help our clients make data-driven decisions.

**How will you do it?**

* You will join the project and its sprints, executing development activities while always applying our best data practices and implemented technologies.
* You will identify requirements and define scope, participating in sprint planning and engineering sessions with a consulting mindset that adds extra value.
* You will proactively collaborate in workshops and meetings with both internal teams and clients.
* You will classify and estimate tasks using agile methodologies (epics, features, technical/user stories) and follow up daily to maintain the sprint pace.
* You will meet committed delivery deadlines and manage risks by communicating deviations in a timely manner.

**✅ What benefits will you have?**

* WELLNESS: We will support your holistic well-being through personal, professional, and financial balance. Our legal and additional benefits will help you achieve this.
* LET'S RELEASE YOUR POWER: You'll have the opportunity to specialize comprehensively in different areas and technologies, achieving interdisciplinary growth. We'll encourage you to set new challenges and surpass yourself.
* WE CREATE NEW THINGS: We like to think outside the box. You'll have the space, trust, and freedom to create, along with the necessary training to succeed.
* WE GROW TOGETHER: You'll participate in cutting-edge technological projects, multinational initiatives, and work with international teams.

**Where will you do it?**

We are a large team operating under a remote model, flexible yet structured; we provide the necessary equipment and internal communication tools to facilitate our operations and those of our clients.
**What do we ask for?**

To successfully join and thrive as a Data Engineer at Derevo, these are the qualifications we will consider:

* Intermediate/advanced level of English (technical and business conversations, B2+ or C1)

**Experience in:**

Query and programming languages:

T-SQL / Spark SQL:

* DDL and DML, intermediate and advanced queries (subqueries, CTEs, multiple joins with business rules), grouping and aggregation (GROUP BY, window functions, business metrics), stored procedures for ETL/ELT, index optimization, statistics, and execution plans for massive processes

Python (PySpark):

* Object-oriented programming (classes, modules), managing data structures and types (variables, lists, tuples, dictionaries), flow control using conditionals and loops, ingestion of structured and semi-structured data, development of DataFrames and UDFs, time windows and partitioning for optimization, and coding best practices (PEP8, modularity)

Databricks:

* Apache Spark & DataFrame API: Design of pipelines leveraging the DataFrame API for massive transformations, using declarative functions and vectorized expressions.
* Delta Lake: Management of Delta tables with ACID transactions, time travel for auditing, and partition pruning for efficient reads within the medallion architecture.
* Auto Loader & Data Ingestion: Configuration of incremental ingestion into OneLake or ADLS Gen2 using Auto Loader, capture of schema changes (schema evolution), and checkpointing to ensure exactly-once delivery without additional code.
* Structured Streaming: Orchestration of real-time data streams using event-time and processing-time triggers, watermarking, and stateful operations for low latency and fault tolerance.
* Delta Live Tables (DLT): Declaration of ETL/ELT pipelines in SQL or Python with integrated data quality (Expectations), automatic dependency management, and continuous monitoring.
* Performance Optimization: Caching techniques, broadcast joins, shuffle optimizations, and use of columnar formats (Parquet/Delta) with Z-Ordering and OPTIMIZE to reduce processing times.
* Lakehouse Federation: Unified querying across external sources via Unity Catalog.
* Jobs & Workflows: Creation of multi-stage pipelines with dependencies, automatic retries, and scheduled or data-arrival triggers; integration with Azure Data Factory when needed.
* Repos & CI/CD: Version control of notebooks and scripts in GitHub/Azure DevOps, configuration of validation pipelines (unit and schema tests), and automated deployment across dev, test, and prod environments.
* Monitoring and Observability: Alerting via workflow job notifications for events such as failures, plus proactive automated alerts.

Short, illustrative sketches of a few of these skills are included at the end of this posting.

If you meet most of the requirements and are interested in the role, don't hesitate to apply; our Talent team will contact you!

Become derevian & develop your superpower!
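**Illustrative sketches of a few of these skills**

As a reference point for the Spark SQL expectations above (CTEs, GROUP BY, window functions), here is a minimal sketch run through PySpark. The catalog, table, and column names (`sales.orders`, `customer_id`, `order_date`, `amount`) are hypothetical placeholders, not part of the role description.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window-function-demo").getOrCreate()

# Monthly totals per customer plus a running total, using a CTE,
# GROUP BY aggregation, and a window function over the aggregate.
monthly = spark.sql("""
    WITH orders AS (
        SELECT customer_id,
               date_trunc('month', order_date) AS order_month,
               amount
        FROM sales.orders               -- hypothetical source table
    )
    SELECT customer_id,
           order_month,
           SUM(amount) AS monthly_total,
           SUM(SUM(amount)) OVER (
               PARTITION BY customer_id
               ORDER BY order_month
           ) AS running_total
    FROM orders
    GROUP BY customer_id, order_month
""")

monthly.show()
```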
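A minimal PySpark sketch of the Auto Loader pattern mentioned above: incremental ingestion from a cloud landing zone into a Delta table, with schema evolution and checkpointing. The ADLS Gen2 paths and the `bronze.events` table name are assumptions; on Databricks the `spark` session is provided for you.

```python
# Hypothetical ADLS Gen2 paths; substitute your own landing and checkpoint locations.
source_path = "abfss://raw@examplestorage.dfs.core.windows.net/events/"
schema_path = "abfss://meta@examplestorage.dfs.core.windows.net/schemas/events/"
checkpoint_path = "abfss://meta@examplestorage.dfs.core.windows.net/checkpoints/events/"

# Auto Loader (cloudFiles) discovers new files incrementally and tracks the schema.
events = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)   # enables schema evolution
        .load(source_path)
)

# Write to a Delta table; the checkpoint makes the stream restartable with exactly-once semantics.
(
    events.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")                      # accept newly evolved columns
        .trigger(availableNow=True)                         # process available files, then stop
        .toTable("bronze.events")                           # hypothetical target table
)
```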
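A sketch of a Delta Live Tables pipeline in Python with data-quality expectations, as referenced in the DLT bullet. The table names, landing path, and expectation rules are illustrative assumptions.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def bronze_orders():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/demo/landing/orders")    # hypothetical landing path
    )

# Expectations declare data quality rules; DLT tracks and reports them automatically.
@dlt.table(comment="Validated orders for the silver layer")
@dlt.expect_or_drop("valid_amount", "amount > 0")        # drop rows that violate the rule
@dlt.expect("has_customer", "customer_id IS NOT NULL")   # keep rows, but record violations
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
            .withColumn("ingested_at", F.current_timestamp())
    )
```

DLT infers the dependency between the two tables from `dlt.read_stream`, so no extra orchestration is needed inside the pipeline itself.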
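Finally, a short sketch touching the Delta Lake maintenance points above: compaction with Z-Ordering and a time-travel read for auditing. The table name and version number are placeholders.

```python
# Compact small files and co-locate rows by a frequently filtered column.
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

# Time travel: read the table as it existed at an earlier version, e.g. for an audit.
orders_v10 = (
    spark.read.format("delta")
        .option("versionAsOf", 10)    # hypothetical version number
        .table("silver.orders")
)
orders_v10.show()
```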


