Senior Data Engineer (Databricks, Python)
- Back-End
- Bulgaria, Hungary, Lithuania, Poland, Romania
- 3000 - 5000 USD
- Senior
- Full-Time
- Remote
XY DIGITAL is hiring a Senior Data Engineer (Databricks, Python) to join a transformative data management project for a global healthcare technology leader. You will design and maintain robust data pipelines, ensuring the accuracy and efficiency of data ingestion and transformation. This is a high-impact role, perfect for experienced data engineers who excel in cloud-based data systems, especially in a Databricks environment.
Step 1: HR Screening
Step 2: Technical Interview (Live Coding or Task)
Step 3: Final Evaluation with Engineering Lead
Start Date: Immediate
Contract: Long-term / Open-ended
- Design, develop, and maintain scalable data pipelines using Azure Databricks
- Extract, clean, aggregate, and transform data for analysis and reporting
- Implement efficient ETL processes to support large-scale data processing
- Monitor data pipelines, troubleshoot issues, and optimize performance
- Collaborate with stakeholders to understand data requirements and deliver solutions
- Ensure data accuracy and completeness through validation and quality checks
- Document all data engineering processes, workflows, and architecture
- Provide mentorship and technical guidance to junior team members
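To give candidates a feel for the extract-clean-validate work described above, here is a minimal, self-contained sketch in plain Python. All record fields, function names, and values are illustrative assumptions, not taken from the actual project, which would typically run such logic on Spark in Databricks:

```python
# Minimal extract -> transform -> validate sketch. The JSON payload and
# field names below are hypothetical examples, not project data.
import json

RAW = '''[
  {"patient_id": "p1", "reading": "98.6", "unit": "F"},
  {"patient_id": "p2", "reading": "37.1", "unit": "C"},
  {"patient_id": null, "reading": "x",    "unit": "C"}
]'''

def to_celsius(value: float, unit: str) -> float:
    """Normalize temperature readings to Celsius."""
    return round((value - 32) * 5 / 9, 1) if unit == "F" else value

def transform(records):
    """Clean and normalize raw records, splitting off invalid rows
    instead of silently dropping them (a basic quality check)."""
    valid, rejected = [], []
    for rec in records:
        try:
            if rec["patient_id"] is None:
                raise ValueError("missing patient_id")
            valid.append({
                "patient_id": rec["patient_id"],
                "reading_c": to_celsius(float(rec["reading"]), rec["unit"]),
            })
        except (ValueError, KeyError) as exc:
            rejected.append({"record": rec, "error": str(exc)})
    return valid, rejected

valid, rejected = transform(json.loads(RAW))
```

In a production pipeline the same pattern would be expressed with Spark DataFrame operations, with rejected rows routed to a quarantine table for review rather than kept in memory.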
- Proven experience in data engineering with Databricks
- Strong expertise in ETL processes and scalable data pipelines
- Proficiency in Spark and Python for large-scale data processing
- Experience extracting data from web-based sources (APIs, web scraping)
- Proficiency in Azure or other major cloud platforms
- Strong knowledge of SQL and RDBMS for data validation and storage
- Understanding of data storage, transformation, and optimization techniques
- Excellent problem-solving skills and attention to detail
- Experience with cloud-based data storage (Azure Data Lake, Blob Storage)
- Familiarity with Delta Tables and Parquet for efficient data management
- Knowledge of data security, encryption, and access control in cloud environments
- Proficiency with data orchestration tools (Apache Airflow, Azure Data Factory)
- Understanding of data governance and compliance best practices
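The SQL-based validation mentioned in the requirements can be sketched with Python's built-in sqlite3 module. The table, columns, and checks below are hypothetical assumptions for illustration; on the project itself the equivalent queries would run against the warehouse or Delta tables:

```python
# Sketch of two common SQL data-quality checks run against staged data
# before it is promoted downstream. Table and column names are
# illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (patient_id TEXT, reading_c REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("p1", 37.0), ("p2", 37.1), ("p2", 37.1), (None, 36.5)],
)

# Completeness check: no NULL identifiers should reach the warehouse.
null_ids = conn.execute(
    "SELECT COUNT(*) FROM staging WHERE patient_id IS NULL"
).fetchone()[0]

# Uniqueness check: at most one row per patient in this hypothetical model.
dupes = conn.execute(
    "SELECT patient_id FROM staging GROUP BY patient_id HAVING COUNT(*) > 1"
).fetchall()
```

A pipeline would typically fail or alert when `null_ids` or `dupes` is non-empty, rather than letting bad rows propagate.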