Data Engineer (Databricks + Python + Azure)
Remote | Full-time | Posted 3 months ago
Design, build, and optimize scalable data pipelines in Azure using Databricks and Python to support analytics and governance in healthcare.
Responsibilities
- Design, develop, and maintain scalable data pipelines with Databricks, PySpark, and Python.
- Build and optimize ETL/ELT processes in Azure.
- Implement data models based on Data Lakehouse (Medallion) architecture.
- Ensure data quality and performance across ingestion, staging, and curated layers.
- Collaborate with architects, analysts, and stakeholders to translate healthcare data needs into technical requirements.
- Develop reusable data transformations and modular processing components.
- Support deployment using CI/CD and DevOps practices.
- Monitor and optimize data workflows for performance, scalability, and reliability.
- Contribute to data governance, security, and compliance in healthcare environments.
Requirements
- Experience with Databricks, data architecture, integrations, data warehousing, and ETL/ELT.
- Experience developing/deploying custom Python wheels or notebook scripts for distributed execution.
- Proficiency in SQL, stored procedures, and PySpark.
- Strong knowledge of cloud/hybrid RDBMS (SQL Server, PostgreSQL, Oracle, Azure SQL).
- Experience with batch and streaming processing techniques and file compaction.
Nice to Have
- Strong hands-on Databricks experience in Azure.
- Advanced proficiency in Python and PySpark.
- Experience building pipelines in Azure (ADF, Azure SQL, ADLS).
- Understanding of data warehousing, lakehouse concepts, and ETL/ELT.
- Experience with SQL Server, PostgreSQL, Oracle, or similar.
- Knowledge of batch and streaming processing patterns.
- Experience with large, complex datasets in cloud distributed environments.
- Strong analytical and problem-solving skills.
- Ability to work in cross-functional and distributed teams.
- Clear communication with non-technical stakeholders.