Data Engineer (Databricks + Python + Azure) – Remote Job at Allata

Design, build, and optimize scalable data pipelines in Azure using Databricks and Python to support analytics and governance in healthcare.

Design, develop, and maintain scalable data pipelines with Databricks, PySpark, and Python.
Build and optimize ETL/ELT processes in Azure.
Implement data models based on the Data Lakehouse (Medallion) architecture.
Ensure data quality and performance across ingestion, staging, and curated layers.
Collaborate with architects, analysts, and stakeholders to translate healthcare data needs into technical solutions.
Develop reusable data transformations and modular processing components.
Support deployment using CI/CD and DevOps practices.
Monitor and optimize data workflows for performance, scalability, and reliability.
Contribute to data governance, security, and compliance in healthcare environments.

Experience with Databricks, data architecture, integrations, data warehousing, ETL/ELT.
Experience developing/deploying custom Python wheels or notebook scripts for distributed execution.
Proficiency in SQL, stored procedures, and PySpark.
Strong knowledge of cloud/hybrid RDBMS (SQL Server, PostgreSQL, Oracle, Azure SQL).
Experience with batch and streaming processing techniques and file compaction.