About the role

Senior Data Engineer role developing and optimizing data pipelines on Databricks for a healthtech company. The work involves collaborating with stakeholders and ensuring data integrity through efficient processing.

Responsibilities

Develop, manage, and optimize data pipelines on the Databricks platform.
Debug and troubleshoot Spark applications to ensure reliability and performance.
Implement best practices for Spark compute and optimize workloads.
Write clean, efficient, and reusable Python code using object-oriented programming principles.
Design and build APIs to support data integration and application needs.
Develop scripts and tools to automate data processing and workflows.
Integrate, query, and manage data within MongoDB.
Ensure efficient storage and retrieval processes tailored to application requirements.
Optimize MongoDB performance for large-scale data handling.
Work closely with data scientists, analysts, and other stakeholders to understand data needs and deliver solutions.
Proactively identify and address technical challenges related to data processing and system design.

Requirements

Proven experience working with Databricks and Spark compute.
Proficient in Python, including object-oriented programming and API development.
Familiarity with NoSQL databases (MongoDB preferred), including querying, data modeling, and optimization.
Strong problem-solving skills and the ability to debug and optimize data processing tasks.
Experience with large-scale data processing and distributed systems.