ETL Developer / Data Engineer
The ETL Developer / Data Engineer is a foundational role responsible for building and maintaining the robust data infrastructure that powers all analytical endeavors. This individual will design, construct, and optimize data pipelines to extract, transform, and load large volumes of diverse data from various sources into analytical platforms. A strong command of data warehousing principles, scripting languages, and cloud-based data technologies is essential to ensure data quality, accessibility, and efficient flow for advanced analytics and reporting.
Responsibilities:
Design, develop, and optimize scalable ETL (Extract, Transform, Load) processes for ingesting and preparing large-scale healthcare and related datasets.
Construct and maintain robust data pipelines using scripting languages (e.g., Python, Scala) and distributed computing frameworks (e.g., Apache Spark, Databricks Delta Live Tables); see the sketch after this list.
Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and translate them into technical specifications for data acquisition, transformation, and storage.
Implement and manage data warehousing solutions, including data modeling, schema design, and performance tuning for optimal data retrieval and analytical query execution.
Develop and oversee database systems (e.g., Snowflake, PostgreSQL, Amazon Redshift), ensuring data integrity, security, and compliance with data governance standards.
Automate data refresh processes, implement monitoring tools, and establish alerts to proactively identify and resolve data pipeline issues and anomalies.
Perform data profiling, quality checks, and validation to ensure accuracy, consistency, and completeness of data throughout the pipeline.
Document data flows, ETL processes, data dictionaries, and database schemas thoroughly for knowledge transfer, reproducibility, and auditability.
Continuously evaluate and integrate new data technologies, tools, and methodologies to enhance data infrastructure efficiency, scalability, and capabilities.
Support troubleshooting of data-related issues, perform root cause analysis, and implement corrective actions to maintain data pipeline reliability and performance.
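
As a concrete illustration of the pipeline work described above, the following minimal PySpark sketch extracts raw files, applies basic cleansing, and loads partitioned Parquet. It is a sketch only: the bucket paths, column names, and filter conditions are hypothetical placeholders, not a prescribed implementation.

    # Minimal illustrative PySpark batch job: extract raw CSVs, apply basic
    # cleansing, and load partitioned Parquet for analytical queries.
    # All paths and column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims_etl_sketch").getOrCreate()

    # Extract: read raw source files from object storage.
    raw = spark.read.option("header", True).csv("s3://example-bucket/raw/claims/")

    # Transform: normalize the date column, drop records without a key,
    # and deduplicate on the business key.
    cleaned = (
        raw.withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
           .filter(F.col("claim_id").isNotNull())
           .dropDuplicates(["claim_id"])
    )

    # Load: write partitioned Parquet for downstream warehouse ingestion.
    cleaned.write.mode("overwrite").partitionBy("service_date").parquet(
        "s3://example-bucket/curated/claims/"
    )

In practice, a job of this kind would be scheduled and monitored by an orchestrator, in line with the automation and alerting responsibilities listed above.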
Certifications:
Cloud data engineering certifications (e.g., AWS Certified Data Analytics - Specialty) or specific data platform certifications (e.g., Databricks Certified Data Engineer).
Experience required:
3+ years of hands-on experience as an ETL Developer, Data Engineer, or a similar role, with a focus on large-scale data processing.
Strong proficiency in SQL and experience with a range of relational databases; exposure to NoSQL databases is a plus (see the sketch after this list).
Proven experience in building and optimizing scalable data pipelines using scripting languages (Python preferred) and distributed computing.
Direct experience with cloud-based data platforms (e.g., AWS) and technologies such as Snowflake and Databricks.
Familiarity with data transformation tools such as dbt (data build tool) and version control with Git (e.g., on GitHub).
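
As an illustration of the SQL-based profiling and validation work referenced above, the following minimal Python sketch runs two checks against a warehouse table before it is published downstream. The connection string, table name, and the 1% threshold are hypothetical assumptions for the example.

    # Minimal illustrative data-quality gate: verify row count and null rate
    # on a loaded table before publishing it downstream.
    # Connection string, table name, and the 1% threshold are hypothetical.
    import sqlalchemy as sa

    engine = sa.create_engine("postgresql://user:pass@host:5432/analytics")

    with engine.connect() as conn:
        total = conn.execute(sa.text("SELECT COUNT(*) FROM curated.claims")).scalar()
        null_ids = conn.execute(
            sa.text("SELECT COUNT(*) FROM curated.claims WHERE claim_id IS NULL")
        ).scalar()

    # Fail loudly so a scheduler or orchestrator can alert on the task.
    assert total > 0, "curated.claims is empty"
    assert null_ids / total < 0.01, f"null claim_id rate too high: {null_ids}/{total}"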
Key Skills:
ETL Development
Data Pipeline Automation
Data Warehousing
SQL, Python, Scala
Snowflake, Databricks, AWS S3
Apache Spark, dbt
Data Modeling
Database Management
Data Quality
Troubleshooting
Education:
Bachelor's degree in Computer Science, Information Technology, Engineering, or a closely related technical field.