Airflow Orchestration

Remote: Full Remote

Offer summary

Qualifications:

  • Extensive experience in Apache Airflow and Oozie for workflow orchestration.
  • Proficiency in Python for writing Airflow DAGs and custom operators.
  • Solid understanding of AWS services and experience with CI/CD tools like Jenkins and Terraform.
  • Familiarity with Databricks and containerization tools such as Docker and Kubernetes.

Key responsibilities:

  • Lead the migration of workflows from Oozie to Airflow, ensuring no data loss.
  • Design and implement optimized Airflow pipelines for data ingestion and transformation.
  • Collaborate with data engineering teams to migrate workloads to Databricks on AWS.
  • Set up monitoring and error-handling mechanisms for Airflow workflows.

Brighttier Scaleup https://brighttier.com/
11 - 50 Employees

Job description

We are seeking a skilled Airflow Orchestration and Ingestion Engineer to lead the migration of workflow orchestration from Apache Oozie to Apache Airflow as part of our Cloudera Hadoop migration to Databricks on AWS. The ideal candidate will have extensive experience in workflow automation, CI/CD pipelines, data pipeline orchestration, and migrating large-scale data platforms to modern cloud-based solutions.

Key Responsibilities:

Workflow Migration:

  • Analyze and convert Oozie workflows into Airflow DAGs using Python-based orchestration.
  • Design and implement reusable, modular, and optimized Airflow pipelines for data ingestion, transformation, and orchestration.
  • Maintain a one-to-one mapping between legacy workflows and Airflow DAGs, ensuring no data loss or business interruption (see the conversion sketch below).
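
To make the conversion concrete, here is a minimal sketch of what an Oozie-to-Airflow rewrite can look like, assuming Airflow 2.x and a hypothetical legacy workflow with a shell action followed by a Hive action; the DAG id, scripts, and schedule are illustrative, not taken from a real migration.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="legacy_ingest_workflow",   # mirrors the Oozie workflow name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",        # replaces the Oozie coordinator frequency
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_raw_files",   # was an Oozie <shell> action
        bash_command="sh /opt/jobs/extract.sh {{ ds }}",
    )
    load = BashOperator(
        task_id="load_to_hive",        # was an Oozie <hive> action
        bash_command="beeline -f /opt/jobs/load.hql --hivevar run_date={{ ds }}",
    )
    extract >> load                    # preserves the legacy action ordering
```

Keeping one DAG per legacy workflow, as above, is what makes the one-to-one mapping auditable during cutover.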

Cloud Data Migration:

  • Collaborate with data engineering teams to migrate Cloudera Hadoop workloads to Databricks on AWS.
  • Leverage Airflow for scheduling and orchestrating data workflows on AWS-based services (e.g., S3, EMR, Glue, Redshift); a minimal sketch follows below.
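
As a rough illustration of this AWS-side orchestration, assuming a recent apache-airflow-providers-amazon release (older releases expose these classes under slightly different module paths) and made-up bucket and Glue job names, an S3 sensor can gate a Glue job so the transform runs only once the day's landing file has arrived:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="s3_to_glue_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Wait for the day's landing file before running the transform.
    wait_for_landing = S3KeySensor(
        task_id="wait_for_landing_file",
        bucket_name="example-landing-bucket",   # hypothetical bucket
        bucket_key="raw/{{ ds }}/data.parquet",
        aws_conn_id="aws_default",
        poke_interval=300,
        timeout=60 * 60,
    )
    # Trigger a Glue job assumed to already exist in the AWS account.
    run_glue_transform = GlueJobOperator(
        task_id="run_glue_transform",
        job_name="example-transform-job",       # hypothetical Glue job
        aws_conn_id="aws_default",
    )
    wait_for_landing >> run_glue_transform
```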

Pipeline Optimization:

  • Optimize data ingestion pipelines to achieve high throughput and low latency on AWS cloud infrastructure.
  • Integrate Airflow workflows with Databricks for data transformations and analytics (illustrated below).
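
For the Databricks hand-off specifically, a minimal sketch using DatabricksSubmitRunOperator from the apache-airflow-providers-databricks package; the cluster spec and notebook path below are assumptions for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit a one-off job run against an ephemeral cluster on AWS.
    transform = DatabricksSubmitRunOperator(
        task_id="run_transform_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "13.3.x-scala2.12",  # assumed runtime version
            "node_type_id": "m5.xlarge",          # AWS instance type
            "num_workers": 4,
        },
        notebook_task={"notebook_path": "/Repos/etl/transform"},  # hypothetical path
    )
```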

Error Handling and Monitoring:

  • Implement robust error-handling mechanisms, task retries, and alerting within Airflow workflows (see the sketch below).
  • Set up monitoring dashboards using tools like CloudWatch, Prometheus, or Airflow’s built-in features.
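
In plain Airflow, the retry-and-alerting side of this often reduces to a few operator arguments plus a failure callback; the notification hook below is a hypothetical stand-in for whatever alerting channel the team uses:

```python
from datetime import timedelta


def notify_on_failure(context):
    """Hypothetical alert hook: forward failure details to an alerting channel."""
    ti = context["task_instance"]
    # Replace print with a Slack/PagerDuty/SNS call in a real deployment.
    print(f"Task {ti.task_id} failed in DAG {ti.dag_id}; logs: {ti.log_url}")


# Shared across tasks via default_args when constructing the DAG.
default_args = {
    "retries": 3,                          # re-run transient failures
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,     # back off between attempts
    "on_failure_callback": notify_on_failure,
}
```

These task-level hooks complement, rather than replace, external dashboards in CloudWatch or Prometheus.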

Collaboration and Documentation:

  • Work closely with data architects, cloud engineers, and DevOps teams to align on migration goals and architecture.
  • Document the migration process, workflow logic, and best practices for ongoing maintenance.

Performance Testing:

  • Conduct performance testing and benchmarking of migrated pipelines to ensure efficient resource utilization on Databricks and AWS.

CI/CD Implementation:

  • Design and maintain CI/CD pipelines for orchestrated workflows using Jenkins and Terraform.
  • Automate deployment of Airflow DAGs and infrastructure components using Terraform (Infrastructure as Code).
  • Implement quality checks and validation for workflow pipelines during the deployment process (an example check follows below).
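
One common shape for such a deployment-time quality gate (an assumed pattern, not a prescription for this team's Jenkins setup) is a pytest step that fails the build when any DAG fails to import:

```python
# test_dag_integrity.py -- run in CI before DAGs are deployed
from airflow.models import DagBag


def test_dags_import_cleanly():
    # Parse every file under dags/ (the folder path is an assumption).
    bag = DagBag(dag_folder="dags/", include_examples=False)
    assert not bag.import_errors, f"DAG import errors: {bag.import_errors}"


def test_at_least_one_dag_loaded():
    bag = DagBag(dag_folder="dags/", include_examples=False)
    assert bag.dags, "no DAGs were discovered under dags/"
```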

Technical Expertise:

  • Strong experience in Apache Airflow, including designing and managing complex DAGs.
  • Hands-on experience with Apache Oozie and migration to modern orchestration tools.
  • Proficiency in Python for writing Airflow DAGs and custom operators/hooks.
  • Experience with Hadoop ecosystems (Cloudera distribution preferred) and their components, such as HDFS, Hive, and Spark.
  • Solid experience with CI/CD tools, including Jenkins for pipeline automation and Terraform for provisioning orchestration and pipeline infrastructure.
  • Familiarity with Databricks (on AWS preferred) and its integration with Airflow for ETL and data processing.

Cloud and Infrastructure:

  • Solid understanding of AWS services such as S3, EMR, Glue, Lambda, Redshift, and IAM.
  • Experience with containerization tools (Docker, Kubernetes) and CI/CD pipelines for workflow deployment.

Analytical and Problem-Solving:

  • Ability to debug and resolve issues in data workflows and orchestrators.
  • Experience optimizing workflows for performance and scalability.

Preferred Qualifications:

  • Experience in large-scale cloud migrations, specifically from on-premises Hadoop to Databricks on AWS.
  • Knowledge of Spark and PySpark for big data transformations.
  • Familiarity with version control tools (e.g., Git) and workflow monitoring tools.
  • Certifications in AWS (e.g., AWS Certified Solutions Architect) or Databricks.

The selected applicant will be subject to a background investigation, which will be conducted, and the results of which will be used, in compliance with applicable law.

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Analytical Thinking
  • Collaboration
  • Problem Solving
