
Lead/Architect Data Engineer

Remote: Full Remote
Contract: Contract position
Work from: Anywhere in the US

Offer summary

Qualifications:

  • 7+ years of experience in data engineering with a focus on cloud-native data architecture.
  • Deep hands-on experience with Databricks architecture and workspace administration.
  • Strong knowledge of Spark performance tuning and job optimization.
  • Proven expertise in Databricks SQL, PySpark, and building reusable Python libraries.

Key responsibilities:

  • Lead the architecture and development of scalable ETL/ELT pipelines using Databricks and PySpark.
  • Collaborate with infrastructure teams to define Databricks architecture strategy and ensure secure implementation.
  • Drive DevOps best practices using Azure DevOps and CI/CD automation pipelines.
  • Mentor junior engineers and perform architectural reviews to ensure alignment with best practices.

APN Consulting Inc. SME http://www.apnconsultinginc.com
201 - 500 Employees

Job description

APN Consulting, Inc. is a progressive IT staffing and services company offering innovative business solutions to improve client business outcomes. We focus on high-impact technology solutions in ServiceNow, Fullstack, Cloud & Data, and AI/ML. Due to our globally expanding service offerings, we are seeking top talent to join our teams and grow with us.

Role: Lead/Architect Data Engineer
Location: Remote (US-based; anywhere in the US)
Duration: Contract
Job Summary
We are looking for a results-driven Lead Data Engineer (Contractor) to architect, develop, and guide the implementation of modern data pipelines and cloud-native analytics solutions. The ideal candidate will lead end-to-end delivery across engineering, analytics, and product teams, bringing deep experience in Databricks, PySpark, and Azure cloud platforms. This role also requires strong hands-on experience in Databricks architecture, administration, and performance optimization.
Key Responsibilities
  1. Lead the architecture, design, and development of scalable ETL/ELT pipelines using Databricks, PySpark, and SQL across distributed data environments.
  2. Architect and manage Databricks workspaces, including provisioning and maintenance of clusters, cluster policies, and job compute environments in accordance with enterprise standards.
  3. Collaborate with platform and infrastructure teams to define Databricks architecture strategy and ensure secure, scalable, and cost-effective implementation.
  4. Define and enforce cluster policies to ensure proper resource utilization, cost control, and access control based on workload patterns and team requirements.
  5. Lead performance tuning of Spark jobs, Databricks SQL queries, and notebooks, ensuring optimal execution and minimizing latency.
  6. Build modular, reusable Python libraries using Pandas, NumPy, and PySpark for scalable data processing.
  7. Develop optimized Databricks SQL queries and views to power:
    1. Tableau dashboards
    2. React and .NET-based applications
    3. Ad-hoc and real-time analytics use cases
  8. Work closely with frontend and backend development teams to deliver use-case-specific, query-optimized datasets.
  9. Leverage Unity Catalog for fine-grained access control, data lineage, and metadata governance.
  10. Drive DevOps best practices using Azure DevOps, Terraform, and CI/CD automation pipelines.
  11. Mentor junior engineers and perform architectural reviews to ensure consistency and alignment with best practices.
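As an illustration of the cluster-policy responsibility above, a minimal Databricks cluster policy definition might look like the sketch below. The field names follow the Databricks cluster-policy JSON schema, but the specific runtime version, node types, and limits are hypothetical examples, not values prescribed by this role:

```json
{
  "spark_version": {
    "type": "allowlist",
    "values": ["13.3.x-scala2.12"],
    "defaultValue": "13.3.x-scala2.12"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": ["Standard_DS3_v2", "Standard_DS4_v2"]
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 8
  },
  "autotermination_minutes": {
    "type": "range",
    "maxValue": 60,
    "defaultValue": 30
  }
}
```

A policy like this caps cluster size and idle time for cost control while pinning teams to an approved runtime and node types, which is the kind of guardrail the responsibility describes.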
Required Skills & Qualifications
  1. 7+ years of experience in data engineering, with a strong background in cloud-native data architecture.
  2. Deep hands-on experience with Databricks architecture, workspace administration, and cluster management.
  3. Experience defining and managing cluster policies, pools, and autoscaling strategies.
  4. Strong knowledge of Spark performance tuning and job optimization.
  5. Proven expertise in Databricks SQL, PySpark, Delta Lake, and large-scale data pipelines.
  6. Skilled in building reusable Python libraries with Pandas, openpyxl, XlsxWriter, and PySpark.
  7. Practical experience working with Unity Catalog for security and governance.
  8. Strong experience collaborating with frontend/backend development teams on backend integration.
  9. Strong SQL expertise and hands-on experience with PostgreSQL, SQL Server, or similar.
  10. DevOps expertise with tools like Azure DevOps, Git, and pipeline automation.
  11. Excellent communication skills with the ability to lead discussions with cross-functional teams and stakeholders.
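"Building reusable Python libraries" in practice means packaging common transformations as small, well-tested functions. A minimal illustrative sketch follows, in plain Python for self-containment; the function names are hypothetical, and in a real pipeline helpers like these would typically operate on pandas or PySpark DataFrames rather than lists of dicts:

```python
from typing import Dict, List


def normalize_columns(record: Dict[str, object]) -> Dict[str, object]:
    """Lower-case and snake_case the keys of a record."""
    return {k.strip().lower().replace(" ", "_"): v for k, v in record.items()}


def dedupe_latest(records: List[Dict[str, object]], key: str, ts: str) -> List[Dict[str, object]]:
    """Keep only the most recent record per key, comparing the ts field."""
    latest: Dict[object, Dict[str, object]] = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[ts] > latest[k][ts]:
            latest[k] = rec
    return list(latest.values())


rows = [
    {"Customer ID": 1, "Updated At": "2024-01-01"},
    {"Customer ID": 1, "Updated At": "2024-03-01"},
    {"Customer ID": 2, "Updated At": "2024-02-01"},
]
clean = [normalize_columns(r) for r in rows]
deduped = dedupe_latest(clean, key="customer_id", ts="updated_at")
```

Factoring logic this way lets the same cleanup rules be imported by many jobs and unit-tested in isolation, which is the point of a reusable library.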
Tools & Technologies
  1. Cloud Platforms: Azure (preferred), AWS
  2. Big Data & Analytics: Databricks, PySpark, Delta Lake, Databricks SQL, Spark Connect, Delta Live Tables
  3. Programming & Frameworks: Python, Pandas, PySpark, Flask
  4. Visualization & BI: Tableau
  5. App Integration: React, .NET, REST APIs
  6. DevOps & CI/CD: Azure DevOps, Git
  7. Databases: Databricks SQL, Azure SQL DB, or similar

We are committed to fostering a diverse, inclusive, and equitable workplace where individuals from all backgrounds feel valued and empowered to contribute their unique perspectives. We strongly encourage applications from candidates of all genders, races, ethnicities, abilities, and experiences to join our team and help us build a culture of belonging.

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Collaboration
  • Communication
