Senior Data Engineer

extra holidays - fully flexible

Remote:

Full Remote

Experience:

Junior (1-2 years)

Work from:

New York (USA), United States

Offer summary

Qualifications:

3+ years in data engineering roles; advanced proficiency in Python and SQL; 1+ years of experience using Databricks; strong hands-on experience with Apache Spark.

Key responsibilities:

Collaborate on data engineering strategy
Develop, test, and maintain data pipelines

Credible Human Resources, Staffing & Recruiting Startup http://credible-app.com/

2 - 10 Employees

Job description

Strategy Creation: Collaborate with cross-functional teams to define the data engineering strategy aligned to business objectives, including data modeling that unifies data assets across a range of source systems used to manage the operations of our partnering hospitals.
Pipeline Development: Define and execute the processes needed to develop, test, deploy, and maintain high-quality data pipelines. Oversee the end-to-end development of data pipelines, from source data extraction through to delivery of production-grade analytical datasets, ensuring data quality and security throughout.
Performance Optimization: Continuously monitor and optimize data processing performance and efficiency. Identify and address bottlenecks, optimize query performance, and improve overall system stability.
Data Governance: Establish and enforce data quality management policies, data access controls, and data privacy standards.
Technical Leadership: Stay abreast of the latest developments in engineering tools and best practices. Provide guidance to the team on technical challenges.
Documentation: Maintain clear and comprehensive documentation of data pipelines, architecture, and processes to ensure knowledge sharing and team continuity.
Third-party Management: Evaluate and manage relationships with third-party vendors and tools, making informed decisions about when to leverage external solutions.

Qualifications

3+ years in data engineering roles in a production environment
Advanced proficiency in Python and SQL for data engineering
Up-to-date knowledge of Databricks and 1+ years of experience using it for Lakehouse management
Deep understanding of data modeling, data architecture, and data integration best practices
Strong hands-on experience with Apache Spark
Familiarity with data governance, security, and privacy principles
Comfort using Git or an equivalent to manage the software development life cycle
Exceptional ability to learn and use new software development techniques and tools
Ability to manage multiple projects simultaneously
High-energy, humble team player with a “get it done” attitude who seeks collaboration with colleagues

Preferred Qualifications

Experience with the Azure cloud ecosystem
Experience developing production-ready, real-time machine learning model serving pipelines
Comfort developing in the Apache Spark Structured Streaming paradigm
Experience working in a private equity-backed services company
Experience deploying machine learning models with MLflow or equivalent
Experience developing CI/CD pipelines