Senior Data Engineer – ADMET

Remote: Full Remote

Offer summary

Qualifications:

  • Background in computational chemistry, cheminformatics, computational biology, bioinformatics, data engineering, or computer science.
  • Deep experience in pharma/biotech ADMET data pipelines for machine learning.
  • Understanding of assay protocols and their mapping for ADMET data.
  • Experience with federated learning or secure model training is a plus.

Key responsibilities:

  • Design, build, and maintain scalable data pipelines for ADMET datasets.
  • Standardize heterogeneous ADMET data formats for modeling readiness.
  • Implement validation checks to ensure data integrity.
  • Collaborate with cross-functional teams to align data and models with real-world drug discovery needs.

Apheris Computer Hardware & Networking Startup https://www.apheris.com/
11 - 50 Employees

Job description

About the role
At Apheris, we power federated data networks in life sciences to address the data bottleneck in training highly performant ML models. Publicly available molecular datasets are insufficient to train high-quality ML models that meet industry requirements. Our product addresses this by hosting networks where biopharma organizations collaboratively train higher-quality models on their combined data. The Apheris product is a set of drug discovery applications enriched with the proprietary data of network participants. Our federated computing infrastructure, with built-in governance and privacy controls, ensures that data IP and ownership always stay with the data custodians.

As we are doubling down on ADMET (absorption, distribution, metabolism, excretion, and toxicity) use cases as a focus area within our drug discovery work, we are looking for a Senior Data Engineer to help us build great ADMET models. This is a hands-on, high-impact role focused on advancing the state of the art in applying foundational models to drug discovery problems. You’ll work closely with our ADMET team and will serve as the technical authority on data preparation, data harmonization, and data pipelines in this domain.  

You should bring deep expertise in data infrastructure and data preparation, along with domain knowledge in pharmacokinetics and toxicity, with a focus on ADMET modelling and related tasks. You must also understand how these models are applied within industrial drug discovery workflows.

If you want to be part of a mission-driven team building cutting-edge AI systems for life sciences – and you know what it takes to leverage domain-specific data – this role is for you. 
What you will do
  • ADMET Data Pipeline Development: Design, build and maintain scalable pipelines for ingesting, processing, and harmonizing diverse ADMET datasets from public sources (e.g., ChEMBL, PubChem) and proprietary assays.
  • Data Harmonization: Standardize heterogeneous ADMET data formats (e.g., in vitro assays, in silico predictions) across network participants so that the data is ready for modelling.
  • Model-Ready Dataset Curation: Preprocess raw ADMET data (e.g., normalizing units, handling missing values) to support AI/ML model training for a variety of endpoints (such as bioavailability, hERG inhibition, or CYP450 interactions).
  • Data Quality Assurance: Implement and automate validation checks to ensure ADMET data integrity.
  • Cross-Functional Integration: Work with computational chemists to optimize data structures for AI-driven ADMET models (e.g., graph-based representations for metabolic pathways)
  • Work with our customer and potentially academic partners to define data preprocessing, selection, and benchmarking strategies for novel training tasks involving ADMET data, including leveraging and harmonizing assay data from different sources.
  • Collaborate cross-functionally to ensure data and resulting models address real-world drug discovery needs.
  • Mentor and guide team members on a content level, supporting the planning and breakdown of complex ADMET data preparation.
  • Influence strategic decisions on data infrastructure and data quality assurance.
  • Contribute to publications or open-source contributions where relevant.

What we expect from you 
  • By month 3: Develop a deep technical understanding of the Apheris product and how it maps to the current ADMET use-cases we are working on. Take ownership of an ADMET data preparation stream. Build relationships with product and engineering leadership. Develop a roadmap and experiment plan for preparing data and adapting models to one high-value use case.
  • By month 12: Lead multiple data preparation efforts in ADMET and demonstrate measurable progress in model performance and real-world impact. Mentor colleagues and set strategic direction for the domain.
You should apply if
  • You have a background in computational chemistry, cheminformatics, computational biology, bioinformatics, data engineering, or computer science, and a track record of preparing data for ML models addressing real-world drug discovery problems.
  • You have deep experience in pharma/biotech ADMET data pipelines for machine learning.
  • You have deep experience in ADMET data, including an understanding of assay protocols and how to map protocols to each other.
  • You’re comfortable navigating complex technical landscapes and can break down and drive execution on ambitious modeling plans.
  • You understand how ADMET data and models are used in the drug discovery lifecycle and can align your work to practical use cases.
Bonus points if
  • You have experience in federated learning, privacy-preserving ML, or secure model training.
  • You have experience in benchmarking predictive models against standardized datasets.
  • You have experience working with ML and MLOps systems at scale, including CI/CD, model versioning, Docker, Kubernetes, cloud platforms, and orchestration tools.
  • You’ve contributed to open-source data or cheminformatics tooling.
  • You have hands-on experience working with ADMET assays and DMPK stakeholders.
  • You have experience guiding technical direction in a fast-paced, research-oriented environment.
What we offer you
  • Industry-competitive compensation, incl. early-stage virtual share options
  • Remote-first working – work where you work best, whether from home or a co-working space near you
  • Great suite of benefits, including a wellbeing budget, mental health benefits, a work-from-home budget, a co-working stipend, and a learning and development budget
  • Regular team lunches and social events
  • Generous holiday allowance
  • Quarterly All Hands meet-up at our Berlin HQ or a different European location
  • A fun, diverse team of mission-driven individuals with a drive to see AI and ML used for good
  • Plenty of room to grow personally and professionally and shape your own role
About Apheris
Apheris powers federated life sciences data networks, addressing the critical challenge of accessing proprietary data locked in silos due to IP and privacy concerns. Publicly available datasets are insufficient to train high-quality ML models that meet industry requirements. Our product addresses this by enabling life sciences organizations to collaboratively train higher quality models on complementary data from multiple parties. We are now doubling down on two key areas of interest: structural biology and ADMET. 
Logistics
Our interview process is split into three phases:
  1. Initial Screening: If your application matches our requirements, we invite you to an initial video call to explore the fit. In this 30-45 minute interview, you will get to know us and the role. The interviewer will ask about your relevant experience and skills and will answer any questions you may have about the company and the role.
  2. Deep Dive: In this phase, a domain expert from our team will assess your skills and knowledge required for the role by asking you about meaningful experiences or your solutions for specific scenarios in line with the role we are staffing.
  3. Final Interview: Finally, we invite you for up to three hours of targeted sessions with our founders, talking about our culture and meeting future co-workers on the ground.

Required profile

Industry: Computer Hardware & Networking
Spoken language(s): English

Other Skills

  • Mentorship
  • Collaboration
  • Problem Solving