Principal ML Engineer (Infra/hardware)

Fully flexible

Offer summary

Qualifications:

  • Proven track record in ML infrastructure optimization
  • Hands-on experience with AWS Neuron SDK
  • Deep expertise in PyTorch model optimization
  • Production experience with Kubernetes and Ray

Key responsibilities:

  • Lead architecture design for ML infrastructure modernization
  • Implement efficient model compilation pipelines for Inferentia2

Neurons Lab (Research Scaleup), https://www.neurons-lab.com/

Job description

About the project

We're looking for an experienced ML Infrastructure Engineer who has successfully delivered large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium while achieving a performance boost and a cost reduction; an illustrative compilation sketch follows the infrastructure summary below.

Current Infrastructure:

  • ML Models: RetinaFace, OpenPose, CLIP, and other CV models

  • Hardware: A10/T4 GPUs on EKS

  • Serving: Triton Inference Server

  • Orchestration: Mix of Kubernetes and Ray
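
Since the core of the engagement is compiling PyTorch CV models for Inferentia2, a minimal compilation sketch is shown below. It assumes the AWS Neuron SDK (torch-neuronx) is installed on an inf2 instance; the stand-in model, input shape, and output filename are hypothetical placeholders, not project specifics.

    import torch
    import torch_neuronx
    from torchvision.models import resnet50  # stand-in for RetinaFace/OpenPose/CLIP backbones

    # Load a CV model in eval mode; the real project would load the actual model weights.
    model = resnet50(weights=None).eval()

    # Example input matching the intended serving batch size and resolution (illustrative values).
    example = torch.rand(1, 3, 224, 224)

    # Trace/compile the model for NeuronCores (Inferentia2); this invokes the Neuron compiler.
    neuron_model = torch_neuronx.trace(model, example)

    # Save as TorchScript so Triton or Ray Serve can load the compiled artifact.
    torch.jit.save(neuron_model, "resnet50_neuron.pt")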

Stage: Presale and Delivery

Duration: 2 months (preliminary)

Capacity: part-time (20h/week)

Areas of Responsibility
  • Technical Leadership:

    • Lead the architecture design for ML infrastructure modernization

    • Define compilation and optimization strategies for model migration

    • Establish a performance benchmarking framework (an illustrative latency benchmark sketch follows this list)

    • Set up monitoring and alerting for the new infrastructure

  • Performance Optimization:

    • Implement efficient model compilation pipelines for Inferentia2

    • Optimize batch processing and memory layouts

    • Fine-tune model serving configurations

    • Ensure latency requirements are met across all services

  • Cost Optimization:

    • Analyze and optimize infrastructure costs

    • Implement efficient resource allocation strategies

    • Set up cost monitoring and reporting

    • Achieve target cost reduction while maintaining performance
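
For the benchmarking and latency-related responsibilities above, a minimal latency/throughput measurement sketch (plain PyTorch and Python, no Neuron-specific APIs) could look like the following; the model handle, batch size, and percentile targets are illustrative assumptions.

    import time
    import statistics
    import torch

    def benchmark(model, example, warmup=10, iters=100):
        """Measure per-batch latency (ms) and derived throughput for a traced model."""
        with torch.no_grad():
            for _ in range(warmup):          # warm up caches / lazy initialization
                model(example)
            latencies = []
            for _ in range(iters):
                start = time.perf_counter()
                model(example)
                latencies.append((time.perf_counter() - start) * 1000.0)
        latencies.sort()
        p50 = statistics.median(latencies)
        p99 = latencies[int(0.99 * len(latencies)) - 1]
        throughput = example.shape[0] * 1000.0 / p50  # images per second at median latency
        return {"p50_ms": p50, "p99_ms": p99, "imgs_per_sec": throughput}

    # Example: compare the GPU baseline against the Neuron-compiled artifact with identical input.
    # print(benchmark(torch.jit.load("resnet50_neuron.pt"), torch.rand(8, 3, 224, 224)))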

Skills
  • Proven track record of ML infrastructure optimization projects

  • Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment

  • Deep expertise in PyTorch model optimization and compilation

  • Experience with high-throughput computer vision model serving

  • Production experience with both Kubernetes and Ray for ML workloads
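
As an illustration of the Kubernetes/Ray serving requirement, a minimal Ray Serve deployment wrapping a compiled TorchScript artifact could look like the sketch below; the deployment name, replica count, request schema, and model path (the artifact from the compilation sketch above) are assumptions, not the project's actual serving topology.

    import torch
    from ray import serve

    @serve.deployment(num_replicas=2)   # replica count is illustrative
    class VisionModel:
        def __init__(self, model_path: str):
            # Load a TorchScript artifact (e.g., a Neuron-compiled model saved earlier).
            self.model = torch.jit.load(model_path)

        async def __call__(self, request):
            payload = await request.json()
            batch = torch.tensor(payload["pixels"], dtype=torch.float32)
            with torch.no_grad():
                scores = self.model(batch)
            return {"scores": scores.tolist()}

    app = VisionModel.bind("resnet50_neuron.pt")
    # serve.run(app)  # deploys onto the local/remote Ray cluster (e.g., KubeRay on EKS)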

Knowledge
  1. Model Optimization Expertise:

    • Deep understanding of ML model architecture optimization

    • Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)

    • Proficiency in optimizing computer vision models (CNN architectures)

    • Knowledge of model serving optimization patterns

  2. Performance Optimization:

    • Advanced understanding of ML model inference optimization

    • Expertise in batch processing strategies

    • Memory layout optimization for vision models

    • Experience with pipeline parallelism implementation

    • Proficiency in latency/throughput optimization techniques

  3. Hardware Acceleration:

    • Deep knowledge of ML accelerator architectures

    • Understanding of hardware-specific optimizations

    • Experience with model compilation for specialized chips

    • Proficiency in memory access pattern optimization

  4. Performance Analysis:

    • Proficiency in ML model profiling tools (an illustrative profiling sketch follows this list)

    • Experience with performance bottleneck identification

    • Knowledge of performance monitoring techniques

    • Ability to analyze and optimize inference patterns
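
To make the profiling and memory-layout points above concrete, the sketch below uses the standard torch.profiler API and the channels_last memory format; the model and input are placeholders, and this is one common approach rather than a prescribed method.

    import torch
    from torch.profiler import profile, ProfilerActivity
    from torchvision.models import resnet50

    model = resnet50(weights=None).eval()
    example = torch.rand(8, 3, 224, 224)

    # channels_last often improves memory access patterns for CNN inference on some backends.
    model = model.to(memory_format=torch.channels_last)
    example = example.contiguous(memory_format=torch.channels_last)

    # Profile one inference pass to surface operator-level bottlenecks.
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        with torch.no_grad():
            model(example)

    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))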

Nice to Have:

  • Experience with Ray architecture for ML serving

  • Knowledge of distributed ML systems

  • Understanding of ML pipeline optimization

  • Experience with model quantization techniques
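
For the quantization item above, one common starting point is PyTorch post-training dynamic quantization; the sketch below is a generic example (dynamic quantization mainly covers Linear layers, so conv-heavy vision models usually need static or accelerator-specific quantization instead), and the model is a placeholder.

    import torch
    from torchvision.models import resnet50

    model = resnet50(weights=None).eval()

    # Post-training dynamic quantization: weights stored in int8, activations quantized at runtime.
    # Only supported layer types (here torch.nn.Linear) are converted.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        out = quantized(torch.rand(1, 3, 224, 224))
    print(out.shape)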

Experience
  1. Model Optimization (4+ years):

    • Proven track record of optimizing large-scale ML inference systems

    • Successfully implemented hardware-specific model optimizations

    • Demonstrated experience with computer vision model optimization

    • Led projects achieving significant performance improvements

  2. Proven Results (Examples):

    • Successfully optimized computer vision models similar to RetinaFace/CLIP

    • Achieved significant cost reduction while maintaining performance

    • Implemented efficient batch processing strategies

    • Developed performance monitoring and optimization frameworks

Required profile

Industry: Research
Spoken language(s): English