3+ years of hands-on experience in production ML infrastructure using Python, Go, or similar languages.
Experience with public cloud providers like GCP, AWS, or Azure, preferably GCP.
Knowledge of deep learning fundamentals and tools such as Huggingface, Ray, PyTorch, or TensorFlow.
Familiarity with agile software processes and modular code design.
Key responsibilities:
Manage and maintain large scale production Kubernetes clusters for ML workloads.
Contribute to the Spotify ML Platform SDK and build tools for ML operations.
Collaborate with Machine Learning Engineers and product teams to deliver scalable ML platform solutions.
Design, document, and implement reliable and maintainable ML infrastructure capabilities.
Our mission is to unlock the potential of human creativity—by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.
Spotify transformed music listening forever when it launched in Sweden in 2008. Discover, manage and share over 70m tracks for free, or upgrade to Spotify Premium to access exclusive features including offline mode, improved sound quality, and an ad-free music listening experience.
Today, Spotify is the most popular global audio streaming service with 365m users, including 165m subscribers across 178 markets. We are the largest driver of revenue to the music business today.
The Hendrix ML Platform team is dedicated to developing a robust, Spotify-wide platform for training and serving machine learning models. This platform streamlines the productionization of AI and ML models by mitigating the incidental complexities involved in creating backend services for serving predictions and training models.
What You'll Do
Manage and maintain large-scale production Kubernetes clusters for ML workloads, including ML platform infrastructure and the necessary DevOps.
Contribute to the Spotify ML Platform SDK and build tools for various ML operations.
Collaborate with Machine Learning Engineers (MLEs), researchers, and various product teams to deliver scalable ML platform tooling that meets the required timelines and specifications.
Work independently and collaboratively on squad projects that often require learning and applying new technologies that may go beyond existing skill sets.
Design, document, and implement reliable, testable, and maintainable ML infrastructure capabilities.
Who You Are
You have 3+ years of hands-on experience implementing production ML infrastructure at scale in Python, Go, or similar languages.
You have 3+ years of experience working with a public cloud provider such as GCP, AWS, or Azure, preferably GCP.
You have knowledge of deep learning fundamentals, algorithms, and open-source tools such as Huggingface, Ray, PyTorch, or TensorFlow.
It is good to have an understanding of distributed training leveraging GPUs and Kubernetes.
You have a general understanding of data processing for ML.
You have experience with agile software processes and modular code design following industry standards.
Where You'll Be
This role is based in Toronto.
We offer you the flexibility to work where you work best! There will be some in-person meetings, but the role still allows for flexibility to work from home.
Required profile
Experience
Industry: Music
Spoken language(s): English