Moonvalley is developing cutting-edge generative AI models designed to power Super Bowl-worthy commercials and award-winning cinematic experiences. Our inaugural HD model, Marey, is built on exclusively licensed and owned data for professional use in Hollywood and enterprise applications.
Our team is an unprecedented convergence of talent across industries. Our elite AI scientists from DeepMind, Google, Microsoft, Meta & Snap have decades of collective experience in machine learning and computational creativity. We have also established the first AI-enabled movie studio in Hollywood, filled with accomplished filmmakers and visionary creative talent. We work with the top producers, actors, and filmmakers in Hollywood as well as creative-driven global brands. So far we've raised over $70M from world-class investors including General Catalyst, Bessemer, Khosla Ventures & Y Combinator – and we're just getting started.
Role Summary:
We’re looking for an Infrastructure Engineer to shape the backbone of our AI systems as we develop cutting-edge AI models. Joining at an early stage, you'll have the unique opportunity to architect infrastructure at scale, harness thousands of GPUs, tackle challenging data problems, work with top AI talent, and push the boundaries of large-scale model capabilities. We’re looking for people who love dealing with technical complexity, thrive in an innovative and fast-paced environment, and want to shape the future of AI.
What you’ll do:
Manage and scale GPU infrastructure (Kubernetes, Terraform / Pulumi).
Maintain ETL pipelines (Spark / Ray).
Oversee the telemetry platform to monitor system health (Datadog, Grafana, W&B).
Manage the code platform (GitHub, CI/CD, PyTorch, Python).
Track and optimize assets like datasets, checkpoints, and compute resources.
Develop tools, documentation, and guidance for the team.
Challenges you'll tackle:
Build robust, high-performance distributed training of large-scale transformer models across clusters of 1,000 to 5,000 GPUs.
Implement high-performance, multi-modal data pipelines capable of processing petabyte-scale datasets within hours.
Continuously evolve our infrastructure to stay ahead of cutting-edge AI advancements.
Scale our infrastructure to handle the next order of magnitude in growth.
What we’re looking for:
Passion for building petabyte-scale systems that enhance efficiency and productivity.
Ability to balance quick fixes for urgent needs with long-term, scalable solutions.
Strong prioritization skills in a fast-moving, high-impact environment.
Comfort with using open-source tools or developing custom solutions when needed.
Versatility as a generalist, with an eagerness to learn and adapt to new tools and systems.
Nice to haves:
Experience with infrastructure for large-scale AI training.
Cluster Engineering: GPU infrastructure, Kubernetes expertise.
Data Engineering: Mastery of ETL pipelines.
Developer Advocacy: Improving workflows, documentation, and tool adoption.
On our team, we approach our work with dedication similar to that of Olympic athletes. Anticipate occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we openly communicate this expectation.
If you're motivated by deeply technical problems, a seemingly never-ending uphill battle, and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.
All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet as a company a few times every year, usually in London, UK, or North America (LA, Toronto).
If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!
The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work
Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.
Please be assured we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.