Senior Software Engineer, NIM Production

Remote: Full Remote

Offer summary

Qualifications:

  • Degree in Computer Science, Computer Engineering, or a related field (BS or MS), or equivalent experience.
  • 6+ years of experience developing performant microservices, cloud software, and/or tooling.
  • Deep technical expertise in distributed containerized applications using Docker, Kubernetes, and Helm Charts.
  • Strong programming skills and experience with multi-functional teams.

Key responsibilities:

  • Design, build, and optimize containerized inference execution for LLM applications.
  • Ensure performance and scalability of NIMs through performance measurement and optimization.
  • Collaborate with software engineers, researchers, and product management to improve inference solutions and APIs.
  • Mentor team members and foster growth within the team.


Job description

NVIDIA is the platform upon which every new AI-powered application is built. We are seeking a Senior Software Engineer to develop components used by the software factory automation behind NVIDIA Inference Microservices (NIMs) and their deployed services. The right person for this role brings technical drive and creativity to change the way NVIDIA provides high-performance inferencing for every AI model. Our NIM offerings are easy to use, optimized for performance, and developed using a highly automated software factory; we deliver them both as downloadable containers and as hosted services.

You will apply your expertise to develop highly available services that make effective use of the thousands of GPUs involved in this operation. Your services will provide best-in-class performance, accuracy, and availability. We are looking for technical talent to design, build, operate, and improve our capabilities to produce NIMs at scale, including the underlying infrastructure, pipelines, inference backends, Docker builds, test harnesses, metrics, performance engineering, log ingestion, and more.

What you'll be doing:

  • Design, build, and optimize containerized inference execution for LLM applications, ensuring efficiency and scalability. These applications may run in container orchestration platforms like Kubernetes to enable scalable and robust deployment.

  • Ensure the performance and scalability of NIMs through comprehensive performance measurement and optimization (see the illustrative benchmarking sketch after this list).

  • Apply container expertise to create and optimize the basic building blocks of NIMs, influencing the development of many models and related products within NVIDIA.

  • Collaborate, brainstorm, and improve the designs of inference solutions and APIs with a broad team of software engineers, researchers, SREs, and product management.

  • Mentor and collaborate with team members and other teams to foster growth and development. Demonstrate a history of learning and enhancing both personal skills and those of colleagues.
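
The performance-measurement responsibility above can be made concrete with a small sketch. The example below is a rough illustration rather than NVIDIA's actual NIM tooling: it times repeated requests against a hypothetical OpenAI-style chat-completions endpoint and reports latency percentiles. The endpoint URL, model name, and payload are assumptions made for the example.

import json
import statistics
import time
import urllib.request

# Hypothetical local inference endpoint (OpenAI-style schema); adjust for a real deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
PAYLOAD = {
    "model": "example-llm",  # placeholder model name, not a real NIM identifier
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 32,
}


def timed_request() -> float:
    """Send one request and return end-to-end latency in seconds."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=60) as resp:
        resp.read()  # drain the body so response transfer is included in the timing
    return time.perf_counter() - start


def main(num_requests: int = 20) -> None:
    latencies = sorted(timed_request() for _ in range(num_requests))
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print(f"n={num_requests}  p50={p50:.3f}s  p95={p95:.3f}s  max={latencies[-1]:.3f}s")


if __name__ == "__main__":
    main()

A production harness would also measure throughput under concurrency and token-level metrics such as time to first token, but the same measure-then-optimize loop applies.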

What we need to see:

  • A history of using advanced programming skills to build distributed compute systems, backend services, microservices, and cloud technologies.

  • Experience productionizing and deploying LLMs.

  • Experience working effectively with multi-functional teams, principals, and architects across organizational boundaries.

  • Mentorship and the ability to grow teams and team members.

  • Deep technical expertise in distributed containerized applications using Docker, Kubernetes, and Helm Charts.

  • Passion for building scalable and performant microservice applications.

  • Excellent interpersonal skills and the flexibility to lead multi-functional efforts.

  • Proven experience debugging and analyzing the performance of distributed microservices or cloud systems.

  • A degree in Computer Science, Computer Engineering, or a related field (BS or MS) or equivalent experience.

  • 6+ years of demonstrated experience developing performant microservices, cloud software, and/or tooling.

Ways to stand out from the crowd:

  • Experience with open-source inference engines and serving stacks.

  • Experience benchmarking the speed and accuracy of generative AI models.

  • Prior experience in building and deploying containers for microservices, cloud, and on-prem deployments, along with their associated CI/CD pipelines.

  • Previous work in large-scale backend development.

We are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and creative people in the world working for us. If you're creative and autonomous with a real passion for technology, we want to hear from you.

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

#deeplearning

Required profile


Spoken language(s): English

Other Skills

  • Mentorship
  • Social Skills
