Job details

High Performance Computing Software Engineer - Supercomputing

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Your strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large-scale AI workloads.

In this role, you’ll work at the intersection of high-performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting-edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer-friendly.

Job Responsibilities

Design and implement high-performance, distributed software solutions for large-scale AI/ML training.
Optimize low-level system components, including the Linux kernel, GPU/accelerator kernels, and interconnects.
Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA-based systems.
Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large-scale production environments.
Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
Debug and resolve complex issues across the stack—from kernel to container to model.
Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.

Skills & Experience

Proven experience developing and optimizing software for large-scale ML workloads (1000+ GPUs preferred).
Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
Excellent debugging and systems performance tuning skills.
A collaborative mindset with a focus on shared success and technical excellence.

$150,000 - $300,000 a year

Benefits Include

*Comprehensive medical, dental, and vision benefits
*Bonus
*401(k) plan
*Generous paid time off, sick leave, and holidays
*Paid parental leave
*Employee Assistance Program
*Life insurance and disability

Funding: Private
Department: Software Engineering
Seniority level: Senior