Job details

Member of Technical Staff, Inference

We are building AI to simulate the world through merging art and science.

We believe that world models are at the frontier of progress in artificial intelligence. Language models alone won’t solve the world’s hardest problems – robotics, disease, scientific discovery. Real progress requires models that experience the world and learn from their mistakes, the same way that humans do. And this kind of trial and error can be massively accelerated when done in simulation, rather than in the real world.

World models offer the clearest path to general-purpose simulation, changing how stories are told, how scientific progress is made and how the next frontiers of humanity are reached.

Our team consists of creative, open-minded, caring and ambitious people who are determined to change the world. We aspire to continuously build impossible things, and our ability to do so relies on building an incredible team. If you are driven to do the same, we'd love to hear from you.

About the role

We're looking for an ML infrastructure engineer to bridge the gap between research and production at Runway. You'll work directly with our research teams to productionize cutting-edge generative models—taking checkpoints from training to staging to production, ensuring reliability at scale, and building the infrastructure that enables fast iteration.

You'll be embedded within research teams, providing platform support throughout the entire model development lifecycle. Your work will directly impact how quickly we can ship new models and features to millions of users.

A peek at our technical stack

Our API endpoints for real-time collaboration and media asset management are written in TypeScript and run in ECS containers on AWS Fargate. We leverage multiple AWS-native components, such as S3, CloudFront, Lambda, Kinesis, and SQS, as building blocks of our infrastructure.

Our inference backend is written in Python (PyTorch, TorchScript) and is deployed across multiple clusters and cloud providers. We use Kubernetes for container orchestration, and k8s-native components such as Flyte, Kueue, and Kyverno for efficient job orchestration. We invest in Prometheus and Grafana for monitoring, and Terraform to manage our infrastructure.

What you’ll do

Productionize model checkpoints end-to-end: from research completion to internal testing to production deployment to post-release support
Build and optimize inference systems for large-scale generative models running on multi-GPU environments
Design and implement model serving infrastructure specialized for diffusion models and real-time diffusion workflows
Add monitoring and observability for new model releases—track errors, throughput, GPU utilization, and latency
Embed with research teams to gather training data, run preprocessing scripts, and support the model development process
Explore and integrate with GPU inference providers (Modal, E2E, Baseten, etc.)

What you’ll need

4+ years of experience running ML model inference at scale in production environments
Strong experience with PyTorch and multi-GPU inference for large models
Experience with Kubernetes for ML workloads—deploying, scaling, and debugging GPU-based services
Comfortable working across multiple cloud providers and managing GPU driver compatibility
Experience with monitoring and observability for ML systems (errors, throughput, GPU utilization)
Self-starter who can work embedded with research teams and move fast
Strong systems thinking and pragmatic approach to production reliability
Humility and open-mindedness; at Runway we love to learn from one another

Nice to have

Experience building custom inference frameworks or serving systems
Deep understanding of distributed training and inference patterns (FSDP, data parallelism, tensor parallelism)
Ability to debug low-level issues: NCCL networking problems, CUDA errors, memory leaks, performance bottlenecks
Experience with diffusion models or video generation systems
Knowledge of real-time or latency-sensitive ML applications

Runway strives to recruit and retain exceptional talent from diverse backgrounds while ensuring pay equity for our team. Our salary ranges are based on competitive market rates for our size, stage and industry, and salary is just one part of the overall compensation package we provide.

There are many factors that go into salary determinations, including relevant experience, skill level and qualifications assessed during the interview process, and maintaining internal equity with peers on the team. The range shared below is a general expectation for the function as posted, but we are also open to considering candidates who may be more or less experienced than outlined in the job description. In this case, we will communicate any updates in the expected salary range.

Lastly, the provided range is the expected salary for candidates in the U.S. Outside the U.S., the range may differ, and any change will be communicated to candidates.

Salary range: $240,000-290,000

Runway

Runway is an American company headquartered in New York City specializing in generative artificial intelligence research and technology. The company focuses on creating products and models for generating videos, images, and multimedia content.
