
Model Serving Engineer

About Fundamental

Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS, the world's most powerful Large Tabular Model (LTM), purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.

At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.

About the role

We are looking for a Model Serving Engineer to own the production inference layer for NEXUS, our Large Tabular Model. You will be responsible for serving models reliably and efficiently at scale, working primarily with Triton Inference Server and building the infrastructure that brings our research directly to customers. This is a deeply technical, Python-heavy role that sits at the intersection of systems engineering and applied ML.

You will work closely with our research and engineering teams to translate model outputs into production-grade inference pipelines that meet strict latency and throughput requirements.


Key responsibilities

  • Design, build, and maintain production model serving infrastructure using Triton Inference Server as the primary framework

  • Implement and optimize inference pipelines including custom backends, dynamic batching strategies, and model ensemble configurations in Triton

  • Optimize Python inference code for performance, with a strong focus on GIL contention, multi-threading, and concurrency patterns

  • Tune throughput and latency across the full serving stack: batching policies, thread pool sizing, model instance groups, and memory layout

  • Work closely with the research team to understand new model architectures at a computational level, including batching behavior, dynamic shapes, and memory access patterns

  • Own the full resource observability and control loop for production inference: instrument GPU memory, CPU, batch queue depth, and latency metrics, and actively tune model instance groups, concurrency limits, memory budgets, and batching configuration in response to observed behavior

  • Evaluate and integrate alternative inference frameworks and runtimes as the model ecosystem evolves

  • Contribute to GPU utilization improvements and resource efficiency across the serving fleet

Must have

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)

  • 5+ years of experience in model serving, ML infrastructure, or a closely related backend engineering role

  • Deep, production-level experience with Triton Inference Server, including custom Python backends, batching configuration, and model repository management

  • Expert-level Python skills with a thorough understanding of the GIL, multi-threading, multiprocessing, and async concurrency patterns

  • Strong understanding of neural network inference mechanics, including forward passes, batching strategies, memory management, and numerical precision tradeoffs

  • Hands-on experience with other inference frameworks (TorchServe, TensorFlow Serving, ONNX Runtime, vLLM, etc.) and the ability to evaluate tradeoffs between them

  • Experience profiling and optimizing inference code for latency and throughput at production scale

Nice to have

  • Experience with GPU kernel-level optimizations or CUDA profiling tools

  • Familiarity with model quantization, pruning, or compilation toolchains (TensorRT, torch.compile, ONNX)

  • Experience with KServe or other Kubernetes-native serving platforms

  • Experience serving tabular or structured data models, including classical ML models such as XGBoost and CatBoost

  • Experience with observability tooling such as Prometheus, Grafana, or Datadog in the context of inference monitoring

Benefits

  • Competitive compensation with salary and equity

  • Comprehensive health coverage, including medical, dental, and vision, plus a 401(k)

  • Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys

  • Relocation support for employees moving to join the team in one of our office locations

  • A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action

Average salary estimate

$200,000 / year (est.)
Min: $160,000
Max: $240,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.


Employment type: Full-time, hybrid
Date posted: April 3, 2026