Browse 14 exciting jobs hiring in Inference Optimization now. Check out companies hiring such as Zoox, LinkedIn, MLabs in Brownsville, Indianapolis, Newark.
Drive production-ready model optimization, custom kernel development, and edge deployment to enable real-time inference of large-scale models on vehicle SOCs for Zoox's Perception team.
Lead system- and hardware-focused optimizations for LinkedIn’s AI inference platform, improving GPU utilization, compiler workflows, and low-latency model serving at scale.
Lead the design and delivery of a closed-loop intelligence layer that enables an autonomous trading fleet to learn from real-time outcomes and improve profitability.
Twelve Labs is hiring a senior Machine Learning Engineer to optimize and scale multimodal video foundation models for deployment across cloud and data platforms.
Lead the design and deployment of low-latency, production ML systems for voice, audio, and agentic control at an early-stage hardware and software startup in New York City.
Tavus is hiring a Multimodal AI Model Optimization Research Engineer to convert cutting-edge multimodal models into efficient, low-latency production systems.
Work across modeling, systems, and product to design, optimize, and ship production-grade AI systems for real-world users.
Lead the development of custom quantization algorithms and low-precision techniques to maximize model performance on Quadric's Chimera GPNPU from our Burlingame engineering office.
Decagon is hiring a Senior ML Infrastructure Engineer to design and scale distributed training and multi-provider inference platforms for LLMs and multimodal models.
Metamorphic is hiring an ML Research Engineer (Performance Engineering) to implement and optimize GPU kernels, low-precision training, and MoE systems for next-generation foundation models.
Wizard AI is hiring a Senior MLOps Engineer to own and scale the production ML lifecycle for a real-time inference platform behind a conversational shopping agent.
Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.
Lead developer-facing content and sample projects that help ML engineers train, fine-tune, and deploy models on Dexmate humanoid robots while shipping production-quality code weekly.
Lead the design and scaling of distributed GPU infrastructure and production computer vision pipelines to bring state-of-the-art models from research into reliable, low-latency cloud deployments for Claryo's warehouse AI platform.
Below 50k*
0
|
50k-100k*
0
|
Over 100k*
5
|