Browse 16 GPU Inference jobs hiring now. Companies hiring include LinkedIn, TwelveLabs, and Deepgram, with roles in Madison, Wichita, and Albuquerque.
Lead system- and hardware-focused optimizations for LinkedIn’s AI inference platform, improving GPU utilization, compiler workflows, and low-latency model serving at scale.
Twelve Labs is hiring a senior Machine Learning Engineer to optimize and scale multimodal video foundation models for deployment across cloud and data platforms.
Deepgram is hiring an ML Ops Infrastructure Engineer to design and operate scalable model deployment, CI/CD, and monitoring systems that deliver production-grade voice AI at scale.
Tavus is hiring a Multimodal AI Model Optimization Research Engineer to convert cutting-edge multimodal models into efficient, low-latency production systems.
Work with research teams to productionize large-scale generative models, build GPU inference infrastructure, and ensure reliable deployment and observability for production ML workloads.
A Research Engineer role focused on GPU/kernel and distributed-training optimizations to scale and accelerate real-time world-model AI.
Lead and build True Anomaly’s AI platform and engineering team to deliver production-grade model hosting, agent infrastructure, and enterprise AI tooling that embed AI across the company.
Fundamental is hiring a Model Serving Engineer to build and optimize production inference infrastructure for NEXUS, focusing on Triton-based pipelines, GPU efficiency, and low-latency, high-throughput serving.
Decagon is hiring a Senior ML Infrastructure Engineer to design and scale distributed training and multi-provider inference platforms for LLMs and multimodal models.
Metamorphic is hiring an ML Research Engineer (Performance Engineering) to implement and optimize GPU kernels, low-precision training, and MoE systems for next-generation foundation models.
Work on training and deploying large-scale ML systems for physical robots while building the infrastructure and pipelines to operate them in production.
Wizard AI is hiring a Senior MLOps Engineer to own and scale the production ML lifecycle for a real-time inference platform behind a conversational shopping agent.
Andromeda Cluster is hiring an Infrastructure Manager to scale global GPU compute supply-and-demand matching by sourcing suppliers, optimizing utilization, and negotiating commercial terms.
NVIDIA seeks a seasoned Developer Relations Manager to partner with hyperscaler AI teams, provide hands-on technical enablement for NVIDIA AI software, and drive developer adoption and feedback into the product roadmap.
Work as an Inference Engine Engineer at FriendliAI to design high-performance GPU kernels and core runtime components that power latency-critical, production-scale generative AI systems.
Field AI seeks an MLOps Engineer to build and operate scalable GPU infrastructure, deployment pipelines, and monitoring for ML models powering real-world robotics.