Browse 61 exciting Inference jobs hiring now. Check out companies hiring in Huntington Beach, Cape Coral, and Atlanta, such as FriendliAI, Spotify, and webAI.
Develop and productionize agent systems and the Friendli Agent API at FriendliAI to enable developers to build reliable, high-impact AI agent applications.
Drive product decisions for Spotify Premium as a Data Scientist focused on experimentation, AI-enabled analytics, and insights that increase conversion and retention.
Senior Machine Learning Engineer needed to transform prototype AI models into optimized, production-ready systems for secure, distributed public sector and edge deployments.
Lead performance and scalability improvements for LLM inference by optimizing runtime components, multi-GPU execution, and open-source serving frameworks at scale.
Contribute to state-of-the-art robot learning and on-robot deployment at a fast-moving consumer robotics startup focused on dexterous home manipulation.
Aviator Health seeks a Technical Ex‑Founder to lead 0→1 consumer product development and build autonomous agent systems that navigate real healthcare workflows from our NYC office.
Drive production-ready model optimization, custom kernel development, and edge deployment to enable real-time inference of large-scale models on vehicle SoCs for Zoox's Perception team.
Multi Media LLC is hiring a Senior Data Scientist to lead rigorous statistical analyses and measurement efforts that drive product and business decisions for a high-traffic live streaming platform.
Lead system- and hardware-focused optimizations for LinkedIn’s AI inference platform, improving GPU utilization, compiler workflows, and low-latency model serving at scale.
Lead DeepWalk’s computer vision platform as a Staff Software Engineer, driving the architecture and productionization of ML systems that process millions of images for sidewalk inspection and city infrastructure decisions.
Lead a small analytics team to drive causal, hypothesis-driven investigations into network reliability and subscriber experience for a major communications client while producing executive-ready insights and recommendations.
pureIntegration is hiring a Mid-Level Data Analyst to analyze large-scale datasets, produce dashboards and reports, and deliver actionable insights to improve network reliability and subscriber experience on a remote contract.
Lead cutting-edge research on multimodal foundation models and efficient GenAI at Bosch Research Pittsburgh, translating innovations into industrial and product impact while publishing at top-tier venues.
Lead the design and delivery of a closed-loop intelligence layer that enables an autonomous trading fleet to learn from real-time outcomes and improve profitability.
Help scale production ML infrastructure and retrieval systems at Foxglove to enable high-performance semantic search and data mining over multimodal robotics data.
Twelve Labs is hiring a senior Machine Learning Engineer to optimize and scale multimodal video foundation models for deployment across cloud and data platforms.
Solace is seeking a hands-on Marketing Analytics Manager to build and own attribution, incrementality testing, and measurement infrastructure that drives data-informed growth decisions for a fast-scaling healthcare startup.
Deepgram is hiring an ML Ops Infrastructure Engineer to design and operate scalable model deployment, CI/CD, and monitoring systems that deliver production-grade voice AI at scale.
Lead the design and deployment of low-latency, production ML systems for voice, audio, and agentic control at an early-stage hardware and software startup in New York City.
Tavus is hiring a Multimodal AI Model Optimization Research Engineer to convert cutting-edge multimodal models into efficient, low-latency production systems.
Work with research teams to productionize large-scale generative models, build GPU inference infrastructure, and ensure reliable deployment and observability for production ML workloads.
Work across modeling, systems, and product to design, optimize, and ship production-grade AI systems for real-world users.
A Research Engineer role focused on GPU/kernel and distributed-training optimizations to scale and accelerate real-time world-model AI.
Lead and build True Anomaly’s AI platform and engineering team to deliver production-grade model hosting, agent infrastructure, and enterprise AI tooling that embed AI across the company.
Lead the development of custom quantization algorithms and low-precision techniques to maximize model performance on Quadric's Chimera GPNPU from our Burlingame engineering office.
Drive the design and implementation of experimentation methodologies, inference pipelines, and production tooling as a Full‑Stack Data Scientist on Netflix’s Experimentation Platform.
Fundamental is hiring a Model Serving Engineer to build and optimize production inference infrastructure for NEXUS, focusing on Triton-based pipelines, GPU efficiency, and low-latency, high-throughput serving.
Lead Blackbird’s analytics layer to translate product and customer data into strategic decisions that accelerate growth and retention.
Pluralsight seeks an experienced Data Scientist to design, validate, and deploy machine learning and NLP solutions that drive product and business impact.
Triumph is hiring a Data Scientist to build pricing, risk, and behavior models that drive monetization and retention for a high-growth real-money gaming platform.
Dentsu is hiring a VP of Data Science to lead and productize advanced measurement science (MMM, RBA, Bayesian methods) and scale a distributed team to deliver client-facing analytics products.
Lead ML-driven improvements to ad auction performance by building scalable models, running experiments, and partnering with engineering and product teams at a fast-paced ad tech organization.
Develop and optimize high-performance C++ AI and computer-vision software for embedded camera systems used in mission-critical public safety and security applications at Motorola Solutions.
Lead the design and productionization of mission-critical NLP and LLM-powered features at Laurel, shaping the AI platform that returns time to professional services firms.
Lead the Core GenerativeAgent team to design, build, and deploy low-latency, enterprise-grade conversational voice AI combining LLMs with speech-to-text, text-to-speech, and real-time streaming pipelines.
Lead product strategy and discovery for Kamiwaza’s on-prem enterprise AI orchestration platform, turning customer problems into coherent, outcome-driven releases.
Amazon Security seeks a Senior Security Engineer to lead offensive operations and research against AI systems, scaling automated threat emulation across the AI portfolio.
Shape and own the QA strategy for FriendliAI’s inference platform, covering backend, frontend, model deployments, and novel validation for LLM inference quality.
Senior-level embedded AI engineer role at Renesas to lead development of model translation tooling and high-performance inference for resource-constrained MCUs/MPUs.
Senior technical role focused on researching, engineering, and scaling privacy-preserving ML and LLM alignment solutions across LinkedIn's platforms.
Decagon is hiring a Senior ML Infrastructure Engineer to design and scale distributed training and multi-provider inference platforms for LLMs and multimodal models.
Work on FriendliAI's core developer experience by owning the Python SDK and CLI, packaging pipelines, and internal dev tools that enable reliable integrations with our inference and agent platform.
Metamorphic is hiring an ML Research Engineer (Performance Engineering) to implement and optimize GPU kernels, low-precision training, and MoE systems for next-generation foundation models.
Lead a high-performing data science team to build and govern next-generation portfolio management and loss mitigation models for a regulated, consumer-focused fintech.
Work on training and deploying large-scale ML systems for physical robots while building the infrastructure and pipelines to operate them in production.
Wizard AI is hiring a Senior MLOps Engineer to own and scale the production ML lifecycle for a real-time inference platform behind a conversational shopping agent.
Andromeda Cluster is hiring an Infrastructure Manager to scale global GPU compute supply and demand matching by sourcing suppliers, optimizing utilization, and negotiating commercial terms.
NVIDIA seeks a seasoned Developer Relations Manager to partner with hyperscaler AI teams, provide hands-on technical enablement for NVIDIA AI software, and drive developer adoption and feedback into the product roadmap.
Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.
Drive adoption of NVIDIA accelerated computing by advising AI-native startups on architecture, optimization, and scaling of agentic, multimodal, and LLM-powered applications.