Browse 55 exciting jobs hiring in ML Observability now. Check out companies hiring such as Airbnb, Freshworks, and WHOOP in Aurora, Washington, and Lexington-Fayette.
Lead Airbnb's Infrastructure Performance strategy and build the profiling, instrumentation, and organizational practices that make performance a default property of how the company ships software.
Freshworks is hiring a Senior Director of Engineering in San Mateo to lead and scale the ITOM engineering organization focused on AIOps, observability, and high-scale cloud-native platforms.
WHOOP is hiring a Senior Software Engineer on the AI Platform team to design and scale backend services and developer tools that enable Generative AI across the product portfolio.
Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.
Lead development of the intelligence layer for Lumbra's Nebula platform by building agent orchestration, retrieval, and memory systems that enable reliable, auditable AI agents for high-stakes analysis.
Senior Director of Engineering needed to define and drive an AI-first engineering strategy and operational excellence for a remote, global product organization.
Lead a global engineering organization to embed AI-driven tooling and operational excellence into product development and delivery.
RadiantGraph is hiring a Senior Full-Stack Software Engineer to deliver production-grade Python and TypeScript features that power AI-driven healthcare engagement across backend services and frontend applications.
Upstart is hiring a full-stack Software Engineer for the Borrower Experience team to design and implement scalable web, mobile, and AI-powered borrower-facing features that reduce friction and improve payment outcomes.
Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.
Lead Zapier's AI Platform team to build reusable model-serving, evaluation, and MLOps tooling that helps product teams ship AI features quickly, safely, and cost-effectively.
Hammerhead is hiring a Site Reliability Engineer to establish and run the reliability function for an AI-driven power orchestration platform deployed across cloud and on-prem data centers.
Lead the architecture and delivery of AI-native systems at Employ, turning LLMs, agents, and RAG pipelines into secure, scalable product capabilities.
Lead the development of scalable core infrastructure and microservices at ServiceNow to improve performance, observability, and reliability for ML and platform teams.
Gridmatic is hiring a Senior Software Engineer on the Platform team to build reliable GCP/GKE-based infrastructure that supports forecasting, ML training, and real-time battery operations.
Couchbase seeks a Principal Software Engineer to architect and implement the Capella Control Plane using Golang, multi-cloud infrastructure, and AI/ML integrations to scale our DBaaS offering.
Lead DeepWalk’s computer vision platform as a Staff Software Engineer, driving the architecture and productionization of ML systems that process millions of images for sidewalk inspection and city infrastructure decisions.
Zoox seeks a Software Engineer on the Platform Stability team to design and deliver production-grade AI-driven diagnostic tools that improve reliability and observability for autonomous systems.
3M's CRDP team is hiring a Backend Software Engineer to build and operate production-grade backend services and APIs that support data, analytics, and AI/ML use cases at the Maplewood, MN campus.
Expedia Group is looking for a Backend Software Development Engineer II to design and operate scalable Loyalty platform services that enable member accounts, rewards, and personalization.
Lead the strategy and delivery of Aledade's AI Data Platform — shaping data ingestion, MLOps, and retrieval systems that enable LLM-powered clinical tools across the organization.
Future is looking for an Applied AI Engineer to design, validate, and deploy production-grade LLM agents and tooling that enhance personalized coaching across our platform.
Help invent and build agentic AI systems, ML pipelines, and scalable infra as a founding technical team member at TierZero in San Francisco.
Lead and grow cross-functional data engineering teams at Overstory to deliver reliable, ML-enabled data platforms that help utilities reduce wildfire risk and strengthen grid resilience.
Help build and operate the backend systems and ML-serving infrastructure that power Mirage's AI-native video products from our Union Square HQ.
Help build TierZero's core product as a founding engineer focused on LLM-driven agents, scalable infra, and observability for production systems.
Mirage is hiring early-career software engineers in NYC to work onsite across backend, product (web/iOS), and applied AI teams building the next generation of generative video tools.
Deepgram is hiring an ML Ops Infrastructure Engineer to design and operate scalable model deployment, CI/CD, and monitoring systems that deliver production-grade voice AI at scale.
Edison Scientific seeks a Principal Platform Engineer to own the infrastructure foundation for thousands of persistent AI agent workloads, architecting and operating Kubernetes-based clusters and custom controllers at scale.
Lead architecture and implementation of Edison Scientific’s backend-heavy platform that powers AI agents for scientific discovery as a Principal Full-Stack Engineer based in our San Francisco Dogpatch office.
Lead and scale Handshake’s Forward Deployed Engineering team to deliver high-impact, customer-facing AI integrations and build the operating foundations that turn bespoke work into repeatable platform improvements.
Lead the design and implementation of Slate's unified AI backend platform to make model integrations reliable, cost-efficient, and production-ready at scale.
Lead the design and scaling of LLM and deep-learning solutions for enterprise customer experience at Zendesk, working hybrid from San Francisco or Austin to deliver production ML systems and drive business impact.
Lead OnRamp's engineering transformation into an AI-first, agent-driven organization—owning architecture, team design, and delivery for an enterprise-grade onboarding platform.
Senior full-stack engineer needed to build enterprise-grade systems that enable AI agents to initiate, execute, and complete finance workflows across complex environments.
Forward Deployed Engineer at Appen responsible for designing, deploying, and owning GenAI data solutions and validation workflows for enterprise clients to ensure production-ready data quality.
Help define and build TierZero’s core agentic AI and production infrastructure as a founding engineer working closely with the CTO, CEO, and early customers.
Lead Zania's engineering organization as a hands-on Head of Engineering to scale the team from ~5 to 20+ and build production-grade agentic AI infrastructure for enterprise GRC.
A backend-focused engineer role on Spotify’s OASIS team to build and scale systems that deliver promotion scores and integrate with personalization and ML workflows.
Fullscript is looking for a Staff Machine Learning Engineer to architect and ship production LLM-driven clinical features that improve clinician workflows and patient outcomes.
Concentrate is hiring a hands-on Forward Deployed AI Engineer to combine customer-facing problem solving with engineering work to improve multi-provider LLM routing, reliability, observability, and cost efficiency.
NBCUniversal is hiring an AI Engineer to architect and deliver production-ready LLM and agentic solutions that improve editorial workflows and unlock new newsroom capabilities.
Help design and ship TierZero’s core agentic AI and infrastructure features as a founding engineer working closely with leadership and customers to productionize LLM-driven observability.
Lead the roadmap and execution of Aledade's AI Platform to scale ML systems, infrastructure, and tooling that power internal AI products and improve care for millions of patients.
Help build the ML platform powering enterprise agentic automation by owning production AI features end-to-end at Sola’s NYC headquarters.
TierZero is looking for a founding engineer to design and ship agentic LLM-driven infrastructure and observability systems, working closely with leadership and customers to build the core product.
Experienced engineering leader needed to lead Plaid's NEA organization—setting technical strategy, scaling teams, and driving an AI-first transformation across integration, automation, and data products.
Samsara is hiring a Staff ML Engineer to lead architecture and delivery of their ML platform—spanning training, experimentation, inference, and edge deployment—to drive scalable, real-world safety outcomes.
Help architect and build core agentic AI systems, ML infrastructure, and full-stack features at an early-stage, Series A company focused on production-grade AI observability and engineering workflow automation.
Lead the creation and execution of AECOM’s enterprise AI Operations practice to operationalize observability, governance, and production readiness for AI and agentic systems at scale.