Browse 11 jobs hiring in Agent Evaluation now. Companies hiring include Spotify, Cover Whale, and Arta Finance, with roles in Amarillo, Fort Wayne, and Charlotte.
Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.
Lead and build the agentic AI platform that enables pods of engineers and AI agents to safely and reliably deliver production software at scale.
Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.
A selective, eight-week (mostly virtual) unpaid bootcamp at ServiceNow for undergraduate students to learn agentic AI, build and evaluate agents, and present a capstone project during an in-person finale.
Senior engineering leader to design, evaluate and productionize agentic AI systems, prompt architectures and multi-agent orchestration for critical banking workflows at Deutsche Bank in Cary, NC.
Experienced software engineers with strong system-design and ML/LLM experience are needed to build and productionize LLM-powered agents, evaluation pipelines, and scalable AI infrastructure at Permute.
Fullscript is looking for a Staff Machine Learning Engineer to architect and ship production LLM-driven clinical features that improve clinician workflows and patient outcomes.
Work on TRM’s AI Engineering team to design and ship agentic LLM systems and scalable infrastructure that augment investigations and ensure safe, auditable behavior in high-sensitivity environments.
Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.
Lead the engineering of agentic ML systems and AI-native developer tooling at NVIDIA's Cosmos team to accelerate model development through agents, pipelines, and scalable evaluation.
LinkedIn is seeking a Principal Product Manager to drive agentic AI product strategy and execution for enterprise customers, balancing hands-on prototyping with cross-functional leadership.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 2