35 LLM Evaluation Jobs Hiring Now (April 2026)

Senior Staff Machine Learning Engineer - Agentic Systems

Spotify Hybrid New York, NY

Posted 8 hours ago

Inclusive & Diverse

Empathetic

Take Risks

Transparent & Candid

Feedback Forward

Mission Driven

Collaboration over Competition

Work/Life Harmony

Maternity Leave

Paternity Leave

Snacks

Medical Insurance

Dental Insurance

Vision Insurance

Mental Health Resources

Life insurance

401K Matching

Paid Sick Days

Paid Time-Off

Paid Volunteer Time

Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.

AI Engineer

LanguageWire Hybrid No location specified

Posted yesterday

LanguageWire is hiring an AI Engineer to design and productionize LLM-based translation workflows and bridge ML experimentation with production engineering.

AI Engineer

EQL Tech Hybrid No location specified

Posted 2 days ago

Work on a mission-driven fintech team to build and ship core AI products (LLM/VLM and evaluation pipelines) that power eligibility and compliance for education savings accounts.

AI Product Engineer, Clinical Tools

knownwell Hybrid Remote

Posted 4 days ago

Lead the product vision and engineering for clinician-facing AI tools at knownwell, building and operating RAG-based clinical decision support with full product ownership and direct clinician partnership.

Senior AI Technical Product Manager - R01563914

Brillio Hybrid New York, New York, United States

Posted 4 days ago

Experienced technical product leader needed to own prioritization, quality, and stakeholder alignment for LLM-driven products while staying hands-on with architecture, code reviews, and AI cost optimization.

Machine Learning Engineer, AI Agent Platform

Arta Finance Hybrid Mountain View

Posted 4 days ago

Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.

Shape the Future of AI — English Talent Hub

Welo Global Hybrid No location specified

Posted 6 days ago

Welo Data is building a flexible, remote contributor network of native English speakers to annotate, evaluate, and create prompts that improve AI systems.

VP, Product (AI & Search) - Slack

Salesforce Hybrid California - San Francisco

Posted 6 days ago

Inclusive & Diverse

Rise from Within

Mission Driven

Diversity of Opinions

Work/Life Harmony

Feedback Forward

Take Risks

Collaboration over Competition

Medical Insurance

Dental Insurance

Vision Insurance

Paid Time-Off

Maternity Leave

Paternity Leave

Mental Health Resources

Life insurance

Disability Insurance

Health Savings Account (HSA)

Flexible Spending Account (FSA)

Employee Resource Groups

Lead Slack's search and AI platform as VP Product to set strategy, drive model and infrastructure decisions, and deliver reliable, scalable AI-powered search and knowledge services for enterprise users.

Forward Deployed Engineer

NICE Hybrid USA - Remote

Posted 6 days ago

NiCE is hiring a Forward Deployed Engineer to design, ship, and operate production-scale conversational AI agents that solve high-impact enterprise problems.

Generalist - English & Hindi

Weekday AI Hybrid No location specified

Posted 11 days ago

Contract opportunity to evaluate and improve LLM conversational responses in Hindi and English by performing fact-checking, annotation, and qualitative assessment.

Staff Software Engineer, Applied AI

Valence Hybrid San Francisco

Posted 12 days ago

Lead the design and production of LLM-driven coaching systems at Valence, applying deep ML and engineering expertise to build enterprise-grade, context-aware AI experiences.

Applied AI & Agent Engineering Lead - Vice President

DB Hybrid Cary, 3000 CentreGreen Way

Posted 16 days ago

Senior engineering leader to design, evaluate and productionize agentic AI systems, prompt architectures and multi-agent orchestration for critical banking workflows at Deutsche Bank in Cary, NC.

Data Scientist

Crosby Hybrid New York City

Posted 16 days ago

Crosby AI is hiring a Data Scientist to develop NLP/LLM models, evaluation frameworks, and data strategies that power its AI-driven legal platform.

Generative AI Data Analyst - USA (Remote)

Welo Global Hybrid United States

Posted 17 days ago

Generative AI Analyst at Welocalize to craft prompts, annotate and evaluate LLM outputs, and lead labeling workflows in a remote full-time role.

AI Architect

Cambium Learning Group Hybrid Remote

Posted 17 days ago

Lead the design and implementation of secure, scalable Generative AI and ML architectures for an EdTech organization focused on building production-ready RAG, retrieval, and MLOps solutions.

AI Developer Experience Engineer

Crosby Hybrid New York City

Posted 17 days ago

Build the internal tooling and evaluation infrastructure that empowers engineers and researchers to iterate quickly and reliably on Crosby’s LLM-powered legal platform.

ML Research Scientist

Handshake Hybrid Remote

Posted 18 days ago

Dental Insurance

Disability Insurance

Flexible Spending Account (FSA)

Health Savings Account (HSA)

Vision Insurance

Sabbatical

Paid Holidays

Handshake is hiring an ML Research Scientist to drive open scientific research, create public benchmarks, and collaborate with top AI labs to advance data and evaluation methods for frontier models.

Lead QA Engineer

Awesome Motive Hybrid New York City

Posted 18 days ago

Finny, an AI-first fintech in Chelsea, NYC, is hiring a Lead QA Engineer to build and scale automated testing, CI/CD integration, and LLM evaluation across our Python backend and TypeScript frontend.

Senior AI Data Scientist

MLabs Hybrid No location specified

Posted 19 days ago

Lead the design and evaluation of agentic LLM systems that power a fintech's financial intelligence platform, ensuring correctness, scalability, and production reliability.

Senior Software Engineer

Awesome Motive Hybrid Chicago

Posted 20 days ago

Experienced software engineers with strong system-design and ML/LLM experience are needed to build and productionize LLM-powered agents, evaluation pipelines, and scalable AI infrastructure at Permute.

Staff, Machine Learning Engineer

Fullscript Hybrid No location specified

Posted 20 days ago

Fullscript is looking for a Staff Machine Learning Engineer to architect and ship production LLM-driven clinical features that improve clinician workflows and patient outcomes.

Sr. AI Engineer (24 months fixed-term)

Khan Academy Hybrid Remote

Posted 21 days ago

Inclusive & Diverse

Diversity of Opinions

Growth & Learning

Mission Driven

Social Impact Driven

Empathetic

Dental Insurance

Flexible Spending Account (FSA)

Health Savings Account (HSA)

Vision Insurance

Performance Bonus

Family Medical Leave

Paid Holidays

Khan Academy is hiring a Senior AI Engineer (24-month fixed-term) to lead integration, evaluation, and quality improvements of generative AI features that support learning at scale.

AI Product Portfolio Director- Marketing

ServiceNow Hybrid 15725 Dallas Pkwy, Addison, TX 75001, USA

Posted 22 days ago

Inclusive & Diverse

Mission Driven

Rise from Within

Diversity of Opinions

Work/Life Harmony

Empathetic

Feedback Forward

Take Risks

Collaboration over Competition

Medical Insurance

Dental Insurance

Vision Insurance

Mental Health Resources

Life insurance

Disability Insurance

Health Savings Account (HSA)

Flexible Spending Account (FSA)

Conferences Stipend

Paid Time-Off

Maternity Leave

Equity

Lead the AI product portfolio for marketing to turn enterprise AI strategy into a cohesive MarTech roadmap, measurable productivity gains, and durable automation at scale.

AI Product Portfolio Director- Martech

ServiceNow Hybrid 275 Wyman St 2nd floor, Waltham, MA 02451, USA

Posted 22 days ago

Inclusive & Diverse

Mission Driven

Rise from Within

Diversity of Opinions

Work/Life Harmony

Empathetic

Feedback Forward

Take Risks

Collaboration over Competition

Medical Insurance

Dental Insurance

Vision Insurance

Mental Health Resources

Life insurance

Disability Insurance

Health Savings Account (HSA)

Flexible Spending Account (FSA)

Conferences Stipend

Paid Time-Off

Maternity Leave

Equity

Lead the AI MarTech product portfolio at ServiceNow to convert AI strategy into scalable agentic workflows, measurable productivity gains, and sustained marketing leverage.

AI Agent Engineer - San Francisco Only

TRM Labs Hybrid San Francisco

Posted 23 days ago

Work on TRM’s AI Engineering team to design and ship agentic LLM systems and scalable infrastructure that augment investigations and ensure safe, auditable behavior in high-sensitivity environments.

AI Engineer

Varick Agents Hybrid No location specified

Posted 23 days ago

Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.

Principal Software Developer, Applied AI

Savvas Learning Company Hybrid Remote

Posted 24 days ago

Lead the design, production deployment, and continual improvement of AI-powered features for Savvas's flagship K-12 platform, applying deep LLM, cloud, and software engineering expertise to improve student learning at scale.

AI Product Testing Engineer

Virtue AI Hybrid San Francisco

Posted 26 days ago

Virtue AI is seeking a hands-on Testing Engineer to lead product and backend QA, automate system testing, and perform model red-teaming for a cutting-edge AI security platform.

Software Engineer - Model Developer Ecosystem

Baseten Hybrid San Francisco

Posted 26 days ago

Help shape Baseten's model ecosystem by combining hands-on engineering, developer education, and product thinking to improve model discovery, evaluation, and adoption.

ML and Agentic Systems Engineer

NVIDIA Hybrid US, CA, Santa Clara

Posted 27 days ago

Customer-Centric

Mission Driven

Inclusive & Diverse

Rise from Within

Diversity of Opinions

Work/Life Harmony

Growth & Learning

Transparent & Candid

Medical Insurance

Paid Time-Off

Maternity Leave

Mental Health Resources

Equity

Child Care stipend

Paternity Leave

WFH Reimbursements

Flex-Friendly

Dental Insurance

Vision Insurance

Life insurance

Health Savings Account (HSA)

Flexible Spending Account (FSA)

401K Matching

Military leave

Lead the engineering of agentic ML systems and AI-native developer tooling at NVIDIA's Cosmos team to accelerate model development through agents, pipelines, and scalable evaluation.

Principal AI Engineer - Nexus Black

IFS Hybrid Itasca, United States

Posted 27 days ago

Lead architecture and delivery of enterprise-scale LLMs, agent orchestration, and retrieval systems to build safe, scalable AI workflows for IFS Nexus Black.

AI Research Engineer

TRM Labs Hybrid San Francisco

Posted 28 days ago

TRM Labs is hiring a Senior AI Research Engineer to drive model evaluation, fine-tuning, and production orchestration for large-scale LLM and ML systems that power blockchain intelligence.

Generalist - English & Chinese (Mandarin)

Weekday AI Hybrid No location specified

Posted 29 days ago

Work as a bilingual English–Mandarin evaluator to assess, fact-check, and annotate LLM responses for a remote contract role covering Taiwan, Malaysia, and the USA.

Project Lion - Prompt Engineer - United States (Remote, Part-Time)

Welocalize Hybrid United States

Posted 30 days ago

Project Lion seeks a US-based Prompt Engineer to drive template-to-autorater migrations, optimize prompts using APG/APO tooling, and validate autorater quality versus human baselines.

Project Lion - Senior Prompt Engineer - United States (Remote, Part-Time)

Welocalize Hybrid United States

Posted last month

A US-based Senior Prompt Engineer (part-time) to design, optimize, and validate prompts and autorater workflows that ensure high-quality, structured LLM outputs across complex template architectures.

How much do LLM evaluation jobs pay?

Salary range   Jobs   Share
Below 50k*     1      25%
50k-100k*      2      50%
Over 100k*     1      25%
