LLM Evaluation Jobs

Browse 35 exciting LLM evaluation jobs hiring now. Check out companies hiring, such as Spotify, LanguageWire, and EQL Tech, in Providence, Houston, and Chattanooga.

Inclusive & Diverse
Empathetic
Take Risks
Transparent & Candid
Feedback Forward
Mission Driven
Collaboration over Competition
Work/Life Harmony
Maternity Leave
Paternity Leave
Snacks
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
401K Matching
Paid Sick Days
Paid Time-Off
Paid Volunteer Time

Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.

LanguageWire Hybrid No location specified
Posted yesterday

LanguageWire is hiring an AI Engineer to design and productionize LLM-based translation workflows and bridge ML experimentation with production engineering.

EQL Tech Hybrid No location specified
Posted 2 days ago

Work on a mission-driven fintech team to build and ship core AI products (LLM/VLM and evaluation pipelines) that power eligibility and compliance for education savings accounts.

Lead the product vision and engineering for clinician-facing AI tools at knownwell, building and operating RAG-based clinical decision support with full product ownership and direct clinician partnership.

Brillio Hybrid New York, New York, United States
Posted 4 days ago

Experienced technical product leader needed to own prioritization, quality, and stakeholder alignment for LLM-driven products while staying hands-on with architecture, code reviews, and AI cost optimization.

Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.

Posted 6 days ago

Welo Data is building a flexible, remote contributor network of native English speakers to annotate, evaluate, and create prompts that improve AI systems.

Salesforce Hybrid California - San Francisco
Posted 6 days ago
Inclusive & Diverse
Rise from Within
Mission Driven
Diversity of Opinions
Work/Life Harmony
Feedback Forward
Take Risks
Collaboration over Competition
Medical Insurance
Dental Insurance
Vision Insurance
Paid Time-Off
Maternity Leave
Paternity Leave
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
Employee Resource Groups

Lead Slack's search and AI platform as VP Product to set strategy, drive model and infrastructure decisions, and deliver reliable, scalable AI-powered search and knowledge services for enterprise users.

Posted 6 days ago

NiCE is hiring a Forward Deployed Engineer to design, ship, and operate production-scale conversational AI agents that solve high-impact enterprise problems.

Weekday AI Hybrid No location specified
Posted 11 days ago

Contract opportunity to evaluate and improve LLM conversational responses in Hindi and English by performing fact-checking, annotation, and qualitative assessment.

Posted 12 days ago

Lead the design and production of LLM-driven coaching systems at Valence, applying deep ML and engineering expertise to build enterprise-grade, context-aware AI experiences.

Posted 16 days ago

Deutsche Bank is hiring a senior engineering leader in Cary, NC to design, evaluate, and productionize agentic AI systems, prompt architectures, and multi-agent orchestration for critical banking workflows.

Crosby Hybrid New York City
Posted 16 days ago

Crosby AI is hiring a Data Scientist to develop NLP/LLM models, evaluation frameworks, and data strategies that power its AI-driven legal platform.

Welocalize is hiring a Generative AI Analyst to craft prompts, annotate and evaluate LLM outputs, and lead labeling workflows in a remote full-time role.

Posted 17 days ago

Lead the design and implementation of secure, scalable Generative AI and ML architectures for an EdTech organization focused on building production-ready RAG, retrieval, and MLOps solutions.

Posted 17 days ago

Build the internal tooling and evaluation infrastructure that empowers engineers and researchers to iterate quickly and reliably on Crosby’s LLM-powered legal platform.

Posted 18 days ago
Dental Insurance
Disability Insurance
Flexible Spending Account (FSA)
Health Savings Account (HSA)
Vision Insurance
Sabbatical
Paid Holidays

Handshake is hiring an ML Research Scientist to drive open scientific research, create public benchmarks, and collaborate with top AI labs to advance data and evaluation methods for frontier models.

Posted 18 days ago

Finny, an AI-first fintech in Chelsea, NYC, is hiring a Lead QA Engineer to build and scale automated testing, CI/CD integration, and LLM evaluation across its Python backend and TypeScript frontend.

MLabs Hybrid No location specified
Posted 19 days ago

Lead the design and evaluation of agentic LLM systems that power a fintech's financial intelligence platform, ensuring correctness, scalability, and production reliability.

Posted 20 days ago

Experienced software engineers with strong system-design and ML/LLM experience are needed to build and productionize LLM-powered agents, evaluation pipelines, and scalable AI infrastructure at Permute.

Posted 20 days ago

Fullscript is looking for a Staff Machine Learning Engineer to architect and ship production LLM-driven clinical features that improve clinician workflows and patient outcomes.

Inclusive & Diverse
Diversity of Opinions
Growth & Learning
Mission Driven
Social Impact Driven
Empathetic
Dental Insurance
Flexible Spending Account (FSA)
Health Savings Account (HSA)
Vision Insurance
Performance Bonus
Family Medical Leave
Paid Holidays

Khan Academy is hiring a Senior AI Engineer (24-month fixed-term) to lead integration, evaluation, and quality improvements of generative AI features that support learning at scale.

ServiceNow Hybrid 15725 Dallas Pkwy, Addison, TX 75001, USA
Posted 22 days ago
Inclusive & Diverse
Mission Driven
Rise from Within
Diversity of Opinions
Work/Life Harmony
Empathetic
Feedback Forward
Take Risks
Collaboration over Competition
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
Conferences Stipend
Paid Time-Off
Maternity Leave
Equity

Lead the AI product portfolio for marketing to turn enterprise AI strategy into a cohesive MarTech roadmap, measurable productivity gains, and durable automation at scale.

ServiceNow Hybrid 275 Wyman St 2nd floor, Waltham, MA 02451, USA
Posted 22 days ago
Inclusive & Diverse
Mission Driven
Rise from Within
Diversity of Opinions
Work/Life Harmony
Empathetic
Feedback Forward
Take Risks
Collaboration over Competition
Medical Insurance
Dental Insurance
Vision Insurance
Mental Health Resources
Life insurance
Disability Insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
Conferences Stipend
Paid Time-Off
Maternity Leave
Equity

Lead the AI MarTech product portfolio at ServiceNow to convert AI strategy into scalable agentic workflows, measurable productivity gains, and sustained marketing leverage.

Work on TRM’s AI Engineering team to design and ship agentic LLM systems and scalable infrastructure that augment investigations and ensure safe, auditable behavior in high-sensitivity environments.

Varick Agents Hybrid No location specified
Posted 23 days ago

Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.

Lead the design, production deployment, and continual improvement of AI-powered features for Savvas's flagship K-12 platform, applying deep LLM, cloud, and software engineering expertise to improve student learning at scale.

Posted 26 days ago

Virtue AI is seeking a hands-on Testing Engineer to lead product and backend QA, automate system testing, and perform model red-teaming for a cutting-edge AI security platform.

Help shape Baseten's model ecosystem by combining hands-on engineering, developer education, and product thinking to improve model discovery, evaluation, and adoption.

Posted 27 days ago
Customer-Centric
Mission Driven
Inclusive & Diverse
Rise from Within
Diversity of Opinions
Work/Life Harmony
Growth & Learning
Transparent & Candid
Medical Insurance
Paid Time-Off
Maternity Leave
Mental Health Resources
Equity
Child Care stipend
Paternity Leave
WFH Reimbursements
Flex-Friendly
Dental Insurance
Vision Insurance
Life insurance
Health Savings Account (HSA)
Flexible Spending Account (FSA)
401K Matching
Military leave

Lead the engineering of agentic ML systems and AI-native developer tooling at NVIDIA's Cosmos team to accelerate model development through agents, pipelines, and scalable evaluation.

IFS Hybrid Itasca, United States
Posted 27 days ago

Lead architecture and delivery of enterprise-scale LLMs, agent orchestration, and retrieval systems to build safe, scalable AI workflows for IFS Nexus Black.

Posted 28 days ago

TRM Labs is hiring a Senior AI Research Engineer to drive model evaluation, fine-tuning, and production orchestration for large-scale LLM and ML systems that power blockchain intelligence.

Posted 29 days ago

Remote contract role for a bilingual English–Mandarin evaluator to assess, fact-check, and annotate LLM responses, covering Taiwan, Malaysia, and the USA.

Project Lion seeks a US-based Prompt Engineer to drive template-to-autorater migrations, optimize prompts using APG/APO tooling, and validate autorater quality versus human baselines.

Part-time role for a US-based Senior Prompt Engineer to design, optimize, and validate prompts and autorater workflows that ensure high-quality, structured LLM outputs across complex template architectures.

How much do LLM evaluation jobs pay?

Below 50k*: 1 (25%)
50k-100k*: 2 (50%)
Over 100k*: 1 (25%)
*average yearly salary (USD)
