Browse 145 open jobs in Evaluation. Companies hiring include Spotify, National Vision, and Cover Whale, with roles in Washington, Fremont, and Long Beach.
Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.
Lead the People Development team at National Vision to design and deliver scalable, measurable learning solutions for corporate, retail, manufacturing, and clinical associates.
Lead and build the agentic AI platform that enables pods of engineers and AI agents to safely and reliably deliver production software at scale.
LanguageWire is hiring an AI Engineer to design and productionize LLM-based translation workflows and bridge ML experimentation with production engineering.
Evaluate luxury brand experiences for CXG through flexible in-store or online missions that provide actionable feedback to premium brands.
Work on a mission-driven fintech team to build and ship core AI products (LLM/VLM and evaluation pipelines) that power eligibility and compliance for education savings accounts.
Iambic Therapeutics seeks a Software Engineer II to co-develop and harden ML training, evaluation, and productization workflows that enable AI-driven drug discovery.
Lead and grow an Applied AI engineering team at Mercor to build scalable evaluation and data systems that measurably improve frontier model performance.
Application Engineering Intern at Renesas Hi-Rel to perform lab-based evaluations of power/ADC products, produce technical analysis, and present findings.
Evaluate machine-translated English (US) to Japanese (Japan) song lyrics for meaning, fluency, and cultural accuracy on a flexible, remote freelance project with Welo Data.
Anduril seeks an experienced manager to lead flight test integration and operations for UAS platforms, overseeing system integration, mesh networking, and Flight Test Operations as an RPIC.
Senior NDE Engineer (Radiography Testing) to design, prototype, and deploy advanced radiography and automated inspection solutions to improve manufacturing quality and flight reliability at SpaceX.
Lead the product vision and engineering for clinician-facing AI tools at knownwell, building and operating RAG-based clinical decision support with full product ownership and direct clinician partnership.
Experienced technical product leader needed to own prioritization, quality, and stakeholder alignment for LLM-driven products while staying hands-on with architecture, code reviews, and AI cost optimization.
Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.
Contract freelance raters in the United States will evaluate personalized map and search recommendations using their Google Maps activity history and follow project guidelines to rate relevance and usefulness.
Welo Data is building a flexible, remote contributor network of native English speakers to annotate, evaluate, and create prompts that improve AI systems.
Carilion Clinic is hiring a part-time Community Outreach Specialist to deliver evidence-based pediatric health education and support community partnerships across the Roanoke area.
Evaluate machine-translated English (US) to German (Germany) song lyrics for accuracy, fluency, and cultural appropriateness in a remote freelance role.
Lead Slack's search and AI platform as VP Product to set strategy, drive model and infrastructure decisions, and deliver reliable, scalable AI-powered search and knowledge services for enterprise users.
Lead AbbVie's Neurosciences Search & Evaluation team to identify, assess, and advance high-value external partnering opportunities that strengthen the company’s neuroscience pipeline and strategic goals.
NiCE is hiring a Forward Deployed Engineer to design, ship, and operate production-scale conversational AI agents that solve high-impact enterprise problems.
Montefiore is hiring a licensed Psychologist (PhD/PsyD) to conduct disability-related psychological assessments and clinical consultations for participants in the WeCARE employment-focused program.
Experienced domain experts in Business Operations & Communications or Education and Academic Research are needed for a remote, retainer-based 2‑week role evaluating and crafting prompts for AI writing models with US-contextual standards.
Join an early-stage AI safety startup as a founding Forward Deployed Engineer to design rigorous AI evals, lead customer implementations, and shape product strategy for certification of real-world AI agents.
Work as a freelance luxury brand evaluator for CXG, discreetly assessing boutique and online experiences to help premium brands refine their service.
Serve as the MHPSS Technical Advisor for IRC RAI, providing evidence-based guidance, training, and partnership support to improve mental health and psychosocial services for forcibly displaced populations in the U.S.
Lead and develop a remote evaluation team in WGU’s School of Technology to ensure accurate, scalable competency-based assessment and continuous improvement for Electrical and Computer Engineering programs.
Epoch AI is hiring remote Researchers and Senior Researchers to conduct data-driven investigations, build benchmarks, and forecast AI capabilities and trends.
Visa is hiring a Product Analyst to define and scale generative AI platform capabilities, combining product analytics, prototyping, and cross-functional collaboration to deliver responsible, enterprise-grade AI solutions.
Lead cross-functional hardware programs for Anduril's Warfighter Systems, delivering embedded edge compute solutions and managing technical, schedule, and compliance risks across product lifecycles.
Colibri Group is hiring an AI Engineering Intern to help design and evaluate AI-driven educational tools, focusing on model behavior, alignment, and responsible AI practices under senior mentorship.
Experienced analytics professional needed to perform human capital program evaluations and deliver data-driven reporting and dashboards in support of federal HR modernization efforts.
BryceTech seeks an experienced Data Analyst to support DHS intelligence performance programs by turning complex data into actionable insights, reports, and training materials.
Lead the design and delivery of advanced multi-function RF hardware at STR, driving radar and EW/RF convergence for defense applications.
KBR is seeking an experienced DoD Technical Writer with an active Secret clearance to create and maintain documentation for Big Data systems and DoD test-range programs.
Medtronic is hiring a Post Market Quality Engineer II to analyze post-market data, conduct health risk assessments, and support CAPA and regulatory compliance efforts to improve device safety and performance.
Unstructured is hiring an AI Engineer to architect and ship production-grade RAG and agentic systems that process messy multimodal data for high-impact government and military contracts.
Lead systems design, integration, and field test of advanced AUV platforms for Anduril’s Maritime team, translating mission needs into robust, testable vehicle capabilities.
Serve as AEU’s primary community liaison to coordinate outreach, track requests, and integrate community feedback into NYC DOT’s automated enforcement planning and communications.
A technical, hands-on Senior Application Analyst role to manage and automate third-party business applications integrated with Salesforce at a fast-growing health-tech company.
Lead Entomology outreach across Ohio State's Wooster and Columbus sites by managing outreach teams, developing STEAM education programs, and ensuring animal-care and compliance standards.
Help scale production ML infrastructure and retrieval systems at Foxglove to enable high-performance semantic search and data mining over multimodal robotics data.
Serve as a Workforce Research Analyst supporting human capital analytics and workforce planning for federal clients, leveraging data analysis, program evaluation, and executive-level briefing materials to inform HR modernization.
ACS' Family Services Division is hiring a Program Strategy & Data Undergraduate Intern to support data management, develop reports and process maps, and assist with event coordination during the summer internship program.
Evaluate luxury brand experiences across Massachusetts on a flexible, paid mission basis for CXG, providing objective feedback to top-tier brands.
TRI's Future Factory team is hiring a Senior Research Engineer to design scalable training/evaluation infrastructure and high-performance geometry and physics-aware tooling that translate research into production-grade systems.
Contract opportunity to evaluate and improve LLM conversational responses in Hindi and English by performing fact-checking, annotation, and qualitative assessment.
Lead the design and production of LLM-driven coaching systems at Valence, applying deep ML and engineering expertise to build enterprise-grade, context-aware AI experiences.
Support USG-funded global health projects as a Program Officer II by coordinating project components, managing budgets and subawards, and ensuring programmatic compliance across multiple countries.
Below 50k*: 4 | 50k-100k*: 7 | Over 100k*: 21