Browse 24 exciting Model Evaluation jobs hiring now. Check out companies hiring such as Thomson Reuters, Sony, and Iambic Therapeutics, Inc., in San Antonio, Arlington, and Anaheim.
Lead the design and delivery of scalable, secure AI-native systems for sophisticated legal customers as a Staff Software Engineer / Architect on Thomson Reuters' CoCounsel FDE team.
Sony AI’s Research Ethics team is hiring a remote Research Intern to work on generative AI ethics, evaluation, and harm-mitigation research with opportunities for publication.
Iambic Therapeutics seeks a Software Engineer II to co-develop and harden ML training, evaluation, and productization workflows that enable AI-driven drug discovery.
Lead and grow an Applied AI engineering team at Mercor to build scalable evaluation and data systems that measurably improve frontier model performance.
Experienced technical product leader needed to own prioritization, quality, and stakeholder alignment for LLM-driven products while staying hands-on with architecture, code reviews, and AI cost optimization.
Lead Slack's search and AI platform as VP Product to set strategy, drive model and infrastructure decisions, and deliver reliable, scalable AI-powered search and knowledge services for enterprise users.
Join an early-stage AI safety startup as a founding Forward Deployed Engineer to design rigorous AI evals, lead customer implementations, and shape product strategy for certification of real-world AI agents.
Visa is hiring a Product Analyst to define and scale generative AI platform capabilities, combining product analytics, prototyping, and cross-functional collaboration to deliver responsible, enterprise-grade AI solutions.
Colibri Group is hiring an AI Engineering Intern to help design and evaluate AI-driven educational tools, focusing on model behavior, alignment, and responsible AI practices under senior mentorship.
Unstructured is hiring an AI Engineer to architect and ship production-grade RAG and agentic systems that process messy multimodal data for high-impact government and military contracts.
Help scale production ML infrastructure and retrieval systems at Foxglove to enable high-performance semantic search and data mining over multimodal robotics data.
Contract opportunity to evaluate and improve LLM conversational responses in Hindi and English by performing fact-checking, annotation, and qualitative assessment.
Lead the design and production of LLM-driven coaching systems at Valence, applying deep ML and engineering expertise to build enterprise-grade, context-aware AI experiences.
Crosby AI is hiring a Data Scientist to develop NLP/LLM models, evaluation frameworks, and data strategies that power its AI-driven legal platform.
Lead the design and implementation of secure, scalable Generative AI and ML architectures for an EdTech organization focused on building production-ready RAG, retrieval, and MLOps solutions.
Build the internal tooling and evaluation infrastructure that empowers engineers and researchers to iterate quickly and reliably on Crosby’s LLM-powered legal platform.
Experienced software engineers with strong system-design and ML/LLM experience are needed to build and productionize LLM-powered agents, evaluation pipelines, and scalable AI infrastructure at Permute.
Handshake seeks experienced 3D Slicer users to remotely evaluate AI-generated medical imaging content and provide expert feedback on segmentation, DICOM workflows, and clinical research relevance.
Work on TRM’s AI Engineering team to design and ship agentic LLM systems and scalable infrastructure that augment investigations and ensure safe, auditable behavior in high-sensitivity environments.
Rwazi is hiring a Decision Intelligence Analyst to validate and improve AI-driven decision outputs by identifying failure modes, formalizing evaluation rubrics, and refining judgment frameworks.
Lead architecture and delivery of scalable, secure AI and agentic systems at PointClickCare to drive measurable clinical and operational outcomes across the platform.
Virtue AI is seeking a hands-on Testing Engineer to lead product and backend QA, automate system testing, and perform model red-teaming for a cutting-edge AI security platform.
Help shape Baseten's model ecosystem by combining hands-on engineering, developer education, and product thinking to improve model discovery, evaluation, and adoption.
TRM Labs is hiring a Senior AI Research Engineer to drive model evaluation, fine-tuning, and production orchestration for large-scale LLM and ML systems that power blockchain intelligence.
Below 50k*: 1 | 50k-100k*: 2 | Over 100k*: 18