Job details

Software Engineer – AI Inference Engine

About us

FriendliAI, a Redwood City, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 400,000 open-source models. We are on a mission to deliver the world’s best platform for generative and agentic AI.

The Role

We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference engine. You will focus on designing, implementing, and optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical and compute-intensive systems deployed by our customers.

The Person

You are an exceptional engineer with a strong foundation in GPU programming and compiler infrastructure. You enjoy pushing performance boundaries and have experience supporting production-scale machine learning applications.

Key Responsibilities

Design and optimize custom GPU kernels for AI (e.g., transformer and diffusion) workloads
Contribute to the development of FriendliAI’s kernel compiler, memory planner, runtime, and other core components
Collaborate with cloud and infrastructure engineers to ensure end-to-end inference performance
Analyze performance bottlenecks across the software and hardware stack, and implement targeted optimizations
Drive support for new model architectures and tensor compute patterns
Maintain production-grade performance infrastructure, including profiling, benchmarking, and validation tools

Qualifications

5+ years of experience in production or high-impact research environments
Production-level expertise in Python and C++
Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Experience developing machine learning frameworks or performance-critical runtime systems
Hands-on experience writing and optimizing GPU kernels
Hands-on experience profiling GPU kernels
Experience working with generative AI models such as transformer and diffusion models

Preferred Experience

Experience developing machine learning compilers or code generation systems
Familiarity with dynamic shape compilation, memory planning, and kernel fusion
Contributions to inference engines, compilers, or high-performance numerical libraries
Understanding of multi-GPU and distributed inference strategies

Benefits

Flexible working hours
Daily lunch and dinner provided
Unlimited snacks and beverages
Supportive work environment
Health check-up support
Top-tier equipment support

GPU, CUDA, ROCm, C++, Python, Inference Engine, Kernel Compiler, Transformer, Diffusion, Memory Planning, Kernel Fusion, Profiling, Performance Engineering, Distributed Inference, PyTorch

Average salary estimate

$240,000 / year (est.)

Min: $180,000
Max: $300,000

