About us
FriendliAI, a Redwood City, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 400,000 open-source models. We are on a mission to deliver the world’s best platform for generative and agentic AI.
The Role
We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference engine. You will focus on designing, implementing, and optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical and compute-intensive systems deployed by our customers.
The Person
You are an exceptional engineer with a strong foundation in GPU programming and compiler infrastructure. You enjoy pushing performance boundaries and have experience supporting production-scale machine learning applications.
Key Responsibilities
Design and optimize custom GPU kernels for AI (e.g., transformer and diffusion) workloads
Contribute to the development of FriendliAI’s kernel compiler, memory planner, runtime, and other core components
Collaborate with cloud and infrastructure engineers to ensure end-to-end inference performance
Analyze performance bottlenecks across the software and hardware stack, and implement targeted optimizations
Drive support for new model architectures and tensor compute patterns
Maintain production-grade performance infrastructure, including profiling, benchmarking, and validation tools
Qualifications
5+ years of experience in production or high-impact research environments
Production-level expertise in Python and C++
Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Experience developing machine learning frameworks or performance-critical runtime systems
Hands-on experience writing and optimizing GPU kernels
Hands-on experience profiling GPU kernels
Experience working with generative AI models such as transformer and diffusion models
Preferred Experience
Experience developing machine learning compilers or code generation systems
Familiarity with dynamic shape compilation, memory planning, and kernel fusion
Contributions to inference engines, compilers, or high-performance numerical libraries
Understanding of multi-GPU and distributed inference strategies
Benefits
Flexible working hours
Daily lunch and dinner provided
Unlimited snacks and beverages
Supportive work environment
Health check-up support
Top-tier equipment support