Member of Technical Staff - Inference Optimization

About ai&

ai& is a new global AI technology company dedicated to meeting the world's growing demand for AI. Our vision is twofold: to serve as a premier AI lab specializing in localization, and to act as a global infrastructure and compute provider. We are building a unified, optimized global platform that integrates next-generation data centers and infrastructure, heterogeneous compute serving, and advanced model services. We believe that the most effective way to build and scale AI is to own the stack from top to bottom.

At ai&, we empower small teams with the autonomy needed to tackle significant challenges. Our approach is to deconstruct large problems into manageable components and solve complex issues collaboratively. We seek highly motivated, mission-driven individuals who demonstrate strong personal agency. We value curiosity as the foundation of talent, and we are looking for people eager to develop alongside our evolving technology and expanding business.

We are actively hiring worldwide, with a presence in Tokyo, SF, Austin, and Toronto. We are more than happy to meet exceptional talent where they are.

Role overview

As a Kernel Optimization Engineer, your objective is to extract everything from heterogeneous GPU hardware. This means going below the framework layer: writing, profiling, and tuning the custom CUDA and ROCm/HIP kernels that sit at the heart of our inference and training stack. You will work across NVIDIA and AMD silicon, understanding the deep architectural differences between the two and writing code that is optimal for each.

This is not a role about deploying existing kernels. It is about authoring them. You will identify bottlenecks in the execution loop, including memory bandwidth saturation, warp divergence, occupancy limits, and cache thrashing, and build solutions from first principles. You will work closely with our inference and serving team to ensure that the kernels you build translate into real-world performance gains, but your domain is the kernel layer and everything below it.

The scope spans attention mechanisms, quantization primitives, custom activation functions, fused operators, and the communication kernels that tie multi-GPU systems together. The ideal candidate has a hardware-first intuition: they think in warps, tiles, and memory hierarchies before they think in frameworks. They are equally comfortable reading PTX and roofline charts. And they are never done optimizing.

Responsibilities

  • Custom Kernel Development: Design and implement high-performance kernels for core AI primitives including GEMM, attention, normalization, and convolution. Own the full cycle from profiling to production deployment across LLM inference, training, and generative model workloads.

  • Cross-Vendor Hardware Optimization: Develop deep expertise across NVIDIA and AMD GPU architectures. Understand the micro-architectural differences including memory subsystems, scheduler behavior, and cache hierarchies, and write kernels that are genuinely optimal for each target. Optimize across heterogeneous compute units including SIMD, matrix engines, and DMA.

  • Attention & Linear Algebra Primitives: Build and tune fused attention kernels (Flash Attention variants, MLA, paged attention), GEMM primitives, and quantized compute paths (INT8, FP8, AWQ, GPTQ) that push the hardware to its limits.

  • Precision & Numerical Stability: Prototype and evaluate precision formats including FP16, BF16, and FP8 (such as E5M2), along with techniques such as stochastic rounding. Understand the accuracy and performance trade-offs at a deep level and make principled decisions about where each format belongs.

  • Profiling & Bottleneck Analysis: Use Nsight Compute, rocprof, Perfetto, VTune, and custom instrumentation to identify and eliminate performance bottlenecks. Translate profiling data into concrete architectural improvements.

  • Operator Fusion: Identify opportunities to fuse multi-step operations into single kernel launches, reducing memory round-trips and kernel launch overhead across the inference and training execution graphs.

  • Communication Kernel Optimization: Optimize collective communication primitives (AllReduce, AllGather, ReduceScatter) for multi-GPU and multi-node topologies, working closely with the infrastructure team.

  • Compiler & Runtime Integration: Collaborate with compiler and runtime teams to integrate kernels into Triton, PyTorch, or SYCL pipelines. Contribute to micro-architecture feedback loops, helping co-design ISA and memory features with the hardware team where relevant.

  • Cross-Team Collaboration: Work closely with the inference and serving team to ensure kernel-level performance translates into system-level gains. Share profiling insights, align on optimization priorities, and contribute to architectural decisions across teams.

  • Technical Leadership: Maintain a high level of personal agency. Write production code, review kernel implementations, and contribute to architectural decisions in a flat, fast-moving team environment.

You may be a fit if you have the following skills

  • Deep Kernel Authorship: You have written production CUDA or ROCm kernels from scratch. You understand warp execution, shared memory bank conflicts, occupancy, and instruction-level parallelism at an intuitive level. Strong proficiency in C++11 or higher, CUDA, Triton, and ideally LLVM/MLIR.

  • Hardware Architecture Knowledge: Strong familiarity with NVIDIA Hopper/Ampere and AMD CDNA architectures. You know the differences between HBM bandwidth profiles, cache sizes, and execution units, and you write code that reflects that knowledge. Deep understanding of memory layout, vectorization, thread and block scheduling, and cache behavior.

  • Precision & Numerical Fluency: Solid grasp of numerical stability, mixed precision arithmetic, and modern precision formats. Experience making principled trade-offs between precision and performance in production systems.

  • Profiling Fluency: Comfortable with Nsight Compute, rocprof, Perfetto, VTune, and roofline modeling. You do not guess where the bottleneck is. You measure it.

  • Parallel Programming Breadth: Strong background across parallel programming models such as CUDA, Triton, SYCL, OpenCL, and OpenMP. Experience optimizing irregular algorithms such as sparse linear algebra or graph computations.

  • Systems Thinking: Ability to reason about how individual kernels compose into larger execution graphs, and how kernel-level decisions propagate up through the inference or training stack.

  • Great Team Spirit: A mission-driven approach to engineering, valuing clear communication, hands-on execution, and collective success over individual silos.

Average salary estimate

$240,000 / year (est.)
Min: $180,000
Max: $300,000

Employment type: Full-time, remote
Date posted: March 21, 2026