
Site Reliability Engineer

About Hammerhead

We're unleashing AI with intelligent orchestration while addressing one of the most pressing bottlenecks for AI: access to power. Our cutting-edge platform optimizes data center power infrastructure to maximize AI workload throughput within existing electrical limits, without requiring new power plants or grid expansions.

Our platform uses reinforcement learning to intelligently orchestrate power, cooling, and compute in real time, enabling data centers to run significantly more AI workloads within their existing electrical and thermal limits. While at AutoGrid, our team optimized over 8 gigawatts of mission-critical power globally. At Hammerhead, we're addressing a $64 billion-per-year market opportunity while dramatically reducing the environmental footprint of AI infrastructure.

At Hammerhead, you will:

⚡ Work at the intersection of AI, energy, and compute to help build the next generation of AI infrastructure

🤝 Collaborate with colleagues who are experts in modern RL and AI, IoT and IIoT software, and infrastructure technologies

🌎 Contribute to building a more efficient and sustainable future for AI compute

🚀 Join a company at the cutting edge of modern data center design and operation

💰 Receive competitive compensation, equity, and benefits in a high-growth, mission-driven environment

🚀 Learn from an experienced team that has built and sold startups before

About the Role

We are seeking a Site Reliability Engineer to own the reliability, scalability, and operational excellence of Hammerhead's AI-driven power orchestration platform. Our software runs in production data centers around the world, where real-time decisions directly affect gigawatts of compute infrastructure. Availability, latency, and correctness are not negotiable.

You will work at the boundary between software and infrastructure, building the systems that deploy, monitor, and protect Hammerhead's platform in production. You will partner with engineering teams to establish SLOs, automate toil, accelerate releases, and ensure that when things go wrong, we know fast and recover faster.

This is a foundational SRE role. You will be the first dedicated hire in this function. You will set the standard for how Hammerhead runs software in production.

You will report to the Head of Engineering.

Key Responsibilities

  • Own production reliability for Hammerhead's platform: define and enforce SLOs, SLAs, and error budgets across services, and drive resolution when they are breached.

  • Build and maintain the observability stack: metrics, logging, distributed tracing, and alerting across cloud and on-prem deployment environments.

  • Architect and manage CI/CD pipelines that enable fast, safe, and repeatable deployments to production data center environments.

  • Automate operational toil: provisioning, configuration management, scaling, failover, and incident response workflows.

  • Lead incident response: act as the primary on-call escalation, run blameless post-mortems, and drive systemic fixes that prevent recurrence.

  • Partner with software and RL engineers to bake reliability into the development lifecycle: code reviews, deployment checklists, chaos testing, and load testing.

  • Manage and evolve Hammerhead's cloud infrastructure (primarily AWS) and edge deployment infrastructure at customer data center sites.

  • Establish security and compliance practices for production environments: secrets management, access controls, audit logging, and vulnerability remediation.

  • Evaluate and introduce tooling that improves platform velocity and reliability, from container orchestration to infrastructure-as-code to incident management platforms.

Qualifications

Required

  • 4+ years of experience in site reliability engineering, DevOps, or platform/infrastructure engineering in production environments.

  • Deep proficiency with Kubernetes and container orchestration in production, including cluster management, resource limits, autoscaling, and network policies.

  • Strong infrastructure-as-code experience with Terraform, Pulumi, or equivalent. You manage cloud resources in code, not consoles.

  • Hands-on experience with observability tools (Prometheus, Grafana, Datadog, OpenTelemetry, or equivalent) and building alerting that is actionable, not noisy.

  • Experience using Claude Code (or similar) to develop and maintain secure, compliant, and automated infrastructure-as-code (IaC) workflows.

  • Expertise in Python for writing automation scripts, internal tooling, and operational runbooks.

  • Experience managing CI/CD pipelines (GitHub Actions, ArgoCD, or equivalent) and deployment strategies (blue/green, canary, rollback).

  • Strong incident response instincts: you stay calm under pressure, communicate clearly during outages, and follow through on fixes after the fact.

  • Comfortable working in environments with strict operational requirements: uptime SLAs, customer-facing commitments, and regulated or critical infrastructure.

Preferred

  • Experience deploying or operating software in industrial, energy, or data center environments, especially hybrid cloud/on-prem topologies.

  • Familiarity with ML/AI system operations: managing model serving infrastructure, GPU workload scheduling, or real-time inference pipelines.

  • Experience with advanced Kubernetes networking and security primitives.

  • Background in chaos engineering, game day exercises, or formal reliability testing frameworks.

  • Prior experience as the first or founding SRE at an early-stage company.

What We Offer

💰 Competitive base salary + meaningful equity in a high-growth, well-funded company

🏥 Comprehensive health, dental, and vision insurance

🧠 The opportunity to build Hammerhead's reliability function from the ground up and own it

🚀 A collaborative, low-ego team of world-class engineers and researchers solving genuinely hard problems

🌎 Work that matters: our platform reduces the energy footprint of AI at scale

Hammerhead is an equal opportunity employer. We are committed to creating an inclusive environment for all employees and encourage applications from candidates of all backgrounds, experiences, and perspectives. We provide reasonable accommodations for individuals with disabilities throughout the hiring process.

Average salary estimate

$195,000 per year (est.); range $160,000 to $230,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

Employment type: Full-time, hybrid
Date posted: April 13, 2026