Senior Site Reliability Engineer

Who We Are

Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection of AI and open-source technology, we believe in an open future where AI innovation is limited only by imagination, not by access to resources. We're looking for forward-thinking individuals who share our passion for making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators everywhere to turn their visionary AI projects into reality.

As we prepare for growth after our Series A, our team — led by co-founders with PhDs in AI, Math, and Computer Science — is poised to redefine computing.

About the Role

We're seeking a Senior Site Reliability Engineer to ensure Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and security. As an aggregator of compute resources from hundreds of global suppliers, our SLOs, trust, and economic efficiency are product-critical. You'll be responsible for defining and maintaining service level objectives for job success rates, building robust incident response systems, managing capacity across our distributed GPU network, and implementing secure rollout and rollback mechanisms that keep our platform running smoothly 24/7.

In this role, you'll establish the reliability standards that define customer trust in our platform, design monitoring and alerting systems that provide deep visibility into our infrastructure, build automation for capacity management and resource allocation, lead incident response and post-mortem processes, and work closely with engineering teams to improve system resilience. You'll also focus on security and infrastructure hardening, ensuring strong isolation between tenants and suppliers, implementing key management systems, and building compliance frameworks. This is a high-impact position where your work directly influences our ability to deliver on our promise of affordable, accessible AI compute at scale.

Who You Are

  • Expert in site reliability engineering with proven experience defining, monitoring, and maintaining SLOs and SLAs for production systems

  • Strong background in capacity planning and management, including forecasting, resource allocation, and cost optimization for distributed systems

  • Experienced in incident response, on-call rotations, and post-mortem processes with a track record of reducing MTTR and improving system resilience

  • Deep knowledge of deployment systems including progressive rollouts, canary deployments, feature flags, and automated rollback mechanisms

  • Proficient in observability tools and practices including metrics, logging, tracing, and alerting systems (Prometheus, Grafana, ELK stack, or similar)

  • Strong understanding of infrastructure security including tenant isolation, workload isolation, network segmentation, and security hardening

  • Experience with secrets management, key management systems (KMS), certificate management, and secure credential rotation

  • Knowledge of compliance frameworks and security best practices for cloud platforms (SOC 2, ISO 27001, or similar)

  • Excellent problem-solving skills with the ability to debug complex distributed-systems issues under pressure

  • Strong automation mindset with experience using infrastructure-as-code, configuration management, and CI/CD pipelines

Preferred Qualifications

  • Experience operating GPU infrastructure, AI/ML platforms, or compute marketplaces at scale

  • Background in distributed systems, peer-to-peer networks, or decentralized infrastructure

  • Knowledge of multi-tenancy security patterns, container security, and runtime security tools

  • Experience with chaos engineering, fault injection, and resilience testing

  • Familiarity with cost optimization strategies for cloud infrastructure and GPU resources

  • Experience building and operating systems with demanding uptime requirements (99.9%+ SLAs)

  • Background at companies like AWS, Google Cloud, Azure, or fast-growing infrastructure startups

  • Contributions to open-source reliability, observability, or security tools

Hyperbolic is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Average salary estimate

$210,000 per year (est.); range $160,000 to $260,000



Employment type: Full-time, hybrid
Date posted: March 26, 2026