Browse 53 exciting Site Reliability jobs hiring now. Check out companies that are hiring, such as Nabla, PlayStation Global, and Awesome Motive, in locations including Shreveport, Memphis, and New York.
Nabla seeks a senior SRE/Backend engineer to drive platform reliability and scalability for its clinical AI systems supporting clinicians across the US and EU.
Lead PlayStation's Service Reliability Engineering team to own global uptime, stability, and operational excellence for FTG's cloud gaming infrastructure.
Hammerhead is hiring a Site Reliability Engineer to establish and run the reliability function for an AI-driven power orchestration platform deployed across cloud and on-prem data centers.
Lead a distributed SRE team at LexisNexis Risk Solutions to design and operate secure, automated, cloud-native infrastructure and drive on‑prem-to‑cloud migrations using Terraform, Azure, and modern CI/CD patterns.
Homebot seeks a Senior DevOps Engineer to lead multi-cloud (AWS and GCP) infrastructure design, operation, and developer enablement for its platform.
Stitch Fix is hiring a Platform Engineer to enhance cloud-native infrastructure, developer tooling, and CI/CD workflows to improve developer experience across the company.
Lead the architecture and operation of production-scale GPU clusters at Andromeda, partnering with customers to maximize distributed training reliability and performance.
Anduril's Discovery team is hiring a Site Reliability Engineer to design and operate scalable, secure deployments that integrate cloud, robotics, and mesh networking for mission-critical systems.
Kochava is hiring a Senior Site Reliability Engineer to develop and operate scalable, highly available infrastructure and tooling across cloud and on-prem environments.
Anduril's Discovery team is hiring a DevOps Software Engineer to design and operate CI/CD, IaC, containerized deployments, and MLOps pipelines for high-impact autonomy and networking systems.
Experienced SRE/DBA skilled in SQL Server, system administration, and cloud operations to ensure high-availability and performance of Intelerad's medical imaging platforms.
ServiceNow seeks a Staff Site Reliability Engineer to drive performance troubleshooting, incident escalation, and availability improvements across its cloud platform while working directly with customers and engineering teams.
HomeVision is hiring an Associate Site Reliability Engineer to help scale its AWS/Terraform platform, improve reliability and observability, and support IT and product initiatives in a fully remote environment.
Lead site reliability and platform engineering efforts at WGU as a Senior Software Engineer, building scalable, cloud-aware systems that power the university's online learning platform.
Crusoe is hiring a Software Engineer to help design and scale highly available distributed systems and build platform tools that power sustainable AI infrastructure.
Medtronic is hiring a Principal Software Cloud Engineer to architect and implement cloud-native microservices for CRM Software at its Minneapolis site.
Workday Government is hiring an SRE-focused software engineer to operate, troubleshoot, and harden large-scale cloud services for U.S. federal customers, requiring U.S. citizenship and clearance eligibility.
Lead and grow an engineering team building scalable, secure enterprise infrastructure and backend systems for LinkedIn’s Mountain View hybrid environment.
IonQ is hiring a Senior Manager, Software Engineering to lead and grow the System Operations Software team responsible for building scalable, reliable software for quantum systems.
Lead ServiceNow CMDB and ETL engineering efforts at Visa to design, build, and operate reliable discovery, ingestion, and data pipelines supporting enterprise CMDB and ITOM capabilities.
Lead observability and SRE efforts for high-availability government digital services at Mighty Acorn, building monitoring, incident response practices, and mentoring engineers across an AWS-based stack.
Lead Site Reliability Engineer needed to own SLO-driven reliability, Infrastructure as Code, and observability for athenahealth's hybrid cloud infrastructure while mentoring SRE teams.
Weedmaps is hiring a remote Site Reliability Engineer to strengthen observability, CI/CD, and containerized production reliability for its cloud-native services.
Senior Technology Director needed to lead cloud, DevOps, and platform modernization initiatives for a large member-facing organization, driving strategy, engineering leadership, and secure scalable delivery.
Lead the design and automation of cloud-native AWS infrastructure and AI-assisted operational tooling to drive resilience and efficiency across DraftKings' real-time platform.
Lead the Consumer Lending domain's SRE efforts at Toyota Financial Services to drive observability, automation, and high availability for mission-critical applications.
Senior Software Engineer (remote) to develop and operate a full-stack observability platform for a high-growth SaaS company focused on reliability and user-centered solutions.
Sysdig is hiring a Senior Software Engineer for the Data Platform team to architect and implement scalable Go-based data pipelines and drive technical direction for cloud-scale telemetry and analytics.
Lead the design and operation of secure, scalable cloud infrastructure for Anduril's Corporate Technology team as a Senior Site Reliability Engineer focused on reliability, automation, and observability.
Experienced reliability engineer needed to drive automation, observability, incident response, and SLO-driven operations for mission-critical cloud and hybrid systems supporting a U.S. Air Force program.
Lead production reliability efforts for Anduril's Lattice platform by building observability, automation, and scalable infrastructure solutions that keep mission-critical systems operational 24/7.
Work on core software and infrastructure at Dimensional to shape scalable, reliable systems that power general-purpose robotics.
Bluefish seeks a Senior Data Acquisition Engineer to design, operate, and scale production-grade web scraping and ingestion systems that power AI-driven marketing insights.
Lead reliability and security for a distributed GPU marketplace, driving SLOs, incident response, capacity automation, and secure rollouts to ensure 24/7 platform availability.
Cortex, a Series C engineering-operations platform, is hiring a Senior Backend Software Engineer to build scalable, reliable backend systems that power developer productivity for enterprise customers.
Pismo (part of Visa) is hiring a Senior Network Platform SRE to design, automate, and operate secure, resilient hybrid and multi-cloud network topologies with a focus on Azure.
Lead the Validation Engineering organization to design and operate self-service, policy-driven validation and reliability platforms that enable safe, high-velocity production changes at scale for LinkedIn.
Provide platform reliability and incident ownership for DSN's AWS-based customer services, driving operational improvements and cross-team coordination.
CyberArk is hiring a Senior Production Engineer to architect and operate highly available, secure cloud infrastructure and CI/CD pipelines for its machine identity security platform.
Senior Site Reliability Engineer (Azure) to design and deliver production-ready, scalable Azure infrastructure and automation for a growing distributed systems platform.
Senior Systems Engineer (DSP) to engineer and operate highly available, large-scale infrastructure supporting Basis’ DSP platform across cloud and on-prem environments.
Lead platform reliability initiatives as a Senior Technical Product Manager, coordinating SRE, DevOps, and DBRE efforts to improve uptime, incident response, and release confidence for a fast-growing SaaS platform.
MLabs is seeking a Senior Site Reliability Engineer to design and operate secure, scalable Azure infrastructure and automation for an enterprise distributed systems platform.
DriveWealth seeks a Senior Site Reliability Engineer to build automation, observability, and resilient cloud-native platforms that support global brokerage operations.
Senior Backend Engineer for Commure's RCM team to build scalable, production-grade Python services that transform revenue cycle management for healthcare providers.
Lead the design and implementation of Kubernetes-based platform primitives and scaling automation to enable secure, deterministic, and horizontally scalable Chainlink node infrastructure across internal and decentralized environments.
Senior individual contributor SRE role at Pismo (a Visa company) to lead architecture, reliability, and operational excellence across cloud-native payment platforms in a hybrid Austin role.
Lead CI/CD and infrastructure automation efforts as a Staff Site Reliability Engineer at Pismo (Visa) to strengthen platform resilience and mentor engineering teams.
MongoDB seeks a Staff Technical Program Manager to lead SRE-focused platform programs, improving reliability, launch readiness, and cross-team coordination across global engineering organizations.
Lead the Site Reliability Engineering efforts for NG911 and other mission-critical systems, driving HA architecture, automation, and incident excellence at Motorola Solutions in Chicago.
Below 50k*: 0 | 50k-100k*: 2 | Over 100k*: 30