Reliability Engineer

This opening for a Reliability Engineer supports the U.S. Air Force Cloud One Architecture and Common Shared Services contract. The Reliability Engineer is responsible for ensuring the availability, performance, scalability, and resiliency of mission‑critical systems. This role applies software engineering principles to infrastructure and operations, with a strong emphasis on automation, monitoring, incident response, and continuous reliability improvement. The Reliability Engineer serves as the bridge between development, operations, and platform teams to ensure production systems consistently meet defined service level objectives (SLOs) while supporting rapid, safe delivery of new capabilities.

Location: This position is hybrid remote. Candidates will be required to work onsite as needed. Candidates located near Hanscom AFB (Boston, MA) are preferred.

System Reliability & Availability

  • Design, implement, and maintain highly available, fault-tolerant systems in cloud and hybrid environments
  • Define, measure, and report Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets
  • Identify reliability risks and implement mitigation strategies across the system lifecycle
  • Conduct capacity planning and performance modeling to ensure systems scale to meet demand

Monitoring, Observability & Alerting

  • Implement and manage monitoring, logging, and tracing solutions to provide full system observability
  • Define actionable alerting thresholds that minimize noise and enable rapid incident detection
  • Analyze trends and metrics to proactively identify potential reliability issues

Incident Response & Problem Management

  • Participate in on‑call rotations and lead incident response activities for production systems
  • Coordinate troubleshooting efforts across development, infrastructure, and security teams
  • Conduct post‑incident reviews (PIRs) and develop corrective and preventive action plans
  • Track recurring issues and ensure root causes are resolved

Automation & Engineering Excellence

  • Automate operational tasks to reduce manual intervention and operational risk
  • Develop scripts, tools, and services that improve system reliability and reduce mean time to recovery (MTTR)
  • Promote “automation over toil” and standardize operational workflows

Reliability‑Focused Engineering

  • Participate in architecture and design reviews with an emphasis on reliability, resiliency, and recoverability
  • Validate disaster recovery (DR) and business continuity plans; test failover mechanisms
  • Support chaos engineering, fault injection testing, and resilience validation where appropriate

Collaboration & Governance

  • Partner with DevOps, Platform, and Security teams to ensure reliability aligns with delivery and compliance objectives
  • Document system reliability standards, runbooks, and operational procedures
  • Support compliance and audit activities (e.g., FedRAMP, FISMA, internal operational controls)

Required Skills

  • Bachelor's degree and eight (8) or more years of experience, or Master's degree and six (6) or more years of experience. Additional experience may be accepted in lieu of a degree.
  • Active Secret clearance, at a minimum, required to start
  • US citizenship required
  • Experience with cloud platforms (AWS, Azure, OCI, or GCP), including managed services
  • Experience with containerized environments (Docker, Kubernetes)
  • Familiarity with CI/CD pipelines and deployment automation
  • SLOs and error budgets
  • Capacity modeling and performance testing
  • Strong understanding of:
      • Distributed systems and high‑availability architectures
      • Linux/Windows system administration
      • Networking fundamentals (DNS, TCP/IP, load balancing)
  • Hands-on experience with:
      • Monitoring and observability tools (e.g., Prometheus, Grafana, ELK/Elastic, Datadog, Azure Monitor)
      • Infrastructure as Code (Terraform, ARM, CloudFormation)
      • Scripting or programming languages (Python, Bash, Go, PowerShell, or similar)
  • Experience supporting incident management and on‑call operations

Preferred Skills

  • Experience with USAF Cloud One or Platform One
  • Experience with Zero Trust Architecture
  • Cloud certifications from AWS, Azure, Google Cloud, or Oracle Cloud

SES provides a competitive salary and the following benefits:

  • Medical
  • Dental
  • Vision
  • Accidental Death & Dismemberment (AD&D)
  • Short-Term Disability (STD)
  • Long-Term Disability (LTD)
  • Company-paid life insurance
  • 401(k) with employer contribution
  • Paid Time Off
  • Pet Insurance

Average salary estimate

$145,000 / year (est.)
Min: $120,000
Max: $170,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

SES is an industry leader in verification services with projects ranging from conformance with self-imposed sustainability standards to the functioning of national voluntary programs. Since 1998, SES has supported governmental and private clients ...

Employment type: Full-time, hybrid
Date posted: April 1, 2026