Senior Platform & Reliability Engineer

🎨 About OpenArt

OpenArt is an AI Storytelling and Visual Creation Platform used by millions worldwide. We’re building the next generation of creative tools powered by cutting-edge AI, enabling anyone to create videos, visuals, characters, and stories with unprecedented speed and imagination.

We believe the future of creativity is AI-native, and we're shaping that future.

🚀 Why Join OpenArt

Small team, massive surface area: senior engineers own real systems, not slices.
Ship at real scale: your work reaches millions of users, fast.
Founder-led engineering culture: both founders are technical and deeply involved in product and architecture.
AI-native product: you’ll design how cutting-edge AI models are exposed as real user experiences.
High ownership, low process: we value judgment, clarity, and speed over bureaucracy.
7-10x revenue growth over the past 2 years: you’ll play a critical role in helping the company scale to the next stage.

🎯 About the Role

We’re looking for a Senior Platform & Reliability Engineer to help design, scale, and improve the reliability of our infrastructure, from architectural decisions to hands-on implementation, observability, and cost optimization.

This is not a traditional ops or DevOps role. You’ll work across cloud infrastructure, distributed systems, backend services, and developer tooling, making pragmatic decisions that balance product velocity, system reliability, and cost efficiency—in a fast-moving, AI-native environment.

You’ll partner closely with product engineers to evolve the platform that powers OpenArt, contributing to key decisions around infrastructure architecture, improving multi-provider AI reliability, and helping us scale systems to millions of users—while raising the overall engineering bar.

🛠 What You’ll Do

Define and operationalize SLOs/SLIs across critical user journeys (generation, editing, payments/credits, uploads), and use them to guide prioritization and tradeoffs.
Participate in an on-call rotation and improve incident response (alert quality, runbooks, escalation paths), including leading blameless postmortems and driving follow-through on action items.
Improve system resilience at external boundaries (AI providers, storage, etc.), including timeouts, retries, circuit breakers, and fallback strategies.
Build and maintain end-to-end observability (logs, metrics, traces, dashboards) so engineers can quickly understand “what broke” and “why.”
Strengthen deploy safety through CI/CD improvements, automated rollbacks, canary releases, and feature flag patterns.
Contribute to the evolution of our infrastructure architecture, helping evaluate when to extend serverless patterns vs. adopt containerized or more managed approaches as we scale.
Improve cost visibility and efficiency, including per-request cost attribution, caching strategies, and capacity planning.
Act as a strong technical contributor, helping improve engineering practices, tooling, and system design decisions across the team.

🧑‍💻 What We’re Looking For

Core Requirements

5+ years building and operating production systems where reliability and scaling are important.
Strong software engineering skills — you can build and ship production code, not just configure infrastructure.
Experience with cloud-native systems (AWS or GCP), including serverless/event-driven architectures and at least one container-based approach (e.g., ECS/Fargate, Cloud Run, Kubernetes).
Solid understanding of observability and reliability practices: metrics, alerting, tracing, and incident response.
Experience designing resilient systems with external dependencies (timeouts, retries/backoff, idempotency, circuit breakers).
Ability to communicate technical tradeoffs clearly to engineers across different domains.
Comfortable operating in ambiguous, fast-moving environments and taking ownership of problems.

Nice to Have

Experience building internal platform abstractions (e.g., job orchestration, API layers, workflow systems) that improve team velocity.
Track record of improving reliability metrics (e.g., MTTR, SLO attainment, latency) or reducing infrastructure cost.
Experience working in a startup or high-growth environment, with broad ownership across systems.

⚙ Tech Stack You’ll Work With

GCP, Cloud Run, Modal, Upstash, Sentry, Amplitude, Firebase, Redis, React/Next.js, Node.js, TypeScript, Python, etc.

💰 Compensation

Competitive base salary and bonus program
Equity - meaningful ownership in what you build
High autonomy, high growth environment

🌍 Work Setup

Bay Area preferred (hybrid allowed)
Visa sponsorship available
We’ll consider remote

Average salary estimate

$195,000 / year (est.)
Min: $160,000
Max: $230,000
