Sunset is building the data layer for real-world AI training. We work with frontier labs to turn messy, multi-modal enterprise data into the highest-quality training data on the market — sourced from the hundreds of venture-backed startups we've helped wind down.
We're a fast-growing team working in person in Dumbo, Brooklyn. We're backed by Floodgate, Afore Capital, Hustle Fund, and incredible entrepreneurs.
The Role
As a Data Engineer at Sunset, you'll own the pipeline that turns raw, chaotic enterprise data into the highest-quality training data on the market. One of our core technical challenges is entity resolution and de-identification across different sources and modalities. An even deeper challenge is understanding the node structures and linkages well enough to effectively reconstruct the business world this data comes from. All of this happens on sensitive data, which means security and privacy aren't a separate workstream but are built into every pipeline, system, and decision we make.
What You'll Work On
You'll own problems end-to-end. Some examples of what you might tackle in your first 90 days:
Designing the de-identification layer that replaces PII with stable pseudonyms while preserving every relationship across every source
Building coreference resolution across Slack threads, email chains, and Linear comments so that "me," "him," and first-name mentions all resolve to the right canonical entity
Hardening how we ingest, store, and process sensitive client data — from encryption and access controls to audit trails and isolation boundaries
Extending our entity resolution pipeline to handle new modalities — think audio, video, design files, or embedded references inside documents
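To make the first example concrete, here is a minimal sketch of what "stable pseudonyms" could look like. It assumes a keyed hash (HMAC-SHA256) over a normalized identifier; this is one common approach, not a description of Sunset's actual pipeline. The names `SECRET_KEY`, `normalize`, and `pseudonymize` are hypothetical and exist only for illustration.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real system would load this
# from a secrets manager and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"


def normalize(identifier: str) -> str:
    """Collapse trivial variations so the same entity hashes the same way."""
    return identifier.strip().lower()


def pseudonymize(identifier: str, kind: str = "person", key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash.

    The same input always yields the same pseudonym, so relationships
    between records survive de-identification.
    """
    message = f"{kind}:{normalize(identifier)}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{kind}_{digest[:12]}"


# The same person seen in two sources resolves to one pseudonym.
slack_author = pseudonymize("Jane.Doe@example.com")
email_sender = pseudonymize(" jane.doe@example.com ")
other_person = pseudonymize("john.roe@example.com")

print(slack_author == email_sender)
print(slack_author == other_person)
```

Because the mapping is deterministic under a given key, a pseudonym produced from a Slack export matches the one produced from an email archive, which is what keeps cross-source relationships intact. The hard part of the role is everything this sketch skips: resolving different surface forms ("me," "him," a first name) to the same canonical identifier before hashing.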
You Might Be a Fit If
You are a product-minded engineer who has shipped data pipelines at scale
You have strong Python skills and are comfortable with NER, record linkage, and coreference resolution
You take security and privacy seriously and have built systems where getting it wrong wasn't an option
You want to own a hard, ambiguous problem end-to-end rather than wait for a PRD
AI is deeply integrated into your workflow and life
This Role Might Not Be a Fit If
You want to work remotely or hybrid — we're in-person 5 days/week in Dumbo
You want to do novel ML research — this role is applied, not research
You prefer long planning cycles or narrow ownership
Our Stack
Python, Postgres, Redis, AWS. We pick tools based on the problem, not the other way around.
Compensation & Benefits
$180K–$280K base + meaningful equity
100% covered medical, dental, and vision
Unlimited PTO
$500 in-office setup allowance
Interview Process
Intro Chat (20 min) – mutual fit and interests
Technical Session (1hr) – collaborative problem-solving
Onsite (2–3 hrs) – product deep dive, system design, meet the team
Quick references → Offer