Who we are
We scan and interpret the entire digital footprint of a business - from websites and social media to hidden online mentions - revealing who they really are and whether they pose a risk.
Global financial institutions and payment providers rely on us to stop fraudsters running shell companies, selling counterfeit goods, or laundering money.
The Role
We run dozens of AI agents in parallel — searching, scraping, querying, and reasoning over huge volumes of data within enterprise SLAs. As our Senior DevOps Engineer, you own the platform that makes that possible: the infrastructure, the pipelines, the observability, and the security posture that let a small team ship hard AI systems to banks, reliably and fast.
This is an end-to-end ownership role on a team with no layers. You will not be handed tickets. You will define how we build, deploy, scale, and secure everything, and you will do it alongside hands-on founders who are in the code every day.
Tech stack: Node.js, TypeScript, React, PostgreSQL, AWS, Kubernetes, Terraform, Temporal, LLMs, and advanced data-ingestion pipelines.
What We Do
What used to take professional analyst teams 3 days, we do in 8 minutes with AI agents we build from scratch. You would own the platform those agents run on.
Ballerine catches fraud, scams, and bad actors before they get into the financial system. Banks across the world use us to decide whether a business is legitimate or a fraud — one of the most complex, manual jobs in finance, which we have turned into a fast, explainable, AI-driven workflow.
We are a small, senior engineering team building AI-native systems from scratch. We are backed by Y Combinator and Team8, are a Mastercard-certified partner (1 of 6 worldwide), and have paying enterprise customers across multiple regions. Early enough that this role is career-defining; real enough that you know we are here to stay.
What You'll Do
Platform and infrastructure (K8s + IaC)
- Architect and operate our Kubernetes platform on AWS — scaling, networking, cost, and reliability
- Own infrastructure as code (Terraform) so environments are reproducible, auditable, and fast to evolve
- Build the foundations that let the team provision and scale services without friction
AI agent platform — managing and scaling agents at volume
- Operate and scale the orchestration layer (Temporal) that runs dozens of agents in parallel
- Tune the platform for the unusual load profile of agent workloads — bursty, long-running, data-heavy, latency-sensitive
- Give engineers the primitives to deploy, version, observe, and roll back agents safely under enterprise SLAs
CI/CD and developer experience
- Own build and deploy pipelines end-to-end — fast, safe, boring releases
- Invest in DX as a first-class product: the team treats developer experience as leverage, and you set the bar
- Reduce the time from merge to production and from idea to running experiment
Observability and reliability
- Build monitoring, alerting, and tracing that make production legible — for services and for agents
- Own incident response and the reliability practices that keep enterprise customers' SLAs intact
- Turn incidents into systemic fixes, not repeated firefighting
Security and compliance
- Own secrets management, hardening, and the day-to-day security posture of the platform
- Support our compliance commitments (we are Mastercard-certified, operating in fintech — the bar is high)
- Build security into the pipeline so it is the default, not a gate
What You Bring
- Strong DevOps/platform/SRE experience at a company with a real engineering culture (FAANG, unicorn, or a well-established startup with high standards)
- Deep Kubernetes and AWS — you have architected, scaled, and debugged production clusters, not just deployed to them
- Infrastructure as code in your bones — Terraform or equivalent, with strong opinions on reproducibility and auditability
- CI/CD ownership — you have built pipelines that engineers trust and rarely think about
- Production observability and incident response at meaningful scale
- Comfort with and curiosity about AI/LLM workloads. We are an AI-native company; you should be using AI in how you work, and excited to operate the infrastructure agents run on
The three traits we hire for:
- Curiosity — you go deep, learn new tech beyond what is required, and want to understand why things work
- Smarts — you solve genuinely hard problems and can articulate the trade-offs
- Push — you drive work to done independently, and earn trust without being managed
Nice to have:
- Experience operating LLM/AI agent or data-pipeline infrastructure in production
- FinOps / cloud cost optimization at scale
- Security or compliance experience in a regulated environment (SOC 2, PCI, etc.)
- Temporal or other workflow-orchestration systems
- A track record of treating developer experience as a product
Why join us
