Site Reliability Engineer

Alpaca

Reposted 12 Days Ago

Remote

Hiring Remotely in Ireland, IRL

Mid level

Operate and improve Alpaca's production infrastructure: on-call incident response, define SLIs/SLOs, enhance observability, ship infrastructure as code via GitOps, and strengthen PostgreSQL reliability (performance, migrations, HA/DR). Mentor teams on reliability and database fundamentals.

The summary above was generated by AI

Who We Are:

Alpaca is a US-headquartered, global leader in agent-first brokerage infrastructure for stocks, ETFs, options, crypto, fixed income, 24/5 trading, and more.
Through our subsidiaries, Alpaca is a licensed financial services company, serving hundreds of financial institutions across 40 countries with our institutional-grade APIs. These include broker-dealers, investment advisors, wealth managers, hedge funds, and crypto exchanges, totalling over 10 million brokerage accounts.
Our global team is a diverse group of experienced engineers, traders, and brokerage professionals who are working to achieve our mission of opening financial services to everyone on the planet. We're deeply committed to open-source contributions and fostering a vibrant community, continuously enhancing our award-winning, developer-friendly API and the robust infrastructure behind it.
Alpaca is proudly backed by $400 million in funding from top-tier global investors including Portage Ventures, Spark Capital, Tribe Capital, Social Leverage, Horizons Ventures, Opera Tech Ventures, SBI Group, Derayah Financial, Unbound, Peak XV, Elefund, and Y Combinator.

Our Team Members:

We're a dynamic team of 400+ globally distributed members who thrive working from our favorite places around the world, with teammates spanning the USA, Canada, Japan, Hungary, Nigeria, Brazil, the UK, and beyond!
We're searching for passionate individuals eager to contribute to Alpaca's rapid growth. If you align with our core values—Stay Curious, Have Empathy, and Be Accountable—and are ready to make a significant impact, we encourage you to apply.

Your Role:

As a Site Reliability Engineer at Alpaca, you'll help keep our brokerage platform reliable, observable, and operable as we grow, working across our cloud infrastructure, Kubernetes platform, observability stack, messaging layer, and data layer. We're especially interested in candidates with strong PostgreSQL fundamentals who'd like to grow into deeper ownership of our database reliability posture. PostgreSQL sits on the trading-critical path, and we want this person to spend a meaningful share of their time leveling it up while remaining a well-rounded SRE the rest of the week.

Things You Get To Do:

Operate production day-to-day - on-call, incident response, postmortems, and the follow-ups that actually close the loop.
Own reliability practice - define and refine SLIs/SLOs and error budgets, and help product teams live within them.
Strengthen our observability across metrics, logs, traces, and alerting.
Ship infrastructure through code in a GitOps workflow - cloud resources and Kubernetes workloads alike.
Look after PostgreSQL: performance tuning, schema and migration review, online migrations on large tables, HA/DR, and CDC pipelines.
Mentor engineers on reliability and database fundamentals through code review, design review, and pairing.

Who You Are (Must-Haves):

4+ years in SRE, DevOps, Platform/Infrastructure, or backend engineering with significant production operations ownership.
Hands-on experience operating production services on Kubernetes, and shipping infrastructure as code in a GitOps workflow.
Solid working knowledge of PostgreSQL in production — query plans, pg_stat_*, indexing and schema trade-offs, and what a safe online migration looks like on a non-trivial table.
Cloud networking fundamentals (VPCs, routing, L4/L7 load balancing, DNS, TLS) and comfort debugging cross-service connectivity.
Comfortable with a modern observability stack and proficient with Linux at the operator level.
Practiced in incident response - calm under pressure, structured debugging, postmortems that drive change.
At least working proficiency in Go or Python, plus strong written and verbal communication.
Genuine interest in databases and in growing your PostgreSQL/DBA expertise.

Who You Might Be (Nice-to-Haves):

Deeper PostgreSQL experience: large clusters at OLTP load, online migrations on big tables, HA/DR ownership, connection pooling at scale, or change-data-capture pipelines.
Experience with typed SQL access layers in Go (e.g. pgx, gorm, sqlc).
Production experience with messaging systems at scale (e.g. RabbitMQ, Kafka, Redpanda).
Security & compliance experience in a regulated environment (SOC 2, secrets management, audit logging).
Familiarity with trading, brokerage, or other regulated fintech domains.

How We Take Care of You:

Competitive Salary & Stock Options
Health Benefits
New Hire Home-Office Setup: One-time USD $500
Monthly Stipend: USD $150 per month via a Brex Card

Alpaca is proud to be an equal opportunity workplace dedicated to pursuing and hiring a diverse workforce.
