Overstory

Senior Site Reliability Engineer

Reposted 4 Days Ago

Easy Apply

Remote

11 Locations

Senior level

As a Senior Site Reliability Engineer, you'll manage GCP infrastructure, improve incident processes, develop observability platforms, and advocate for reliability best practices.

The summary above was generated by AI

The climate crisis is the defining challenge of our time—but it’s also the greatest opportunity for innovation, and a challenge we’re proud to take on. At Overstory, we’re harnessing cutting-edge technology to enable a resilient electrical grid that keeps communities thriving as our world changes.

The grid is the backbone of life as we know it. It powers hospitals, keeps food fresh, and ensures communities stay connected. But extreme weather, aging infrastructure, and growing wildfire risks are putting this critical system under pressure. All of this combined makes the electric utility industry the greatest opportunity for tackling climate change.

One of the leading causes of catastrophic wildfires and power outages? Trees and brush coming into contact with power lines.

That’s where we help. At Overstory, we use AI and advanced satellite imagery to pinpoint and prioritize vegetation risks before they materialize. By giving utilities critical analysis on those risks, we’re helping prevent outages, reduce wildfire risks, and accelerate the transition to a safer, more resilient grid.

Our team spans the Americas and Europe, and we work with utility partners across the Americas and beyond. We’re outdoor enthusiasts, musicians, artists, athletes, parents, and adventurers—15 nationalities strong and growing. What unites us is a passion for solving complex problems, a commitment to climate action, and the belief that technology should be a force for good.

Join us and help build a more resilient world.

The role

As a Site Reliability Engineer, you will help evolve the strategic management of our GCP infrastructure and our DevOps practices, including incident management, SLOs, and error budgets. You will champion observability as a way to improve mean time to recovery, use DORA metrics to help the Product & Engineering team build better products, and help other teams optimize their use of GCP and manage costs.

What you’ll do

Design and evolve Overstory’s cloud infrastructure to support the company’s scaling needs, laying the foundation for performance, security, and maintenance.
Build tooling and automation that promote team autonomy while ensuring operational excellence.
Advance our observability platform to support long-term insights, meaningful alerting and improved ease of use for the engineering teams.
Build visibility into infra costs to raise awareness across engineering and empower teams to make cost-aware decisions.
Champion reliability best practices by shaping incident processes, defining SLOs, and fostering a culture of ownership and continuous improvement.

About you

You are able to prioritize collaboratively between tactical problems and strategic direction.
You are comfortable and effective working in a terminal in a Unix-based environment.
You are confident in driving Infrastructure-As-Code principles.
You have experience working with any of the major Cloud Providers.
You have strong communication skills and are comfortable expressing your ideas to different audiences.
You are proactive with a positive attitude, well organised, and adept at managing competing deadlines and priorities.
You are comfortable with and excited by a fast-paced and often changing environment, eager to solve new problems and learn new skills in order to succeed.
You have a self-starter mindset; you proactively identify issues and opportunities and tackle them without being told to do so.
Teamwork is at your core, and you like to help others grow and succeed.

Nice-to-haves

You can demonstrate experience scaling large distributed architectures.
You have experience in working in a remote-first environment.
You have worked with satellite data and/or imagery.
You have prior experience with Kubernetes.

What you get

To be part of truly mission-driven work that reduces wildfires, protects Earth’s natural resources, and helps solve our climate crisis.
Flexible working environment with a lot of autonomy. We build our work days around our lives, not the other way around.
Other benefits like a remote working budget, an educational budget and time to develop new skills.
To be surrounded by an excellent, vibrant, smart team who have each other's back and believe in a culture of openness, tolerance and respect.
Equity and a competitive salary.

About our team

We are a group of 100 people from all over the world. Fifteen nationalities are represented in our team. We work remotely from eleven different countries, and we are looking for candidates who also live and work in one of these countries: United States, the Netherlands, United Kingdom, Ireland, Estonia, Portugal, France, Sweden, Denmark, Switzerland, and Canada. We meet up in person once a year for our unforgettable team gathering event. We also offer the option to occasionally meet up for in-person collaboration.

Diversity & Inclusion

We place enormous value on diversity and inclusion and strive to continually bring in people of all genders, races, creeds, ethnicities, abilities and backgrounds. We believe that the best ideas emerge when people with different perspectives and approaches work together on a problem.

We’re always looking to diversify our team further, but we’re proud of the fact that four out of the nine people on our leadership team are female, 46% of the overall team are female and 20% of the team are people of color. Our team speaks fifteen languages: English, Dutch, French, Spanish, German, Italian, Portuguese, Russian, Luxembourgish, Lithuanian, Bulgarian, Cantonese, Estonian, Danish and Korean.

Our values

Tackling the climate crisis is our greatest mission.

We act with urgency.

Our curiosity fuels our growth.

We recognize that change is constant, and we find joy and power in exploration.

We’re rooted in diversity.

Just as ecosystems need biodiversity to thrive, our resiliency comes from our differences.

We care for each other.

We love the power of machines but we nurture each other as humans.

Trust is fundamental.

We assume the best in everyone, and we share ideas openly so that we have a positive impact.

_________________________________

Use of AI in Our Hiring Process

We sometimes use AI tools to support parts of our hiring process, such as helping us manage applications more efficiently or ensuring job descriptions are clear and inclusive. But don’t worry, all hiring decisions are always made by people, not machines. Any data processed by AI is handled securely in line with GDPR and our Privacy Notice.

Top Skills

GCP

Infrastructure-As-Code

Kubernetes

Unix
