Platform.sh

Site Reliability Engineer

Reposted 20 Days Ago

Remote

Hiring Remotely in Canada

Mid level

As a Site Reliability Engineer, improve infrastructure and automate tasks while collaborating with teams to enhance reliability across the application lifecycle.

The summary above was generated by AI

About Platform.sh

Platform.sh is a Platform-as-a-Service (PaaS) that removes the complexities of cloud infrastructure management and optimizes development-to-production workflows, reducing the time it takes to build and deploy applications. It delivers efficiency, reliability, and security, giving development teams both control and peace of mind. Built for developers, by developers.

Adopted and loved by 16,000+ developers and 7,000 customers, Platform.sh has for nearly a decade provided innovative capabilities that serve as the launchpad for creative development teams’ out-of-the-box thinking.

We provide 24x7 support, managed cloud infrastructure, and automated security and compliance with an all-in-one PaaS. We give our customers complete control over their data by keeping applications secure and available around the clock.

Platformers are a remote, global workforce, and we thrive in a multicultural team. We are committed to open source and an open, welcoming environment. Our team spans the globe and the experience spectrum. What's our commonality, our cultural fabric? A curious spirit and a thirst for knowledge; an eagerness for innovative ideas and cultures. We believe we can build anything together in an environment that frees you to do your best work.

Bring your expertise and enthusiasm to our growing, global organization. Your contributions, collaboration, and unique point of view are recognized and valued here.

Impact of a Site Reliability Engineer

As a Site Reliability Engineer, you are a key part of our team’s transition to the Site Reliability Engineering (SRE) model, moving from traditional Cloud Operations to an automation-driven approach. This shift enhances system reliability, scalability, and efficiency, positioning SRE as a core function within the company.

In this role, you focus on improving infrastructure, automating operational tasks, and streamlining processes. You work closely with developers, engineers, and product teams to ensure reliability is embedded throughout the application lifecycle.

As part of this transition, you also help optimize cloud-based systems, reduce manual work, and drive continuous improvements, playing a vital role in the organization’s overall success and long-term stability.

What to expect

Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.

What you bring

DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
Containerization and Orchestration (nice to have): Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications.
Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.

Where we hire

At Platform.sh, remote work isn't just a trend - it's our foundation. The freedom of remote work with the support of a diverse, global team has been our successful model for nearly a decade. Our culture celebrates flexibility and collaboration, and while we have team members in over 30 countries around the globe, we are currently focused on hiring for this role in Canada. Please note that this position requires participation in an on-call rotation aligned with the Pacific Time Zone (PT). Although we’re unable to provide visa sponsorship at this time, we welcome applications from all qualified candidates who are legally authorized to work in Canada.

How we hire

We know that a great hire won’t meet every requirement that we’ve outlined. If you can see yourself elevating the team, we want to hear your story. Few of us would be here had we not taken a chance.

You can expect 4 interviews on Google Meet, in the order below. Should you successfully move through the entire process, you will have the opportunity to meet with a variety of Platformers. Our goal is to ensure you can make the most informed decision on whether this role and our culture align with what you’re looking for in your future working environment.

45 Minutes with Talent Acquisition
60 Minutes with Hiring Manager (Director, Site Reliability Engineering)
60 Minutes with Team (Site Reliability Engineer, Director, Site Reliability Engineering)
60 Minutes with Executive (Senior Director, Site Reliability Engineering)

All roles require background checks.

What we offer

💡 A product you can believe in - Join us in transforming how businesses build and manage web applications, driven by making a positive impact as a proud B Corp.

🏆 An Award-Winning Workplace - We’ve been recognized by Forbes’ Top 30 Companies for Remote Jobs and France’s Best Workplaces for Women.

🗣️ A culture that values your voice - Join a flexible, open, and inclusive work environment where your voice is encouraged, and your ideas shape our growth and evolution.

🌎 A global team - Collaborate with colleagues from diverse backgrounds across the world, embracing different perspectives.

🎉 Benefits and perks - Make the most of what matters to you

🏝 Flexible PTO

📈 Company stock options

🧠 Professional development budget

💻 Office equipment budget

💆‍♀️ Wellness budget

🧳 Annual team gatherings

🛜 Internet reimbursement

👶 Inclusive parental leave

✈️ Remote work travel program

You belong here

At Platform.sh, we celebrate diversity in all its forms and are committed to fostering an inclusive, equitable, and supportive workplace where everyone can thrive. We embrace and value different perspectives, backgrounds, and experiences, because they make us stronger as a team. Whoever you are, wherever you're from, and whatever path you've taken, you are welcome here. We encourage you to bring your whole self to work, connect with others, and share your passion.

If you need accommodations at any stage of our hiring process, please let us know. We're here to ensure an accessible and comfortable experience for you.

Top Skills

Ansible

AWS

Azure

Bash

Docker

ELK Stack

GCP

Grafana

Kubernetes

Prometheus

Python

Terraform
