Platform.sh

Site Reliability Engineer

Reposted 20 Days Ago
Remote
Hiring Remotely in Canada
Mid level
As a Site Reliability Engineer, improve infrastructure and automate tasks while collaborating with teams to enhance reliability across the application lifecycle.
The summary above was generated by AI
About Platform.sh

Platform.sh is a Platform-as-a-Service (PaaS) that removes the complexities of cloud infrastructure management and optimizes development-to-production workflows, reducing the time it takes to build and deploy applications. It delivers efficiency, reliability, and security, giving development teams both control and peace of mind. Built for developers, by developers.

Adopted and loved by 16,000+ developers and 7,000 customers, Platform.sh has for nearly a decade provided innovative capabilities that serve as the launchpad for creative development teams’ out-of-the-box thinking.

We provide 24x7 support, managed cloud infrastructure, and automated security and compliance with an all-in-one PaaS. We give our customers complete control over their data by keeping applications secure and available around the clock.

Platformers are a remote, global workforce, and we thrive in a multicultural team. We are committed to open source and an open, welcoming environment. Our team spans the globe and the experience spectrum. What's our commonality, our cultural fabric? A curious spirit and a thirst for knowledge; an eagerness for innovative ideas and cultures. We believe we can build anything together in an environment that frees you to do your best work.

Bring your expertise and enthusiasm to our growing, global organization. Your contributions, collaboration, and unique point of view are recognized and valued here.

Impact of a Site Reliability Engineer

As a Site Reliability Engineer, you are a key part of our team’s transition to the Site Reliability Engineering (SRE) model, moving from traditional Cloud Operations to an automation-driven approach. This shift enhances system reliability, scalability, and efficiency, positioning SRE as a core function within the company.

Moreover, in this role, you focus on improving infrastructure, automating operational tasks, and streamlining processes. You work closely with developers, engineers, and product teams to ensure reliability is embedded throughout the application lifecycle.

As part of this transition, you also help optimize cloud-based systems, reduce manual work, and drive continuous improvements, playing a vital role in the organization’s overall success and long-term stability.

What to expect
  • Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
  • Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
  • Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
  • Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
  • Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
  • Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
  • Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.
What you bring
  • DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
  • Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
  • Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
  • Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Containerization and Orchestration: Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications is a nice-to-have.
  • Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.
Where we hire

At Platform.sh, remote work isn't just a trend - it's our foundation. The freedom of remote work with the support of a diverse, global team has been our successful model for nearly a decade. Our culture celebrates flexibility and collaboration, and while we have team members in over 30 countries around the globe, we are currently focused on hiring for this role in Canada. Please note that this position requires participation in an on-call rotation aligned with the Pacific Time Zone (PT). Although we’re unable to provide visa sponsorship at this time, we welcome applications from all qualified candidates who are legally authorized to work in Canada.

How we hire

We know that a great hire won’t meet every requirement that we’ve outlined. If you can see yourself elevating the team, we want to hear your story. Few of us would be here had we not taken a chance.

You can expect 4 interviews on Google Meet, in the order below. Should you successfully move through the entire process, you will have the opportunity to meet with a variety of Platformers. Our goal is to ensure you can make the most informed decision on whether this role and our culture align with what you’re looking for in your future working environment.

  1. 45 Minutes with Talent Acquisition 
  2. 60 Minutes with Hiring Manager (Director, Site Reliability Engineering)
  3. 60 Minutes with Team (Site Reliability Engineer, Director, Site Reliability Engineering)
  4. 60 Minutes with Executive (Senior Director, Site Reliability Engineering)

All roles require background checks.

What we offer

💡 A product you can believe in - Join us in transforming how businesses build and manage web applications, driven by making a positive impact as a proud B Corp.

🏆 An Award-Winning Workplace - We’ve been recognized in Forbes’ Top 30 Companies for Remote Jobs and among France’s Best Workplaces for Women.

🗣️ A culture that values your voice - Join a flexible, open, and inclusive work environment where your voice is encouraged, and your ideas shape our growth and evolution.

🌎 A global team - Collaborate with colleagues from diverse backgrounds across the world, embracing different perspectives.

🎉 Benefits and perks - Make the most of what matters to you

🏝 Flexible PTO

📈 Company stock options

🧠 Professional development budget

💻 Office equipment budget

💆‍♀️ Wellness budget

🧳 Annual team gatherings

🛜 Internet reimbursement

👶 Inclusive parental leave

✈️ Remote work travel program

You belong here

At Platform.sh, we celebrate diversity in all its forms and are committed to fostering an inclusive, equitable, and supportive workplace where everyone can thrive. We embrace and value different perspectives, backgrounds, and experiences, because they make us stronger as a team. Whoever you are, wherever you're from, and whatever path you've taken, you are welcome here. We encourage you to bring your whole self to work, connect with others, and share your passion.

If you need accommodations at any stage of our hiring process, please let us know. We're here to ensure an accessible and comfortable experience for you.

Top Skills

Ansible
AWS
Azure
Bash
Docker
ELK Stack
GCP
Go
Grafana
Kubernetes
Prometheus
Python
Terraform
