Cabify

Site Reliability Engineer III

Reposted 2 Days Ago

In-Office or Remote

2 Locations

Mid level

As a Site Reliability Engineer, you'll enhance infrastructure, improve service reliability, collaborate with teams on tooling, and support internal customers and practices.

Do you want to change the world? At Cabify, that’s what we’re doing. We aim to make cities better places to live by improving mobility for the people living in them, connecting riders to drivers, providing mobility alternatives such as scooters and mopeds and many others to come, all at the touch of a button. Maybe one day cities will be places where nobody needs a private car. But we’ve still got a long way to go… Fancy joining us?

Our Product & Engineering teams are both based in Madrid, with a strong remote culture, and include an eclectic bunch of awesome people from different backgrounds like Ruby, Go, Elixir, JavaScript, and Python.

Right now, we’re working on some greenfield projects with a solid set of product ideas lined up ready for innovative engineers to tackle. And of course, we have big plans to take over the taxi app service industry!

Site Reliability Engineers at Cabify work on improving all aspects of our platform and have an impact across the whole organisation. They are a blend of systems engineers and software developers who solve scalability issues with software and implement the best production engineering and security practices.

Check out our public Tech Handbook (https://cabify.tech/handbook/) to learn a bit more about us!

As a Site Reliability Engineer, you will be:

Evolving our infrastructure platform by building self-service components that will be used by the whole engineering team and by millions of users around the world.
Working closely with our Product and Infrastructure teams to architect and develop world-class infrastructure components.
Designing and implementing tooling to improve the availability, scalability, observability and latency of our services, which are used by internal customers to deploy and operate their services.
Increasing reliability awareness among other teams, helping them adopt reliability principles, and reviewing observability implementations and software architectures.
Defining SLIs, SLOs and SLAs as part of the services' lifecycle.
Sharing an on-call schedule for the platform services you own.
Solving problems in our highly available platform together with other teams, then building automations to prevent incidents from happening again.
Participating in our recruiting process to help grow our engineering team.

You may be a fit for this role if you:

Think Unix: you know the networking stack, the OSI model, containers (and schedulers), and you know your way around monitoring, logging and the CAP theorem (bonus!).
Have strong programming skills in at least one language, and know your way around a few more or can learn them if the opportunity arises.
Automate yourself out of everything by nature, making machines do the toil.
Communicate effectively and asynchronously.
Care about the things that affect the company, your team, and yourself.
Embrace diversity and humbleness (and a bit of trolling).
Prefer taking iterative action over waiting for things to happen or to be perfect.
Strongly favor simplicity over complexity, i.e. adhere to the KISS principle.
Have a sense for identifying, exploiting and elevating bottlenecks.
Are not afraid of expressing yourself in English. We aren't expecting you to have the Queen's accent, but you'll be part of an international team and we communicate in English, so you should be comfortable with that.
Enjoy herding cats and shaving yaks, i.e. being a great influence on other product teams and teaching them best practices, as well as analyzing and simplifying our setup.

Projects you could work on:

Helping us iterate on and improve our Kubernetes setup (AWS EKS).
Iterating on our networking layer to implement network policies, service mesh, and more…
Evolving our time-series monitoring platform (Cortex) to provide a first-class service to all of our engineering teams.
Helping grow our adoption of distributed tracing (OTLP + Tempo), with the goal of providing request latency observability across microservices (as a service).
Scaling our ever-growing logging platform (Loki) to keep up with business demands.
Maintaining our company-wide code repository and continuous integration solution (GitLab).

What’s it like to work at Cabify?

We’re a company full of happy, motivated people, and we never want that to change. Here are some more reasons why it rocks to be part of our high-performance team:

💶 Excellent Salary conditions: L3 - Up to 52K

🏝️ Recharge days: 10 Fridays OFF annually

🌍Our office is located in Madrid. This position is open to a partially on-site model or to fully remote work based in Spain.

⌚Flexible work environment & hours.

🙌Regular team events.

🚗Cabify staff free rides.

🚀Personal development programs based on our career paths.

🧘‍♀️ iFeel: Free access to the iFeel platform, so you can take care of your emotional well-being through therapy sessions.

📐Coursera: your own license in Coursera to take as many courses as you wish and continue developing your skills.

📐Free access to O'Reilly - The largest technical learning platform for engineers.

💳Flexible compensation plan: Restaurant tickets, transport tickets, healthcare and childcare

💻All the equipment you need (you only have to bring your talent).

Cabify is proud of being an equal opportunity workplace. We celebrate diversity and we are committed to creating an inclusive environment for all employees regardless of background, gender, religion, orientation, age, or ability.

Join us!

Top Skills

AWS EKS

Cortex

Elixir

GitLab

JavaScript

Kubernetes

Loki

OTLP

Python

Ruby

Tempo
