Site Reliability Engineer

extra holidays - extra parental leave - work from home - work from anywhere
Remote: Full Remote

Offer summary

Qualifications:

  • Solid understanding of DevOps, Cloud Operations, or SRE principles with a focus on reliability and scalability.
  • Hands-on experience with Linux systems, including performance tuning and troubleshooting.
  • Proficiency in programming languages such as Go or Python, with strong scripting skills in languages like Bash.
  • Extensive experience with cloud platforms like AWS, GCP, and Azure, along with knowledge of CI/CD pipelines.

Key responsibilities:

  • Enhance system monitoring using tools like Prometheus and Grafana to ensure visibility and alignment with business objectives.
  • Automate deployments and workflows using IaC tools like Terraform and Ansible to improve operational efficiency.
  • Support incident management and lead post-mortem analysis for continuous improvement.
  • Collaborate with engineering and product teams to integrate reliability practices into the development lifecycle.

Platform.sh Internet Scaleup https://linkstre.am/platform.sh
201 - 500 Employees

Job description

About Platform.sh

Platform.sh is a Platform-as-a-Service (PaaS) that removes the complexities of cloud infrastructure management and optimizes development-to-production workflows, reducing the time it takes to build and deploy applications. It delivers efficiency, reliability, and security, giving development teams both control and peace of mind. Built for developers, by developers.

Adopted and loved by 16,000+ developers and 7,000 customers, Platform.sh has for nearly a decade provided innovative capabilities that serve as the launchpad for creative development teams’ out-of-the-box thinking.

We provide 24x7 support, managed cloud infrastructure, and automated security and compliance with an all-in-one PaaS. We give our customers complete control over their data by keeping applications secure and available around the clock.

Platformers are a remote, global workforce, and we thrive in a multicultural team. We are committed to open source and an open, welcoming environment. Our team spans the globe and the experience spectrum. What's our commonality, our cultural fabric? A curious spirit and a thirst for knowledge; an eagerness for innovative ideas and cultures. We believe we can build anything together in an environment that frees you to do your best work.

Bring your expertise and enthusiasm to our growing, global organization. Your contributions, collaboration, and unique point of view are recognized and valued here.

Impact of a Site Reliability Engineer

As a Site Reliability Engineer, you are a key part of our team’s transition to the Site Reliability Engineering (SRE) model, moving from traditional Cloud Operations to an automation-driven approach. This shift enhances system reliability, scalability, and efficiency, positioning SRE as a core function within the company.

In this role, you focus on improving infrastructure, automating operational tasks, and streamlining processes. You work closely with developers, engineers, and product teams to ensure reliability is embedded throughout the application lifecycle.

As part of this transition, you also help optimize cloud-based systems, reduce manual work, and drive continuous improvements, playing a vital role in the organization’s overall success and long-term stability.

What to expect
  • Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
  • Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
  • Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
  • Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
  • Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
  • Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
  • Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.
What you bring
  • DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
  • Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
  • Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
  • Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
  • Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
  • Containerization and Orchestration (nice to have): Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications.
  • Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.
Where we hire

At Platform.sh, remote work isn't just a trend - it's our foundation. The freedom of remote work with the support of a diverse, global team has been our successful model for nearly a decade. Our culture celebrates flexibility and collaboration, and while we have team members in over 30 countries around the globe, we are currently focused on hiring for this role in France, Germany, Spain, the United Kingdom, and the West Coast of the United States or Canada. Although we’re unable to provide visa sponsorship at this time, we welcome applications from all qualified candidates who are legally authorized to work in these countries.

How we hire

We know that a great hire won’t meet every requirement that we’ve outlined. If you can see yourself elevating the team, we want to hear your story. Few of us would be here had we not taken a chance.

You can expect 4 interviews on Google Meet, in the order below. Should you successfully move through the entire process, you will have the opportunity to meet with a variety of Platformers. Our goal is to ensure you can make the most informed decision on whether this role and our culture align with what you’re looking for in your future working environment.

  1. 45 Minutes with Talent Acquisition 
  2. 60 Minutes with Hiring Manager (Director, Site Reliability Engineering)
  3. 60 Minutes with Team (Site Reliability Engineer, Director, Site Reliability Engineering)
  4. 60 Minutes with Executive (Senior Director, Site Reliability Engineering)

All roles require background checks.

What we offer

💡 A product you can believe in - Join us in transforming how businesses build and manage web applications, driven by making a positive impact as a proud B Corp.

🏆 An Award-Winning Workplace - We’ve been recognized by Forbes’ Top 30 Companies for Remote Jobs and France’s Best Workplaces for Women.

🗣️ A culture that values your voice - Join a flexible, open, and inclusive work environment where your voice is encouraged, and your ideas shape our growth and evolution.

🌎 A global team - Collaborate with colleagues from diverse backgrounds across the world, embracing different perspectives.

🎉 Benefits and perks - Make the most of what matters to you

🏝 Flexible PTO

📈 Company stock options

🧠 Professional development budget

💻 Office equipment budget

💆‍♀️ Wellness budget

🧳 Annual team gatherings

🛜 Internet reimbursement

👶 Inclusive parental leave

✈️ Remote work travel program

You belong here

At Platform.sh, we celebrate diversity in all its forms and are committed to fostering an inclusive, equitable, and supportive workplace where everyone can thrive. We embrace and value different perspectives, backgrounds, and experiences, because they make us stronger as a team. Whoever you are, wherever you're from, and whatever path you've taken, you are welcome here. We encourage you to bring your whole self to work, connect with others, and share your passion.

If you need accommodations at any stage of our hiring process, please let us know. We're here to ensure an accessible and comfortable experience for you.

Required profile

Industry: Internet
Spoken language(s): English

Other Skills

  • Collaboration
  • Problem Solving
