Offer summary

Qualifications:

Proficiency in public cloud platforms (Azure, AWS, or GCP).

Experience with CI/CD, deployment automation, and GitOps practices.

Familiarity with observability tools like Prometheus, Grafana, and Datadog.

Knowledge of infrastructure as code using Terraform or similar tools.

Key responsibilities:

Act as a guardian of service reliability in production, aiming to reduce MTTR and increase availability.

Define and monitor SLOs, SLIs, and SLAs in collaboration with business and development areas.

Develop automations for failure mitigation, deployments, and incident recovery.

Conduct root cause analyses and document incident retrospectives.

Job description

RESPONSIBILITIES AND ASSIGNMENTS

Act as a guardian of the reliability of services in production, aiming to reduce MTTR (Mean Time to Recovery) and increase availability;
Define and monitor SLOs, SLIs, and SLAs together with the business and development areas;
Develop automations for failure mitigation, deployments, resilience testing, and incident recovery;
Promote observability practices with dashboards, alerts, and distributed tracing;
Take an active part in the system life cycle, from design through to production support;
Conduct and document root cause analyses (RCA) and incident retrospectives;
Work with infrastructure as code for secure and scalable provisioning;
Foster an engineering culture focused on reliability, performance, and sustainable operations;
Ensure that monitoring tools are implemented in all environments.

REQUIREMENTS AND QUALIFICATIONS

Proficiency in public cloud (Azure, AWS, or GCP);
Knowledge of CI/CD, deployment automation, and GitOps practices;
Experience with observability: Prometheus, Grafana, Loki, Elastic Stack, Datadog, New Relic, or similar;
Container management and orchestration (Kubernetes, Helm, Istio/Linkerd);
Experience with infrastructure as code (Terraform, Pulumi, or similar);
Skill with scripting languages (Bash, Python, or Go);
Knowledge of networking, DNS, load balancers, TLS, failover, and scalability;
Ability to lead incident analyses and preventive actions (blameless postmortems);
Nice to have: relevant certifications (CKA, AZ-400, AWS DevOps Pro, GCP SRE); experience with Chaos Engineering and failure testing (Gremlin, LitmusChaos, etc.); experience in environments with high request volumes and distributed architectures (microservices, serverless); and familiarity with FinOps practices and cloud cost optimization.

Don't meet all the requirements for the position?

That's okay! At Compass UOL, we encourage the continuous development of new talent and turn challenges into opportunities.

ADDITIONAL INFORMATION

#remote

DREAM BIG WHEN IT COMES TO TECHNOLOGY. BE A COMPASSER! 🚀

Compass UOL is a global company that is part of AI/R, which drives the transformation of organizations through Artificial Intelligence, Generative AI, and Digital Technologies.

We design and build digitally native platforms using cutting-edge technologies to help companies innovate, transform businesses, and drive success in their markets. With a focus on attracting and developing the best talent, we create opportunities that improve lives and highlight the positive impact of disruptive technologies on society.

That's why our selection process goes beyond technical skills. Our goal is to find unique individuals with the potential to make an extraordinary impact on our clients.

We empower talent without borders and promote knowledge and opportunities in the latest market trends, driving significant results.

Join us and be part of the AI-driven digital revolution in the technology universe.

HOW OUR SELECTION PROCESS WORKS

1. ONLINE APPLICATION

Choose the opportunity that best fits your goals. Remember: having a well-detailed profile with your experiences and knowledge can make all the difference!

2. INTERVIEWS

Learn about our culture and company! During interviews, be present and do your best to share your expertise in a chronological and structured way.

3. EVALUATION

Our tests and assessments focus on finding talent with the cultural and technical fit for the position applied for.

4. FEEDBACK

You will receive a response from us regardless of the result! We hold the Gupy platform's feedback certification.

Site Reliability Engineering | Specialist
