- Enhancing and maintaining the observability stack with a focus on usability, scalability, and capacity planning.
- Working with teams to improve the quality, consistency, and value of metrics, logs, and traces.
- Developing tools and processes to simplify the integration of observability into developer workflows.
- Building meaningful, actionable dashboards and alerts tailored to teams’ needs.
- Collaborating with developers to define SLIs and SLOs that align with business goals.
- Investigating and resolving gaps or bottlenecks in the observability pipeline.
- Advocating for and driving the adoption of observability best practices across the organization.
- Staying up to date on emerging trends in observability and recommending improvements.

You are the right future Veriffian for the job if you have:
- Strong expertise in observability tools, particularly the LGTM stack (Loki, Grafana, Tempo, Mimir) or similar.
- Deep understanding of Kubernetes and its observability ecosystem.
- Programming skills in Python, Go, or similar languages for building observability tools.
- Strong scripting skills (e.g., Bash) for automation tasks.
- Experience improving metrics, logs, and traces pipelines for better data quality and reliability.
- Familiarity with distributed systems, microservices, and CI/CD pipeline integration.
- Experience with observability tools like Prometheus, OpenTelemetry, or Jaeger.
- Knowledge of reusable libraries or templates for observability integration.
- Strong collaboration skills, with the ability to work with developers and stakeholders to improve their experience with observability.