
Site Reliability Engineer

extra holidays - extra parental leave
Remote: Full Remote
Experience: Senior (5-10 years)
Offer summary

Qualifications:

Bachelor's degree in Computer Science or similar; 5+ years of relevant experience; 3+ years of experience with Kubernetes and Docker; 3+ years with Prometheus, Grafana, and the ELK stack.

Key responsibilities:

  • Monitor production environment availability
  • Create sustainable systems through automation
RunBuggy Scaleup https://runbuggy.com/
51 - 200 Employees

Job description

Job Type
Full-time
Description

RunBuggy is a technology platform that connects car shippers and haulers. RunBuggy allows shippers to connect seamlessly with their existing management systems to integrate car transportation services, reducing costs and improving delivery times. For transporters, RunBuggy offers an alternative to expensive load boards and custom software solutions, making it easier to find and manage transportation loads. Since 2019, RunBuggy has grown to over 125 employees, has helped shippers move hundreds of thousands of cars, and has attracted tens of thousands of transporters across the U.S.


We are currently seeking a Site Reliability Engineer (SRE) to join our engineering team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our software systems and applications. You will design, implement, and maintain the infrastructure and tools that support our platform, and improve our monitoring, automation, and deployment processes. The ideal candidate has a deep understanding of system architecture and of network and server infrastructure, and collaborates effectively with cross-functional teams.


If this sounds like you, please read on! If you feel this role is not in your wheelhouse, that is okay too! We are actively hiring outstanding professionals, so we encourage you to apply to one of our many other opportunities.



What You Will Be Doing:

  • Monitor production environment availability and take a holistic view of system health.
  • Build software and systems to manage platform infrastructure and applications.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve systems and environments.
  • Provide primary operational support and engineering for multiple large software applications.
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault-finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplifts.
  • Create and automate incident response plans for systems.
  • Develop well-defined alerts that notify stakeholders of system performance issues 24/7.
  • Ensure continuous integration and continuous delivery (CI/CD) across supported environments.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Create well-written incident responses.
  • Create well-documented incident tickets for all issues in the environment.
  • Perform root cause analysis for environment issues.
  • Participate in the on-call alert rotation to respond to RunBuggy availability issues and support engineering teams and customers accessing RunBuggy systems.
  • Design, build, and maintain core infrastructure components that allow RunBuggy to scale to thousands of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan the growth of RunBuggy's infrastructure.
  • Additional duties as assigned.


Requirements

What You Bring to the Team by Way of Skills and Experience:

  • Bachelor's degree in Computer Science, Information Systems, or similar.
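  • 5+ years of relevant experience.
  • Minimum 3 years of experience with Kubernetes (deployments, scripts, monitoring).
  • Experience with Docker, including image and container optimization.
  • Minimum 3 years of experience with Prometheus (metric collection, alert management).
  • Minimum 3 years of experience with Grafana.
  • Minimum 3 years of experience with the ELK stack (Elasticsearch, Kibana, etc.).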


Travel Requirements (if any):

  • Telecommute position, but may have occasional travel as needed (<10%).


What Is in It for You and Why You Should Apply:

  • Market competitive pay based on education and experience.
  • Highly competitive medical, dental, and vision insurance; life insurance with AD&D; short-term and long-term disability insurance; pet insurance; identity theft protection; and a 401k retirement savings plan.
  • Employee wellness program. 
  • Employee rewards, discounts, and recognition programs.
  • Generous company-paid holidays (12 per year), vacation, and sick time.
  • Paid paternity/maternity leave.
  • Remote work environment with monthly connectivity/home office stipend. 
  • A supportive and positive space for you to grow and expand your career.


To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.


RunBuggy is an equal-opportunity employer that is committed to diversity and inclusion in the workplace. We prohibit discrimination, harassment, and retaliation on the basis of race, color, religion, sex (including gender identity and sexual orientation), pregnancy, parental status, national origin, age, disability, genetic information, or any other status protected under federal, state, or local law.


Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Analytical Thinking
  • Collaboration
  • Problem Solving
