Site Reliability Engineer II

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor’s Degree in Computer Science, Math, MIS or related field.
  • Minimum four years of experience in 24x7 operations as a Linux engineer.
  • Expert knowledge of Linux operating systems and cloud architectures.
  • Strong problem-solving skills and ability to communicate technical insights effectively.

Key responsibilities:

  • Troubleshoot and resolve operational challenges to meet defined service level objectives.
  • Scale systems through automation and improve documentation for applications and runbooks.
  • Collaborate with business and technical teams to provide technical support and guidance.
  • Drive innovation by proposing new ideas for processes, tools, and technologies.

Allegiant https://www.allegiantair.com/
1001 - 5000 Employees

Job description

Summary
As an SRE II at Allegiant, you play a vital role in the design, maintenance, and performance of all production on-premises and cloud-based systems. You will participate in Allegiant’s cloud transformation efforts, ensuring our systems and applications remain highly resilient while adding cloud-native functionality to achieve long-term scalability and stability. As an SRE II, you will be a champion of innovation, taking the initiative to turn problems into solutions. You are equally comfortable working with a team on a large project and focusing independently on a single goal. You are systematic in problem solving, levelheaded under pressure, never satisfied with the status quo, and you thrive on finding creative solutions. You show a willingness to grow to meet evolving technical needs and can communicate clearly, in writing and verbally, your insights into how to refine and improve the system, ultimately ensuring performance, stability, and an exceptional end-user experience.

Visa Sponsorship Available: No

Minimum Requirements
A combination of education and experience will be considered. Must be authorized to work in the US as defined by the Immigration Act of 1986. Must pass a criminal background check.
Education:  Bachelor’s Degree in Computer Science, Math, MIS or other related field
Certification: Associate level cloud certification from AWS, GCP, Azure or equivalent
Years of Experience:  
•Minimum four (4) years of experience in 24x7 operations requiring rapid response as a seasoned Linux engineer.
•Minimum two (2) years of experience in Cloud Engineering, Systems Engineering, or Site Reliability Engineering in an enterprise Linux environment.

•Professionalism, ambition, flexibility, technical expertise, and open communication are paramount characteristics of the team and its team members.
•Possess a desire to develop creative ways to solve business problems through analysis and technology, with a hands-on approach to problem solving.
•Possess expert knowledge of the Linux operating system and the full capabilities of the shell.
•Possess a drive to deliver quickly in support of mission-critical systems and business goals.
•Possess expert knowledge in performance, scalability, enterprise system architecture, cloud architectures, and engineering best practices.
•Ability to work with a variety of teams and technologies and dive into complex problems to diagnose, isolate, and resolve issues as they arise.
•Proactively work on efficiency and capacity planning, with an eye to setting clear requirements and reducing the rate of system resource consumption.
•Identify parts of the system that do not scale and drive long-term design changes, guiding junior SREs on the hows and whys.
•Assess the risk of newly introduced features to plan ahead and improve the infrastructure.
•Propose and drive architectural changes that affect the whole company to solve scaling and performance problems.
•Strong foundation in security compliance, adhering to best practices in remediating vulnerabilities, analyzing system scans, and implementing zero-downtime patching automation.
•Track record of building and maintaining excellent working relationships with peers across organizations (QA, Development, PM, Operations, Leadership, etc.).
•Extensive experience with the urgency of 24x7, customer-impacting operations and with supporting systems under pressure.
•Independent thinker with creative, resourceful, and proactive problem-solving skills working with a close-knit team that offers full ownership of projects in a supportive design environment.
•Ability to create simulation drills and make recommendations for systems and team response.
•Experience deploying, configuring, administering, and tuning Linux in an enterprise environment.
•Required: experience supporting Linux with applications such as JBoss server, Spring Boot, Tomcat, and Apache.
•Extensive knowledge of Python and applying it across disciplines of system administration and system automation.
•Extensive knowledge of Ansible for system automation.
•Extensive knowledge of infrastructure-related services such as DNS (BIND and Route53), Postfix, SSSD, and LVM across cloud and on-prem systems, including Power architectures.
•Experience leveraging the capabilities of virtualization platforms, cloud and on-prem, to meet project goals and mitigate impact to users.
•Strong understanding of a broad cross-section of technologies including Databases, Networking, Firewalls, Load Balancing, and Security, their interaction, and impact on the systems.
•Experience supporting fault-tolerant message queuing/brokering systems (e.g. AMQ, RabbitMQ, ZeroMQ, Kafka).
•Extensive experience designing, configuring, and implementing AWS or other cloud services and architecture required; experience transitioning on-prem applications to the cloud preferred.
•Experience/knowledge with service discovery solutions like Consul/Eureka/Zookeeper.
•Experience with building packages for RPM-based systems and architecting custom repositories as needed.
•Experience implementing backup and monitoring solutions in a cloud environment.
•Experience with developing Infrastructure as Code and implementing immutable systems.
•Experience with monitoring tools (SumoLogic, Splunk, Logstash, ELK, DataDog, Prometheus, etc.).
•Strong and demonstrable experience working in teams with a heavy emphasis on Operations, Automation, Quality, and Performance.
•Excellent analytical thinking, problem-solving, communication, organization, and interpersonal skills.
•Ability to break complex problems, processes, or projects into component parts and explore and evaluate them systematically.
•Strong written and verbal communication skills, with proven ability to present to audiences at all levels, including technologists, management, and executives.
•Ability to lead a team of technologically oriented employees.
•Ability to communicate directly with senior management and subject matter experts through presentations and discussion.
•Ability to maintain regular and predictable attendance subject to the leave, PTO, and attendance policies.

Preferred Requirements
•Airline and/or hospitality experience.
•E-Commerce experience in a large-scale Linux enterprise environment.
•Certification in one (1) of the following (or equivalent):
  • Red Hat Certified Architect (RHCA)
  • Red Hat Certified Specialist in Linux Diagnostics and Troubleshooting
  • Red Hat Certified Specialist in Linux Performance Tuning
  • Red Hat Certified Specialist in Advanced Automation: Ansible Best Practices
  • Certified Kubernetes Administrator
  • Architect cloud certification from AWS or GCP

Job Duties
•Troubleshoot, evaluate, and resolve operational challenges to meet defined SLOs.
•Identify Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.
•Scale systems through automation, improving change velocity and reliability.
•Lead the organization in identifying trends, drawing conclusions from problems we face, and establishing the actions needed to resolve these issues.
•Propose ideas and solutions within the infrastructure team to reduce workload through automation.
•Plan, design and execute solutions within the infrastructure team to reach specific goals agreed upon within the team.
•Improve documentation across the board, in both application documentation and runbooks, explaining the 'why' rather than stopping at the 'what'.
•Actively look for opportunities to improve the availability and performance of the system by applying data-driven analysis from monitoring and observation.
•Provide emergency response, either while on call or by reacting to symptoms surfaced by monitoring, escalating when needed.
•Effectively collaborate with both business and technical teams.
•Continuously expand expertise in three or more technical areas.
•Functionally translate complex problems into simple, straightforward solutions, and guide their implementation with the team.
•Provide solution architecture for business problems while balancing essential technical guidelines.
•Drive innovation by contributing new ideas for our processes, tools, and technologies.
•Exert technical influence over your team, increasing their productivity and effectiveness by sharing your deep knowledge and experience.
•Keep up with the latest developments in the DevOps, Operations, Linux, Automation, SRE, and Cloud community.
•Conduct design and code reviews, and contribute to, adhere to, and enforce standards and best practices.
•Assist in the career development of others, actively mentoring individuals on advanced technical issues and helping managers guide the career growth of their team members.
•Develop prototypes or demos for any strategic initiative.
•Work with other teams such as QA, PMO, Infosec, DBA, Networking, and Development and provide technical support and guidance to ensure successful delivery of a project.
•Resolve system performance and scalability issues by identifying bottlenecks, analyzing resource utilization, and pinpointing key areas for improvement.
•Stay abreast of new technologies and methods in Linux, Cloud, and other Software Technologies (conferences, meetups, etc.).
•Model Allegiant’s customer service standards in personal actions and when providing leadership direction.
•Other duties as assigned.

Physical Requirements
The Physical Demands and Work Environment described here are representative of those that must be met by a Team Member to successfully perform the essential functions of the role. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions of the role.

Office - While performing the duties of this job, the Team Member is regularly required to stand, sit, talk, hear, see, reach, stoop, kneel, and use hands and fingers to operate a computer, keyboard, printer, and phone. May be required to lift, push, pull, or carry up to 20 lbs. May be required to work various shifts/days in a 24-hour operation. Regular attendance is a requirement of the role. Exposure to moderate noise (e.g. business office with computers, phones, printers, and foot traffic) and to temperature and light fluctuations. Ability to work in a confined area as well as the ability to sit at a computer terminal for an extended period of time. Some travel may be a requirement of the role.

Essential Services Provider
Allegiant as a national air carrier is deemed an essential service provider during declared national and state emergencies. Team Members will be required to report to their assigned trip or work location during national and state emergencies unless prohibited by local, state or federal order.

EEO Statement
We welcome all individuals from varied backgrounds and experiences to apply. Our company celebrates diversity, and we value the unique perspectives and talents that each person brings to our team.

Equal Opportunity Employer: Disability/Veteran
For more information, see https://allegiantair.jobs
Full Time Benefits:
Profit Sharing
Medical/Dental/Vision/Life/Disability Insurance
Medical Travel Reimbursement
Legal, Identity and Pet Insurance
401K with an employer match
Employee Stock Purchase Plan
Employee Assistance Program
Tuition Reimbursement
Flight Benefits
Paid vacation, holidays, and sick time
 
Part Time Benefits:
Profit Sharing
Medical Travel Reimbursement
Legal, Identity and Pet Insurance
401K with an employer match
Employee Stock Purchase Plan
Employee Assistance Program
Tuition Reimbursement
Flight Benefits
Sick time

Required profile

Spoken language(s):
English

Other Skills

  • Security Policies
  • Communication
  • Teamwork
  • Analytical Thinking
  • Physical Flexibility
  • Problem Solving
