Remote Opportunity as a Windows Site Reliability Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • Expertise in Microsoft Windows Server technologies
  • Strong programming skills in PowerShell, Python, and C#
  • Experience with Active Directory, DHCP, and DNS systems
  • Knowledge of Infrastructure as Code principles

Key responsibilities:

  • Ensure infrastructure reliability and efficiency
  • Develop monitoring solutions and perform RCA on incidents
Company: emagine (https://www.emagine.org/)
Company size: Large, 501 - 1000 employees

Job description

Job Title: Windows Site Reliability Engineer

Location: India (Remote)

Duration: 4-6 Months

Project Type: Contract

Job Description:

  • Responsible for the reliability and efficiency of infrastructure through the delivery of common, repeatable tools and processes that greatly reduce the amount of toil operations must perform
  • Act as a member of the L3 Engineering team, providing subject matter expertise and serving as the final point of escalation
  • Develop software to make infrastructure services self-managing and self-service
  • Deliver continuous service improvement by developing Infrastructure as Code
  • Eliminate manual, repetitive, automatable, tactical tasks that are devoid of value
  • Improve system performance, make effective use of resources, distribute load and reduce latency
  • Identify SLOs (Service Level Objectives) to meet availability and latency objectives
  • Develop proactive monitoring solutions that alert on symptoms and not just on outages
  • Perform detailed root cause analyses (RCAs) on incidents and outages to prevent recurrence
  • Partner with development teams to improve services via rigorous testing and release procedures
  • Identify technical debt and partner with application teams to build remediation plans
  • Develop standard operational procedures and produce effective documentation
  • Analyse workloads and devise suitable cloud migration strategies where appropriate
  • Ensure all project / investment workloads are delivered according to the defined plans and budget
  • Liaise with Infrastructure Control and IT Risk teams to satisfy internal and external audit requests
  • Deputise for the team lead when required and act up accordingly
  • Identify cost-saving and optimisation opportunities across the group
  • Build strong working relationships across the organisation
  • Adhere to the core values of the bank

Secondary:

  • Perform daily health and compliance checks for all systems as required
  • Ensure all systems are backed up successfully and any issues are promptly resolved
  • Validate that monitoring alerts and batch job failures are detected promptly and resolved satisfactorily
  • Ensure sufficient capacity is available to accommodate drive growth
  • Respond to emails sent to the team distribution list / mailboxes in a timely manner
  • Handle incidents and requests with efficiency and a “customer first” mindset
  • Maintain infrastructure in a highly available, reliable, secure and performant manner
  • General Server / Database / Virtualisation Administration maintenance activities
  • Provide technical support to application support and development teams
  • Provide consultancy to application support and development teams
  • Take part in the on-call and weekend work rotation, triaging and addressing production issues as they arise

Skills and experience:

  • Exceptional skills in Microsoft Windows Server internals and related technologies
  • Excellent skills in managing and maintaining Active Directory, DHCP, DNS, LDAP and Kerberos
  • Extensive experience in hardware performance monitoring and in tuning complex low-latency systems
  • Agile, Site Reliability Engineering (SRE) and DevOps principles and practices
  • Exceptional knowledge of scripting and programming languages such as PowerShell, Python and C#
  • Fluent in Backup and Recovery processes and procedures
  • Advanced knowledge of Clustering, High-Availability, Replication and Disaster Recovery techniques
  • Ability to tune Network, Storage, Server and Virtualisation layers for optimal performance and reliability
  • Excellent performance tuning skills, with in-depth knowledge of system internals, performance counters, and performance measurement and analysis tools
  • Ability to interpret and implement CIS security hardening recommendations in a controlled manner
  • Acute awareness of Security and Auditing requirements in a regulated environment
  • “Infrastructure as Code” principles and practices
  • “Continuous Integration (CI) and Continuous Delivery (CD)” principles and practices
  • Git, Ansible, Terraform and TeamCity
  • Serena Deployment Automation (SDA) and Jenkins

Required profile

Spoken language(s): English