Offer summary

Qualifications:

Bachelor's Degree in Computer Science or related field.

Strong knowledge of application monitoring tools like Dynatrace and Grafana.

Experience in incident management and root cause analysis.

Ability to perform data extraction and handle ad-hoc requests.

Key responsibilities:

Monitor and support application performance and health post-deployment.

Manage traffic diversion and validate code deployment success.

Diagnose and track critical incidents to resolution, providing root cause analysis.

Document changes and participate in governance meetings while providing holiday support coverage.

Job description

Key Responsibilities

Application monitoring and support

Manage traffic diversion during deployments
Validation of code deployment success
Post deployment health monitoring and reporting
Production patching and monitoring activities for in-scope applications (liveness probes, DataGrid, SOSS, pod restarts)
Monitor and act on alerts using Bell monitoring tools (Dynatrace, BAM, Grafana)
Monitor DB servers and verify their health through daily sanity checks
Verify tablespace and disk space status and warn when either is approaching capacity.
Verify memory and processor usage and warn when either is approaching capacity.

Production Monitoring

Diagnosing and tracking Incidents and problems with Severity Critical (P1) and High (P2) through to Resolution
Providing the required production logs, or access to them, for incident analysis.
Provide the Root Cause Analysis for all Critical Incidents.
Repairing data and associated work caused by invalid data where validation code does not exist, or where a documented Incident caused by a transaction results in failures.
Providing workarounds for Critical and High Incidents
Updating relevant system, configuration or process documentation.
Document and promptly notify Bell of any emergency changes required.
Participate in AMS Operations Governance meetings (assumed to be bi-weekly)
Responding to Application-related questions, performing data extraction as required
Handling ad-hoc requests from end users for information, queries, or reports.
Providing holiday support coverage
Performing peak period monitoring and reporting for specific critical applications
Perform daily health checks for Critical applications.

Preferred Qualifications