Bachelor's Degree in Computer Science or a related field.
Strong knowledge of application monitoring tools such as Dynatrace and Grafana.
Experience in incident management and root cause analysis.
Ability to perform data extraction and handle ad-hoc requests.
Key responsibilities:
Monitor and support application performance and health post-deployment.
Manage traffic diversion and validate code deployment success.
Diagnose and track critical incidents to resolution, providing root cause analysis.
Document changes and participate in governance meetings while providing holiday support coverage.
At BCE Global Tech, we are at the forefront of technological innovation, driving global connectivity through cutting-edge advancements in AI, 5G, MEC, IoT, and cloud-native architecture.
Our community of technology developers is shaping the future of the telecommunications industry by developing next-generation products and services for consumers and small, medium and enterprise businesses. These include:
• Ultra-fast internet access
• Immersive virtual reality experiences
• Innovative IoT solutions
• AI-enabled technologies for personalized user experiences and optimized network performance
At BCE Global Tech, you'll be part of a dynamic and collaborative team of world-class technical and business professionals from across Canada, India, the Philippines, Morocco, and beyond. This diversity of experience and perspectives fosters a stimulating environment where you can learn from the best and contribute to innovative solutions.
If you are passionate about building a better, connected future, we invite you to join our team and be a part of this exciting journey.
Production patching and monitoring activities for in-scope applications (Liveliness Probe, DataGrid, SOSS, POD restarts)
Monitor and act on alerts using Bell Monitoring Tools (Dynatrace, BAM, Grafana)
Monitor the DB server and verify its status through a daily sanity check
Verify Table Space / Disk Space status and warn if it’s reaching capacity.
Verify Memory and Processor usage and warn if it’s reaching capacity.
Production Monitoring
Diagnosing and tracking Incidents and problems with Severity Critical (P1) and High (P2) through to Resolution
Providing the required Production Logs or access to Production Logs to analyze the incidents.
Provide the Root Cause Analysis for all Critical Incidents.
Repairing data and associated work caused by invalid data where validation code does not exist or where a documented Incident caused by a transaction results in failures.
Providing workarounds for Critical and High Incidents
Updating relevant system, configuration or process documentation.
Document and promptly notify Bell of any emergency changes required.
Participate in AMS Operations Governance meetings (assumed to be bi-weekly)
Responding to Application-related questions, performing data extraction as required
Handling ad-hoc requests from end users for information, queries, or reports.
Providing holiday support coverage
Performing peak period monitoring and reporting for specific critical applications
Perform daily health checks for Critical applications.
Preferred Qualifications
Bachelor's Degree in Computer Science
What We Offer
Competitive salaries and comprehensive health benefits.
Flexible work hours and remote work options.
Professional development and training opportunities.
A supportive and inclusive work environment.
Required profile
Experience
Spoken language(s):
English