
Senior Site Reliability Engineer at Moniepoint Inc.

  • Job Type: Full Time, Remote
  • Qualification: BA/BSc/HND
  • Experience: 4 years
  • Location: Lagos
  • Job Field: ICT / Computer

Job Summary

  • We are seeking an experienced Site Reliability Engineer (SRE) responsible for ensuring our systems run smoothly and efficiently while engineering solutions to improve visibility, eliminate repetitive tasks, and increase system resilience.
  • The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable and scalable service reliability.

Responsibilities

  • Participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradation, outages, or reliability issues across all environments.
  • Act as the Incident Commander during major incidents: initiate war room or bridge calls, coordinate cross-functional teams, provide timely and clear status updates to all stakeholders, and lead and document blameless Root Cause Analyses (RCAs) that identify underlying causes and drive long-term fixes.
  • Develop automation to eliminate manual, repetitive operational tasks (toil) across both applications and infrastructure, improving efficiency and system resilience.
  • Create and maintain dashboards and alerts that monitor application and infrastructure health.
  • Participate in feature development discussions to ensure services are built with observability from the ground up.
  • Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in collaboration with Product and Engineering teams.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.

Requirements

  • Minimum of 4 years of experience supporting enterprise applications in an SRE or similar role.
  • Knowledge of distributed systems, microservices architecture and software design patterns.
  • Experience with cloud platforms such as AWS, GCP, or Azure.
  • Strong knowledge of Kubernetes and container orchestration tools.
  • Experience using application performance monitoring tools, OpenTelemetry, and observability platforms such as New Relic, Datadog, ELK, or SigNoz.
  • Excellent problem-solving and troubleshooting skills as an on-call engineer, with the ability to resolve complex infrastructure and application issues.
  • Proficient in setting up and maintaining monitoring dashboards and alerts using Grafana and Prometheus.

Method of Application

Interested and qualified? Go to Moniepoint Inc. on job-boards.eu.greenhouse.io to apply.
