CONEXIONHR

ID 4220 – Site Reliability Engineer (Observability)

Job category: SRE
Job type: Remote - Full time
Job location: Argentina

The Site Reliability Engineer (SRE) will be responsible for implementing and managing observability across the company’s hybrid application and data landscape. This role ensures system reliability, scalability, and proactive monitoring to reduce downtime and improve MTTR.

Key Responsibilities:
● Implement telemetry (logs, metrics, traces, events) for applications and data systems across on-prem and cloud.
● Design and configure dashboards, alerts, and monitoring tools aligned to SLAs and SLOs.
● Collaborate with Tier 1 and Tier 2 support teams to streamline case and incident management.
● Analyze observability data to identify anomalies, bottlenecks, and root causes.
● Provide proactive monitoring and support AIOps-based predictive insights.
● Document runbooks and workflows, and provide knowledge transfer to client teams.

Required Skills:
● Experience with observability tools (e.g., Prometheus, Grafana, ELK, AppDynamics, Dynatrace).
● Strong understanding of distributed systems, cloud environments (Azure, AWS, GCP), and containers/Kubernetes.
● Knowledge of SRE principles, SLIs, SLOs, and error budgets.
● Hands-on experience with CI/CD pipelines and their integration with ITSM tools (ServiceNow, Azure DevOps (ADO), Jira).
● Strong troubleshooting and root cause analysis skills.
● Excellent communication and collaboration abilities.

Benefits:
● Family health plan.
● Birthday day off.
● Continuous training through content platforms.
And more!

Apply for this position

If you are already in contact with a CONEXIONHR recruiter, DO NOT FILL OUT THE FORM.
