Job category: Site Reliability Engineer

Job type: Remote - Full time

Job location: LATAM

We are looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering our secure, mission-critical SaaS platform. Our architecture spans traditional Windows-based .NET/IIS apps and modern cloud-native services using AWS, Docker, Kubernetes and Terraform. You’ll play a key role in ensuring uptime, reliability, and operational excellence across a hybrid stack.
You must be confident in operating and debugging both modern infrastructure (cloud native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.

Key Responsibilities:
● Be part of the team that owns the uptime and performance of our core backend infrastructure (Windows + Linux).
● Maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry.
● Manage CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation.
● Support and maintain production Windows environments:
– .NET Framework/Core apps running in IIS.
– SQL Server with AlwaysOn replication and Service Broker-based messaging.
● Support and operate cloud-native services:
– AWS Lambda, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker.
● Participate in on-call rotation and incident response.
● Collaborate closely with engineering teams to improve system reliability and deployment workflows.

What We’re Looking For:
● 5+ years of SRE, DevOps, or WebOps experience supporting production SaaS systems.
● Strong experience with Windows Server, IIS, and .NET applications in production.
● Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker.
● Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM.
● Familiarity with Postgres, Redis, Kibana/Elasticsearch, and centralized logging.
● Experience with Docker, Terraform, and Ansible for infrastructure management.
● Strong scripting skills (PowerShell, Python).
● Experience running and debugging containerized and distributed systems in production.
● Excellent incident response and debugging skills.

Apply for this position

If you are already talking with a CONEXIONHR recruiter, DO NOT COMPLETE THE FORM.