




**Position Title: Site Reliability Engineer (SRE) / Senior DevOps Engineer**

**Role Purpose**

Ensure the stability, reliability, and observability of the critical platform in Mexico by implementing high-availability engineering practices. This role is the cornerstone of operational continuity, managing large-scale deployments and ensuring a scalable infrastructure across hybrid cloud environments.

**Job Responsibilities**

* **Infrastructure Management:** Administer and optimize **Kubernetes** clusters as the core of operations, ensuring scalability and resilience.
* **Automation and CI/CD:** Design, operate, and maintain **CI/CD pipelines in Azure DevOps (YAML)**, ensuring traceable and secure delivery workflows.
* **GitOps Operations:** Implement and manage workflows using **Git and ArgoCD** to maintain environment parity.
* **Operational Continuity:** Lead production deployments, rollback strategies, and critical incident management, including **Post-Mortem** analysis.
* **Observability:** Configure and manage the monitoring stack in **Datadog**, transforming metrics and logs into actionable alerts that reduce operational noise.
* **Technical Collaboration:** Serve as the technical liaison with Development and Regional Architecture teams to optimize application performance.

**Required Skills**

**Hard Skills (Technology Stack):**

* **Orchestration:** Advanced proficiency in **Kubernetes** (AKS/Self-managed).
* **Cloud:** Solid experience with **Azure** (IFX knowledge preferred).
* **Automation:** Expert-level experience with **Azure DevOps Pipelines** (YAML) and containers (**Docker**).
* **Infrastructure as Code & GitOps:** Professional expertise with **ArgoCD** and version control tools (Git).
* **Observability:** Advanced configuration of dashboards, monitoring, and APM in **Datadog**.

**Soft Skills:**

* **Troubleshooting:** Strong ability to resolve complex problems under pressure.
* **Autonomy:** Ability to propose and execute evidence-based technical improvements without constant supervision.
* **Continuous Improvement Mindset:** Focus on automation to eliminate manual tasks (Toil).

**Education and Experience**

* **Education:** Bachelor's or engineering degree in Systems, Computing, Computer Science, or a related field.
* **Experience:** Minimum **4-5 years** in DevOps, SRE, or Cloud Platform Administration roles within highly critical environments.
* **Certifications (Preferred):** CKA (Certified Kubernetes Administrator), Azure Solutions Architect, or certifications related to Datadog/SRE.

Employment Type: Full-time

Salary: $40,000.00 - $50,000.00 per month

Benefits:

* Maternity leave exceeding statutory requirements
* Paternity leave exceeding statutory requirements
* Medical expense insurance
* Major medical expense insurance
* Life insurance
* Remote work
* Grocery vouchers

Work Location: Hybrid remote in 06500, Cuauhtémoc, CDMX


