Lead Site Reliability Engineer
Indeed
Full-time
Onsite
No experience limit
No degree limit
Heroico Colegio Militar 323, Reforma, 44890 Guadalajara, Jal., Mexico
Description

Summary: Seeking a Lead Site Reliability Engineer to spearhead reliability, scalability, and performance for an AI-powered property intelligence platform, bridging AI model inference and enterprise-grade stability.

Highlights:

1. Lead reliability, scalability, and performance for an AI-powered platform.
2. Bridge complex AI model inference and enterprise-grade stability.
3. Mentor engineers and collaborate with Senior Delivery Directors.

**Company Description**

At KMS Technology, we are dedicated to delivering cutting-edge solutions and services that empower businesses to achieve their goals. Our team is composed of highly skilled professionals who are passionate about technology and innovation. We provide a dynamic and collaborative work environment where you can grow your career and make a significant impact.

**Job Description**

We are seeking a **Lead Site Reliability Engineer** to spearhead the reliability, scalability, and performance of our AI-powered property intelligence platform. The platform operates at the intersection of Geospatial AI and Insurance Technology, and you will be responsible for a mission-critical **Azure** ecosystem supporting high-throughput **Java** microservices.

As a Lead, you will bridge the gap between complex AI model inference and enterprise-grade stability. You will own the "Production Excellence" mandate, mentoring a team of engineers and collaborating with Senior Delivery Directors to ensure our global infrastructure stays ahead of our rapid growth.

**Key Responsibilities**

**Strategic Infrastructure & Azure Leadership**

* **Cloud Architecture:** Lead the design of highly available, multi-region architectures on **Azure**, using AKS (Azure Kubernetes Service), Azure Functions, and Service Bus.
* **IaC Governance:** Establish and enforce standards for Infrastructure as Code using **Terraform** or Bicep, ensuring 100% automated provisioning across all environments.
* **Java Performance Engineering:** Partner with backend squads to optimize **JVM** performance through garbage collection tuning and memory management for high-concurrency insurance processing.

**Reliability & AI Operations (AIOps)**

* **Error Budgeting:** Define, negotiate, and manage **SLIs, SLOs, and SLAs** with product stakeholders, balancing the velocity of AI feature releases against system stability.
* **Advanced Observability:** Architect end-to-end monitoring and distributed tracing using **Azure Monitor, Application Insights**, and ELK/Grafana.
* **Incident Commander:** Act as the final escalation point for high-priority incidents, leading complex Root Cause Analysis (RCA) and driving long-term remediation.

**Security & Industry Compliance**

* **Data Sovereignty:** Ensure the platform adheres to insurance-specific data residency requirements and security frameworks (SOC 2, HIPAA, or ISO 27001).
* **Automated Governance:** Implement Azure Policy and automated security scanning within CI/CD pipelines to ensure "Secure by Design" infrastructure.

**Qualifications**

**Technical Leadership:**

* **7+ years** in SRE, DevOps, or Cloud Engineering, with at least **2 years in a Lead or Principal capacity.**
* **Azure Mastery:** Expert-level knowledge of the Azure Well-Architected Framework, especially networking (VNet/ExpressRoute) and compute.
* **Java Ecosystem:** Deep proficiency in the **Java/Spring Boot** stack from an operational perspective (JVM profiling, thread dump analysis).
* **Container Orchestration:** Mastery of **Kubernetes (AKS)**, including ingress controllers, service mesh (Istio), and cluster security.

**Professional Competencies:**

* **Strategic Mindset:** Ability to translate technical debt and reliability risks into a data-driven business case for leadership.
* **Automation Advocate:** Proven track record of eliminating toil through Python, Go, or Java-based automation tooling.
* **Mentorship:** Passion for leveling up the engineering organization through workshops, documentation, and pair programming.
* **AI-First Integration:** Experience using AI for predictive scaling and automated log summarization to reduce Mean Time to Recovery (MTTR).

**Additional Information**

***Perks you enjoy at KMS Mexico***

* Benefits required by Mexican law.
* 15 days of PTO in year zero (from the first year onwards, it is 3 days per year).
* 5 days of leave (negotiable) for the death of an immediate family member.
* Major Medical Expenses Insurance with coverage for immediate dependents (spouse and children).
* Annual performance bonus (10% of annualized salary).
* Annual salary adjustment.
* Employee referral bonus.
* Paid certifications/courses.
* Coursera license.
* 5% savings fund.
* 5% grocery vouchers.

Source: Indeed
Juan García
Indeed · HR

© 2025 Servanan International Pte. Ltd.