Observability Engineer - Prometheus, Grafana - MX

Indeed

Full-time

Onsite

No experience limit

No degree limit

Isabel La Católica 5, Historic Center of Mexico City, Centro, Cuauhtémoc, 06000 Ciudad de México, CDMX, Mexico

Description

Job Summary: Join us as an Observability Engineer to implement and optimize tools that enable automated, efficient monitoring and ensure the stability and performance of production cloud infrastructures.

Key Highlights:

1. Design and optimize monitoring solutions for cloud infrastructures.
2. Ensure stability and performance of production cloud infrastructures.
3. Collaborate in a Great Place to Work environment and an innovation-driven culture.

### **Overview**

Join our Site Reliability Engineering team as an **Observability Engineer**. We implement and optimize tools that enable automated, efficient monitoring and provide the insights needed to resolve issues and keep our cloud-based products running continuously and correctly in production environments.

Your challenge will be to guarantee the stability, availability, and performance of production cloud infrastructures by designing and implementing monitoring and performance-indicator visualization solutions for our platforms, ensuring uninterrupted operation of the large-scale data centers that support our critical, always-on applications and infrastructure.

**This role is available for remote work from the following locations: Mexico, Chile, Argentina, Colombia, Uruguay, and Peru.**

**Responsibilities**
---------------------

* Design, implement, and optimize monitoring solutions for cloud infrastructures.
* Define, analyze, and implement dashboards to visualize critical performance indicators.
* Ensure proper operation of production clouds based on open-source technologies (e.g., Kubernetes and OpenStack).
* Address critical platform incidents, escalating to Senior Engineers or the Product Development team as needed.

**Technical Requirements**
-----------------------

* Education:
  + Degree in Computer Engineering, Systems Engineering, Computer Science, or a related field.
* Experience:
  + Minimum 3 years of relevant experience in managing, optimizing, and monitoring cloud infrastructures (especially with technologies such as Kubernetes and/or OpenStack) and in handling incidents and production environments.
  + Experience designing and implementing monitoring solutions for cloud infrastructures, as well as in performance management and in coordinating critical incident resolution with development teams.
* Specific Knowledge / Technical Requirements:
  + Intermediate Linux
    - Basic commands, file manipulation, networking, etc.
    - Experience with Shell scripting (Bash).
    - Automation (scripting) using Bash and/or Python.
  + Git: Basic level
    - Familiar with the standard workflow: add, commit, push.
    - Familiarity with more advanced commands such as rebase or cherry-pick is not required.
    - Ability to resolve merge conflicts is not required.
  + Intermediate use and creation of container images with Docker.
    - Ability to create images using a Dockerfile.
    - Understanding of the Docker container lifecycle.
  + Use and configuration of monitoring tools (Prometheus, Grafana, Elasticsearch, Kibana).
  + Use and configuration of deployment tools such as GitLab, ArgoCD, etc.
  + Knowledge of monitoring external components such as routers, switches, Kubernetes clusters, and VMs.
  + Use and administration of Kubernetes clusters.
* Language: Intermediate English (Writing/Reading)
* Desirable
  + Public Cloud (AWS, GCP, Azure) or Private Cloud (OpenStack) experience
  + Experience with Agile methodologies (Scrum, Kanban, etc.)
  + Ability to adapt existing open-source solutions
  + Certifications in Linux, OpenStack, and/or Kubernetes
  + Integration of open-source projects
  + Basic Networking knowledge
* Required Soft Skills
  + Autonomy, discipline, and self-learning ability
  + Conceptual analytical thinking
  + Customer orientation
  + Teamwork capability

#### **About Us**

At **Whitestack**, we are leaders in Latin America in developing Telco Cloud, Open Networking, and hyper-scalable digital infrastructure solutions.

We work with open-source technologies such as OpenStack, Kubernetes, Open Source MANO, Ceph, Prometheus, ONOS, and many others, and we actively collaborate with global organizations including ETSI, the Open Infrastructure Foundation, the Telecom Infra Project, and the Open Compute Project.

We drive digital transformation across the region through world-class standards, major operator deployments, and a strong commitment to innovation.

Additionally, we are a **Great Place to Work**, where collaboration and personal development are integral parts of our culture.

**Why Join Whitestack?**

* International exposure: Participate in global initiatives and travel to collaborate with teams across different countries.
* Real work-life balance: We design policies aligned with your lifestyle, empowering you to work autonomously and purposefully.
* Clear growth path: We offer a robust career track in both leadership and technology.
* Health first: Private health insurance for you and your family.
* Unlimited learning: Access to courses, books, materials, and certification reimbursement.
* Languages for the world: Language courses so your growth knows no borders.
* Technology in your hands: We renew your equipment every 3 years… and it’s yours at the end of the period!
* Recognition for effort: Performance and project success bonuses.
* Time for you: Minimum 15 vacation days, a birthday day off, and additional breaks before Independence Day, Christmas, and New Year.
* Connection and fun: Budget for recreational and team-building activities.
* Innovation culture: Your ideas matter. We encourage strategic participation from any role.

Source: Indeed