





Join our team as a **Senior Site Reliability Engineer**, where you will maintain and improve our product monitoring system, manage incident responses, and facilitate collaboration between operations and development teams. This role requires deep domain knowledge in Oil & Gas and expertise in automation and cloud solutions. Apply now to contribute to a critical infrastructure project and ensure operational excellence.

**Responsibilities**

* Maintain and improve the product monitoring system
* Manage incident response, including troubleshooting, resolution, documentation, and post-mortem analysis
* Share knowledge and lessons learned across teams
* Act as a bridge between operations and development teams
* Build automation solutions for log analysis, testing production environments, and alert automation
* Monitor system health, performance, and service level indicators (SLI/SLO/SLA)
* Document knowledge and procedures related to incident management
* Conduct post-incident reviews and implement improvements
* Provide on-call support during and outside regular working hours
* Collaborate with development and operations to improve reliability and efficiency
* Use tools like PagerDuty, ELK/Kibana, SEQ logging, Prometheus, and Grafana for monitoring and incident management
* Develop and maintain scripts and automation using Python, C#, and Bash
* Manage infrastructure and orchestration with SaltStack and Docker
* Support project management and issue tracking using Azure DevOps and Wiki
* Maintain source code management using Git

**Requirements**

* 3+ years in Site Reliability Engineering, with experience building solutions from scratch
* Strong expertise in cloud providers and automation scripting with Bash and Python
* Deep domain knowledge of Oil & Gas industry operations and incident resolution
* Proven experience managing incident response and on-call support
* Familiarity with monitoring tools including Prometheus and Grafana
* Experience with logging tools such as ELK/Kibana and SEQ logging
* Knowledge of infrastructure and orchestration tools like SaltStack and Docker
* Basic network knowledge, including inbound/outbound and firewall rules
* Experience with project management and issue tracking tools like Azure DevOps
* Proficiency in source code management using Git
* Strong documentation and knowledge-sharing skills
* Ability to conduct thorough post-incident reviews
* Excellent troubleshooting and problem-solving skills
* Good communication skills with English proficiency at B2+ level

**Nice to have**

* Experience with PagerDuty for incident management
* Familiarity with C# programming
* Knowledge of SQL and MongoDB databases
* Experience with Zededa infrastructure
* Prior involvement in Oil & Gas field operations support

**We offer**

* Career plan and real growth opportunities
* Unlimited access to LinkedIn learning solutions
* Constant training, mentoring, online corporate courses, eLearning and more
* English classes with a certified teacher
* Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
* Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
* Flexible work schedule and dress code
* Collaboration in a multicultural environment, sharing best practices from around the globe
* Direct hire by EPAM, 100% under payroll
* Benefits required by law (IMSS, INFONAVIT, 25% vacation bonus)
* Insurance: life and major medical expenses, with dental & visual coverage (for the employee and direct family members)
* 13% employee savings fund, capped at the legal limit
* Grocery coupons
* 30-day December bonus
* Employee Stock Purchase Plan
* 12 vacation days
* Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th & 31st)
* Monthly non-taxable amount for electricity and internet bills

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

*By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.*


