Sr Site Reliability Engineer

Description

Come and impact millions of Brazilians!

Want to make a difference in the lives of millions of Brazilians? At RecargaPay, we create accessible and innovative financial solutions that transform the way people interact with money. Be part of this impactful journey, connecting people with opportunities that truly make a difference in their daily lives.

Our purpose is to deliver the best mobile payment experience for Brazilians, addressing real-world challenges with smart solutions like Pix Parcelado, while staying attentive to market trends and our customers' needs. Here, we value collaboration, ownership, and a relentless pursuit of results, delivering excellence in every interaction.

If you’re looking to join a dynamic environment that challenges the status quo and puts people at the center of decision-making, RecargaPay is the perfect place for you to grow, co-create, and make a difference!

Responsibilities

We are looking for a Senior Site Reliability Engineer (SRE) to define and implement monitoring and observability standards, ensuring the reliability and efficiency of our environment. This professional will be responsible for analyzing metrics and alerts, anticipating failures, identifying infrastructure and application bottlenecks, and proposing architectural improvements to enhance efficiency and availability. They will also play a key role in post-mortems, sharing knowledge and contributing to effective action plans.

Define and enhance monitoring and observability standards;
Support the definition and monitoring of SLIs/SLOs and other key performance indicators to ensure alignment with reliability goals;
Analyze metrics and alerts to anticipate failures and optimize performance;
Identify bottlenecks and areas for improvement in infrastructure and applications;
Propose and implement software architecture and infrastructure improvements to increase efficiency and availability;
Lead and support post-mortems, promoting best practices and lessons learned;
Document best practices, incident learnings, and technical solutions to foster knowledge sharing and accelerate problem resolution;
Work in a GitOps environment, using GitHub Actions for automation;
Collaborate with development and infrastructure teams to ensure service resilience and scalability;
Conduct troubleshooting and performance optimization in containers and Kubernetes (EKS);
Serve as a technical reference for reliability, supporting the adoption of SRE practices across squads and contributing to the evolution of engineering culture;
Work alongside Security, Platform, and Data teams to ensure a holistic approach to reliability and scalability;
Demonstrate the ability to influence technical decisions and drive improvements, even in teams they are not directly part of;
Maintain a mindset focused on continuous learning, resilience in handling incidents, and a strong emphasis on prevention and automation.

Requirements

Experience with monitoring and observability tools, including New Relic, Prometheus, and Grafana;
Proficiency in GitHub and GitOps practices with GitHub Actions;
Strong experience with AWS and infrastructure as code using Terraform and Terragrunt;
Experience with microservices architecture and Kubernetes;
Solid knowledge of SRE principles, resilience, performance, and automation;
Hands-on experience with troubleshooting and performance tuning in complex environments;
Expertise in infrastructure and problem analysis in containers and Kubernetes (EKS);
Knowledge of Python, Shell scripting, and Ansible (preferred);
Experience with distributed environments, high availability, and scalability;
Familiarity with post-mortems and incident response.

Nice to Have

Certifications in AWS, Terraform, Kubernetes, or DevOps;
Contributions to open-source communities or technical publications.


Company: RecargaPay
Employment type: Full-Time
Region: South America
Location: Brazil
Salary range: Prefer not to share
Posted: 8 days ago
