LATAM - Senior SRE

Remote: Full Remote

Offer summary

Qualifications:

  • 4+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • Proficiency with observability tools like Prometheus, Grafana, and AWS CloudWatch.
  • Strong hands-on experience with AWS and managing cloud resources, including automation and CI/CD pipelines.
  • Excellent communication skills in a remote, asynchronous environment.

Key responsibilities:

  • Monitor and maintain AWS infrastructure and deployment pipelines during off-hours.
  • Ensure high availability and reliability of platform services by managing system health and responding to incidents.
  • Implement observability solutions and set up monitoring dashboards and alerting systems.
  • Document insights, decisions, and progress to facilitate asynchronous collaboration with the team.

Nas Company Information Technology & Services Scaleup https://nas.io/
51 - 200 Employees

Job description

Full-time - Latin America

Nas Company is a media and tech company on a mission to help people feel more connected, both online and offline.

With a team of 100 amazing people from 30 countries, we’re leading the way in the creator economy, reaching 300 million people worldwide every month.

We believe in making real connections at a global scale, bringing people together, no matter where they are.

Why should you be part of our success story?

At Nas Company, we help some of the world's biggest brands level up their social media and reach millions of people. We’re not just creating content—we’re creating the next wave of storytellers. We've trained companies, governments, and organizations on how to make viral content, empowering their employees, customers, executives, and even citizens to connect and share stories.

Our focus is on mastering storytelling, building communities, and running powerful campaigns. We help the world’s top brands share their messages, create impact, and build communities that last.

Our partners include Google, Facebook, the Bill & Melinda Gates Foundation, AppsFlyer, Canon, Grab, eToro, Coinbase, Solana, DW Bonn, University of Maryland, and many more.

We've raised $23 million so far, backed by top VC investors like Lightspeed Venture Partners, Pitango, and 500 Global, to help make even bigger stories and bring people together.
Position Overview
We are looking for a Senior Site Reliability Engineer (SRE) to join our distributed engineering team and lead our reliability, observability, and infrastructure initiatives. In this remote role (Latin America timezone), you will be the primary on-call engineer during Asia-based off-hours (approximately 8:00 PM – 9:00 AM GMT+8) to ensure our platform remains stable and performant. The ideal candidate is a seasoned, autonomous engineer who can maintain platform stability with minimal direct oversight. You will work mostly asynchronously, collaborating with the team via documentation and chat, with one weekly synchronous meeting to align with the broader engineering group.
Team Setup & Reporting: You will report directly to the Head of Engineering (Edwin Candinegara). Given the time difference, you will have minimal working-hour overlap with our core engineering team in Singapore/India, so strong communication and independent decision-making are crucial. Expect primarily asynchronous communication, with weekly sync-ups for team meetings or critical discussions.
Key Responsibilities
  • Infrastructure Monitoring & Maintenance: Independently monitor, maintain, and improve our AWS infrastructure and deployment pipelines during off-hours to ensure smooth operations even when others are offline.
  • Platform Reliability: Ensure high availability, reliability, and uptime of all platform services (web, backend, and mobile) by proactively managing system health and responding to incidents swiftly.
  • Observability & Alerting: Implement robust observability solutions – set up monitoring dashboards, logging, and real-time alerting across all systems (web applications, backend services, mobile API) using tools like Prometheus, Grafana, Datadog, AWS CloudWatch, etc.
  • Performance & Cost Optimization: Continuously monitor AWS and related infrastructure performance. Optimize resource usage and configurations for improved performance and cost efficiency (e.g., right-sizing instances, caching improvements, query optimization).
  • Asynchronous Collaboration: Work closely with product and engineering teams in an asynchronous manner. Document your insights, decisions, and progress clearly so team members in other timezones can follow along and contribute.
  • Incident Management: Proactively identify and resolve production issues. Act as the first responder to any system incidents during your shift, performing root cause analysis and restoring service. Communicate incidents and fixes to the team, and update runbooks for future reference.
  • Documentation & Playbooks: Develop and maintain internal SRE documentation, runbooks, and playbooks. Ensure that troubleshooting guides, deployment processes, and escalation protocols are well-documented and easy to follow for the entire engineering team.
Qualifications & Skills
  • Experience: 4+ years in a Site Reliability Engineer, DevOps, or similar role, with a track record of maintaining and scaling web infrastructure.
  • Observability Tools: Proficiency with monitoring and observability tools such as Prometheus, Grafana, Datadog, and AWS CloudWatch. You know how to instrument applications and set up alerts that catch issues early.
  • Cloud & DevOps: Strong hands-on experience with Amazon Web Services (AWS) and managing cloud resources. Familiarity with MongoDB Atlas (managed MongoDB) and deployment platforms like Vercel. Comfortable automating infrastructure (Infrastructure as Code, CI/CD pipelines) and managing deployments.
  • Tech Stack Familiarity: Exposure to modern web development stacks. Our environment includes Node.js/Python backends, Next.js frontends, Redis caching, and a Flutter mobile app. Direct coding in these is not mandatory, but understanding how these components work is a plus.
  • CI/CD & Automation: Excellent grasp of CI/CD concepts and tools. Experience implementing build pipelines, continuous integration, and automated deployments. Knowledge of Docker, container orchestration, and version control workflows.
  • Problem Solving: Strong analytical and problem-solving skills. Able to debug complex issues across distributed systems and find root causes. Experience with incident response and post-mortem analysis is highly valued.
  • Communication & Autonomy: Outstanding communication skills in a remote, asynchronous setting. You can document your work and decisions clearly. Highly self-driven and able to make sound decisions independently, especially during the hours when other team members are offline.
Special Inquiries for the Hiring Process
The planned hiring process:
  • At least one technical interview with a member of the team (1.5 hours).
    • We may schedule additional interviews if the result of the first one is not an obvious yes.
    • The interview will cover past experience, fundamental coding skills, computer science fundamentals, and system design skills as they relate to SRE work.
  • Interview with the head of engineering.
  • Interview with the CEO (if needed).
Specific details or unique aspects related to the position
  • Independent Impact: Because this role covers hours when the Asia-based team is offline, you’ll often be the point person for urgent decisions. You should be comfortable making critical calls independently to keep systems running, with the trust and empowerment of the team behind you.
  • Async Communication: Our workflow is primarily asynchronous. Outside of a weekly team meeting, you’ll communicate through tools like Slack, documentation, and pull requests. This means fewer interruptions and the freedom to structure your work, but it also requires discipline in keeping the team informed through writing.
  • Define SRE Practices: As the dedicated SRE, you will play a key role in defining and refining Nas.io’s reliability and infrastructure practices. You’ll have the opportunity to influence tooling choices, establish best practices for monitoring/alerting, and shape incident management and response processes. Your work will lay the foundation for how we maintain and scale our systems reliably as we grow.
  • Global Team & Culture: You’ll be joining a diverse, distributed team spanning Asia and other regions. We pride ourselves on a culture of mutual respect, continuous learning, and bias for action. Even though you’ll operate with a lot of autonomy, you’re never truly alone – the team is always a message away, and we make sure to celebrate successes and learn from failures together.
Our Company Values

  • Think in Days: We don't think about how many years something will take, we think about how many days. This is our secret to fast growth.
  • Always Share Secrets: We don't believe in secrets. We share with our followers and our employees what we do and why we do it.
  • Never Miss a Monday: We never miss a meeting. We never miss a deadline. We never miss a commitment. Commitment is at the core of our DNA.
  • Be A Force For Good: All our actions intend to bring a positive impact to everyone around us.
  • Explain for a 10-Year-Old: We communicate in a simple and inclusive manner, free from unnecessary complexity.


Benefits

At Nas Company, we work hard, REALLY HARD! So we value our employees' well-being and believe that a supportive and fulfilling work environment contributes to their success. We're proud to offer a range of benefits that will enhance your professional and personal life. Here's what you can look forward to as part of our team:

😷 Medical Insurance: We care about your health. Our comprehensive medical insurance plans are tailored to each region, ensuring you have the coverage you need.

🧘🏽 Self Care/Wellness Fund: Your well-being matters. We provide a monthly fund of $100 to spend on activities that bring you joy and promote your self-care.

🧠 Mental Health Fund: We care about your mental health too! We provide a monthly fund of $150 to spend on therapy or career coaching.

🌴 Paid Time Off: You’ll be entitled to paid time off based on your region, in line with the company policy.

🏖️ Yearly Retreats: Unwind, bond, and collaborate at our annual company retreats. A time for the entire company to come together to rejuvenate and bond.

💰 Stock Options/Profit Share: As part of our team, you may have the opportunity to own a piece of the company through stock options or profit sharing, at the company's discretion, aligning your success with ours.

Required profile

Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Communication
  • Problem Solving
