Staff Site Reliability Engineer at Evernote
Santiago, CL

Our SRE team is responsible for the overall reliability and performance of Evernote’s service and products, which serve more than 200 million users around the world who store billions of notes and files.

We participate in all aspects of running our platform at scale, focusing both on the service as it runs today and on ensuring we can deliver new and exciting features to users rapidly.

We work hand-in-hand with product teams to help them ship production-ready services and put new features in our users’ hands.

What you’ll do

You will maintain and evolve the configuration, monitoring, metrics, reporting, processes, tools, and documentation of our global cloud platform

You will research and analyze new technologies to solve problems at all layers of our stack

You will develop software and maintain automation systems to reduce toil and to run our infrastructure at scale

You will participate in an on-call rotation to help maintain the availability of our service so that users always have access to their data

Who you are

You take initiative and lead by example to motivate your peers

You focus on quality to build resilient, scalable, and maintainable systems

You enjoy solving tough technical problems

You always want to understand the “why” in order to better see patterns and improve quality

You have strong interpersonal and communication skills

What you’ve done

You know Linux systems like the back of your hand and have mastered the fundamental TCP/IP networking protocols (e.g., HTTP, DNS)

You’ve managed a large-scale online web service in a public cloud environment (AWS or GCP)

You are strongly familiar with web application stacks, including MySQL, Java, and Apache

You’ve used configuration management and orchestration tools, and you understand why they’re important

You can solve problems quickly and automate processes using scripting and other programming languages

Skills that are particularly meaningful to us

System and application level debugging

Experience with systems at scale

Exposure to configuration management, orchestration, and/or automation tooling

Pluses: Ansible, Puppet, Datadog, Splunk, Kubernetes, Docker