Senior Site Reliability Engineer, Infrastructure at Evernote
Santiago, CL

Senior Site Reliability Engineer, Infrastructure

Our SRE team is responsible for the overall performance and reliability of Evernote’s service and products, which serve over 200 million passionate and engaged users around the world and store billions of notes and files.

The Infrastructure Engineering team in SRE creates resilient and scalable compute, network, storage, and database systems that serve as the foundation of the Evernote service. We provide our engineering teams with platforms to run the software features that delight our users. As a Senior Site Reliability Engineer, you will contribute to the ongoing mission of delivering an exceptional service to our users.

What you’ll do

You will research and analyze new technology to solve problems at all layers of our stack

You will partner closely with engineering teams to maintain and scale our platforms

You will own the development of technical standards for new services that ensure success in production environments

You will publish internal design documentation and procedures that provide detailed specifications for the engineering audience

You will develop software and maintain automation systems to reduce toil and to run our infrastructure at scale

You will design and implement secure solutions with our Security team to protect our users’ data

You will champion our SLOs and continuously improve them

You will act as a subject-matter expert for critical infrastructure and provide mentorship for the department in those areas

You will participate in an on-call rotation to help maintain the availability of our service so that users always have access to their data

What we’re looking for

You take initiative and lead by example to motivate your peers

You focus on quality to build resilient, scalable, and maintainable systems

You make decisions based on data and exercise judgement to balance risks and rewards

You partner with your teammates and thrive in a collaborative environment to tackle challenging technical problems

You share enthusiastically with your colleagues and provide strong mentorship

What you’ve done

You have 6 or more years of experience running a large-scale, online web service

You know Linux systems like the back of your hand and have mastered the fundamental TCP/IP networking protocols (e.g., HTTP, DNS)

You have deployed Kubernetes and cloud-native infrastructure and worked with product teams to launch and run microservices in production

You have experience with distributed web applications and service mesh platforms

You have integrated and used third-party metrics and monitoring platforms such as Datadog and PagerDuty

You have successfully deployed configuration management and orchestration tools

You have developed extensible and maintainable automation and written software that makes an SRE’s job easier

Skills that are particularly meaningful to us

Google Cloud Platform: VPC networking, GCE, GKE, GCS, Pub/Sub, Spanner, App Engine, BigQuery, Bigtable

AWS: EC2, S3

Monitoring: PagerDuty, Datadog, Splunk

Tools: Ansible, Puppet, Helm, Jenkins, Cloud Deployment Manager, Terraform

Infrastructure: Kubernetes, HAProxy, Envoy, Elasticsearch, Consul, Istio, Vault

Languages/Libraries: Python, Java, Go, Thrift, Node.js, gRPC

We are committed to an inclusive and diverse Evernote. We believe that different perspectives lead to better ideas, and better ideas allow us to better understand the needs and interests of our diverse, global Evernote Community. We welcome people of different backgrounds, experiences, abilities and perspectives and are an equal opportunity employer.