Site Reliability Engineer (SRE) at ZEFR
Marina del Rey, CA, US

ZEFR is hiring! We are seeking a Senior Site Reliability Engineer to join our team in Marina del Rey, CA. At Zefr, we're pioneering the world of contextual video advertising. We're seeking hardworking and talented professionals who love to solve interesting and challenging data problems while revolutionizing the global impact of the digital video ecosystem.

Our Site Reliability Engineers solve operational problems with a software engineering mindset. We enable Engineering's productivity and production readiness through coding prowess, systems and software architecture, and SRE principles. We are tightly integrated into the software development life cycle, from inception and design through deployment, operation, and refinement.

Tech Stack:

  • AWS, GCP, Linux
  • Python, Kotlin, React, Node.js
  • Jenkins, Spinnaker, Terraform
  • Docker, Kubernetes (EKS), Helm, Kafka, Elasticsearch, PostgreSQL, AWS Aurora, Apache Airflow, Redis

Here's what you'll get to do:

  • Build systems and tools that enable other engineers to write and deploy code quickly and safely
  • Foster and advance our DevOps culture and mindset by encouraging continuous improvement
  • Proactively maintain the health of production environments
  • Mature our CI/CD workflows and methods
  • Collaborate with other engineers to architect secure, resilient, cost-efficient applications in AWS and Google Cloud Platform
  • Participate in change management and in the creation and review of deployment plans
  • Respond to system performance issues and outages
  • Debug code at the application and infrastructure level
  • Participate in a 24/7 on-call rotation
  • Propose and review Engineering Requests for Comments (RFCs) to drive Engineering architecture and practices

Here's what we're looking for:

  • Bachelor's degree in Computer Science or a related field, or equivalent work experience
  • 5+ years of experience in software design and development
  • Fluency in one or more programming languages: Python, Go, Java, Kotlin (Python preferred)
  • Experience developing and debugging container-based microservices
  • Solid understanding of Linux system and networking fundamentals
  • Strong working knowledge of logging, monitoring, and alerting workflows
  • Desire to build tools and automation
  • Proficiency with containers and container orchestration platforms (Docker, Kubernetes, ECS, etc.)
  • Strong written and oral communication, organization, and documentation skills
  • Ability to work both independently and collaboratively

Nice to haves:

  • Production experience with major public cloud providers (AWS, GCP, Azure)
  • Deep understanding of CI/CD pipelines, techniques, and technologies (Spinnaker, Jenkins, GitLab, CircleCI, etc.)
  • Experience with application performance and infrastructure monitoring
  • Knowledge of operational deployment and configuration management tools (Terraform, Ansible, SaltStack, CloudFormation, Chef, etc.)