Senior Site Reliability Engineer
One of our top clients is seeking a Senior Site Reliability Engineer. Their enterprise software platform is one of the largest in the world and is truly helping people in need. Some of their executives come from companies such as Google and Groupon. This engineer will work on a small team responsible for the full system life cycle, including AWS infrastructure, system configuration, deployments, monitoring, and incident response (IR) in production environments.
This is a great opportunity to work with cutting-edge technology on a platform that is making a real difference in the world. They offer a competitive salary (base + bonus + stock), full benefits, a 401(k), and a work-from-home stipend.
The team will be remote until sometime next year. After that, they are open to remote or onsite work, but candidates must be local to San Diego to come in for meetings. The role is based in San Diego, though they are also open to candidates working from their Los Angeles or San Francisco offices.
Responsibilities
- Design and build out our cloud infrastructure (we run everything in AWS).
- Participate in software and system performance analysis, tuning, and service capacity planning.
- Manage the availability, scalability, security, and performance of our platform and applications.
- Diagnose bottlenecks across the full stack and recommend interim workarounds while long-term solutions are investigated.
- Periodically assess all monitoring requirements and implement enhancements to meet or exceed changing business needs.
- Proactively review, recommend, and implement changes to the live infrastructure after appropriate validation has been carried out.
- Use data analysis to spot emerging trends before they become major problems.
- Perform 24/7 on-call duties.
Requirements
- 5+ years of experience operating high-traffic SaaS environments.
- Deep expertise in the mindset, processes, and tools needed to deliver five nines (99.999%) of availability.
- Ability to build a fully automated, highly elastic cloud orchestration framework on AWS.
- Strong working knowledge of Linux and its underlying components, including system statistics, performance tuning, filesystems, and I/O.
- Solid scripting skills (e.g., Bash, Python).
- Development experience (e.g., Python, PHP, Java, Kotlin).
- Experience with continuous integration (CI) tools such as Jenkins.
- Experience with performance diagnostics, performance tuning, capacity planning, and monitoring.
- BS in Computer Science or equivalent.
- Good verbal and written communication skills.
Technologies you are likely to be working with:
AWS, Ansible, Terraform, MySQL/Aurora, Redshift, Nginx, Apache, Docker, Kubernetes, Elasticsearch, Kafka, Memcached, Redis, RabbitMQ, Jenkins, Git, Bash, Python, PHP, Java, Kotlin, Ruby, Nessus, Nagios, Sumo Logic, New Relic, PagerDuty