Our client in San Diego is looking for a Site Reliability Engineer. Below is a BRIEF job description. Please contact us for more information on this role and client.
- Design and build out our client’s cloud infrastructure (our client runs everything in AWS).
- Participate in software and system performance analysis, tuning, and service capacity planning.
- Manage the availability, scalability, security, and performance of our client’s platform and applications.
- Diagnose bottlenecks across the full stack and recommend interim workarounds while long-term solutions are investigated.
- Periodically assess all monitoring requirements and implement enhancements to meet or exceed changing business needs.
- Proactively review, recommend, and implement changes to the live infrastructure after ensuring the right validation has been carried out.
- Use data analysis to pick up trends before they become major problems.
- Perform 24/7 on-call duties.
- 3+ years of experience in operating high-traffic SaaS environments.
- Deep expertise in the mentality, processes, and tools needed to deliver five-nines (99.999%) availability.
- Skills to build a fully automated, highly elastic cloud orchestration framework on AWS.
- Strong working knowledge of Linux and its underlying components, system statistics, performance tuning, filesystems, and I/O.
- Solid scripting skills (e.g. Bash, Python, Ruby).
- Development experience (e.g. Python, PHP, Java, Kotlin).
- Experience with continuous integration frameworks.
- Experience with performance diagnostics, performance tuning, capacity planning, and monitoring.
- BS in Computer Science or equivalent.