- Senior Site Reliability Engineer
Description
Lead reliability efforts, driving the development, deployment, and maintenance of the company's networks, computers, and infrastructure for Zoox’s autonomous vehicle fleet. Ensure that reliability engineering efforts align closely with organizational goals and broader business objectives. Prioritize projects in system administration and network architecture that deliver the highest business value, and drive execution of these projects. Utilize key performance indicators (KPIs) and metrics to measure the impact of engineering work, and set, track, and achieve targets that correlate directly with business value. Provide technical leadership while writing new and improving existing code for provisioning cloud infrastructure resources. Create innovative solutions to automate systems administration functions to maintain the reliability of our internal services. Mentor junior team members.
Bachelor’s degree or foreign equivalent in Computer Science, Computer Engineering, or a related technical field. Must have 4 years of experience in a Test Engineer, Reliability Engineer, or related position. Must have experience in the following: maintaining and monitoring a large-scale Kubernetes cluster; designing network architecture for a new highly available Kubernetes cluster; writing new and improving existing code for provisioning cloud infrastructure resources; building logging and metrics infrastructure supporting on-premises and cloud compute clusters and distributed systems; troubleshooting problems affecting existing workloads, such as node failures, networking issues, and regressions; and utilizing Linux, Python, and C/C++. Telework permitted within commuting distance to worksite. $185,000 - $232,000/year.
Resumes to resumes@zoox.com, use job #5864867