Job Description
Zensar Technologies Limited is looking for a talented and motivated Site Reliability Engineer (SRE) to join our team in Bangalore. This long-term contract role offers a hybrid work arrangement and a flexible shift schedule that can accommodate future changes. As an SRE, you will play a crucial role in driving the reliability and performance of our collaboration services, ensuring a seamless experience for our users. Your expertise in Kubernetes and AWS will be instrumental in optimizing our cloud and hybrid environments.

Responsibilities
- Own the deployment and operation of critical collaboration services, ensuring high availability and scalability across cloud and hybrid environments.
- Design and optimize CI/CD pipelines and automation, incorporating AI-first tooling for efficient deployment, monitoring, and incident response.
- Lead incident response for complex production issues, conducting thorough root cause analysis and implementing systemic improvements.
- Use observability data to guide capacity planning, scaling strategies, and resource optimization.
- Define and promote operational best practices, ensuring high-quality documentation and fostering a culture of reliability and operational excellence.
- Collaborate with cross-functional teams to align on service requirements and ensure seamless integration with existing systems.
- Stay current with industry trends and technologies, and propose innovative solutions to enhance our SRE practices.
- Mentor and guide junior team members, sharing your expertise and fostering a culture of continuous learning and improvement.
- Participate in on-call rotations and provide timely support during critical incidents, ensuring swift resolution and minimal impact on services.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience, with a minimum of 7 years in Site Reliability Engineering, Cloud Operations, or Systems Engineering.
- Strong hands-on experience operating production services using Docker and Kubernetes in cloud or hybrid environments, with a track record of successful deployments and operations.
- Proficiency in programming or scripting languages such as Python, Go, or Bash, with the ability to develop automation and operational tooling.
- Experience with monitoring, observability, and incident response in production environments, including participation in on-call duties and post-incident reviews.
- Working knowledge of Linux systems, networking, distributed systems, CI/CD pipelines, infrastructure-as-code principles, and Git-based workflows.
- Familiarity with large-scale, globally distributed SaaS platforms is preferred, as is experience with hybrid cloud environments and multi-region deployments.
- Ability to apply AI-assisted or automation-first approaches to SRE tooling and workflows, enhancing efficiency and reliability.
- Excellent written communication skills for creating clear and concise operational documentation, runbooks, and knowledge-sharing materials.
- Strong problem-solving and analytical skills, with the ability to troubleshoot complex issues and propose innovative solutions.
- A collaborative mindset and a passion for continuous learning and improvement, with a willingness to share knowledge and mentor team members.