Remote Site Reliability Engineer (SRE) Specialist
RemoteStar
Cambridgeshire, United Kingdom Full-time posted 2 days ago in I.T. & Communications
Job ID 2773203
Job Description
Company Overview:
Join RemoteStar as we collaborate with a premier multinational IT services and consulting firm at the forefront of digital transformation, cloud computing, and AI innovations. This forward-thinking organization unites enterprises from diverse sectors, providing them with advanced technology solutions that drive success and efficiency.
Position Overview:
We are seeking a talented Site Reliability Engineer (SRE) with 5 to 9 years of experience to join our dynamic team. This role is open to candidates across India. It will begin as a remote position and is expected to move to a hybrid model in the near future, with three days a week in the office.
Work Schedule:
Working hours for this position are either 1 PM to 10 PM or 2 PM to 11 PM, five days a week.
Industry Focus:
A strong background in the healthcare industry is a key requirement for this role.
Key Responsibilities:
- Address operational challenges, including production failures, security concerns, and infrastructure issues.
- Ensure the continuous availability, optimal performance, and scalability of applications and websites.
- Collaborate closely with developers to proactively identify and rectify potential problems before they impact user experience.
- Monitor system performance and develop strategic plans for incident response.
- Participate in capacity planning and performance tuning to accommodate growing traffic demands effectively.
- Leverage your deep understanding of distributed systems for troubleshooting and optimization purposes.
Technical Expertise:
- Proficiency with monitoring tools, including AppDynamics, Splunk, and GCP Operations Suite.
- Extensive knowledge of different database types, sufficient to diagnose and resolve issues.
- Experience managing cloud-native applications.
Communication Skills:
- Ability to articulate system alerts and outage scenarios clearly to team members.
- Efficiently handle unexpected outages or performance problems.
- Familiarity with automation, configuration management, and monitoring tools, specifically in Azure and GCP environments.
Additional Notes:
We are looking for candidates who are enthusiastic about advancing SRE practices across the division. You should be comfortable taking on a leadership role within the SRE framework, championing best practices and driving excellence in site reliability.