Remote Site Reliability Engineer (SRE) Specialist

RemoteStar

Cambridgeshire, United Kingdom | Full-time | I.T. & Communications
    • Job ID 2773203

    Job Description

    Company Overview:

    Join RemoteStar as we collaborate with a premier multinational IT services and consulting firm at the forefront of digital transformation, cloud computing, and AI innovations. This forward-thinking organization unites enterprises from diverse sectors, providing them with advanced technology solutions that drive success and efficiency.

    Position Overview:

    We are seeking a talented Site Reliability Engineer (SRE) with 5 to 9 years of experience to join our dynamic team. This role is open to candidates across India. It will begin as fully remote and move to a hybrid model in the near future, with three days a week in the office.

    Work Schedule:

    The working hours for this position are structured as follows: 1 PM to 10 PM or 2 PM to 11 PM, with a five-day workweek.

    Industry Focus:

    A strong background in the healthcare industry is a key requirement for this role.

    Key Responsibilities:

    • Address operational challenges, including production failures, security concerns, and infrastructure issues.
    • Ensure the continuous availability, optimal performance, and scalability of applications and websites.
    • Collaborate closely with developers to proactively identify and rectify potential problems before they impact user experience.
    • Monitor system performance and develop strategic plans for incident response.
    • Participate in capacity planning and performance tuning to accommodate growing traffic demands effectively.
    • Leverage your deep understanding of distributed systems for troubleshooting and optimization purposes.

    Technical Expertise:

    • Proficient in utilizing various monitoring tools, including AppDynamics, Splunk, and GCP Operations Suite.
    • Extensive knowledge of different database types for effective issue resolution.
    • Experience managing cloud-native applications.

    Communication Skills:

    • Ability to articulate system alerts and outage scenarios clearly to team members.
    • Efficiently handle unexpected outages or performance problems.
    • Familiarity with automation, configuration management, and monitoring tools, specifically in Azure and GCP environments.

    Additional Notes:

    We are looking for candidates who are enthusiastic about advancing SRE practices across the division. You should be comfortable taking on a leadership role within the SRE framework, championing best practices and driving excellence in site reliability.
