Job ID : 1507264 Date Updated : December 3rd, 2024

PR/158309 | Site Reliability Engineer

Location Malaysia
Job Type Permanent Full-time
Salary Negotiable, based on experience

Job Description

Site Reliability Engineer
 
Company and Job Overview
A world-class consulting services company is seeking a talented Site Reliability Engineer due to expansion. As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. This role emphasizes strong system architecture and design principles, focusing on key SRE practices such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and the reduction of operational toil.
 
Job Descriptions:
  • Design and implement resilient system architectures, automation tools, and scripts that support high availability, enhance operational efficiency, and reduce manual effort.
  • Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs.
  • Conduct thorough post-mortem analyses following incidents, driving continuous improvement through root cause identification and solution implementation.
  • Collaborate with development and operations teams to establish best practices in system reliability and incident management.
  • Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
  • Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery.
  • Identify and troubleshoot performance bottlenecks across systems, providing actionable recommendations for enhancements.
  • Maintain detailed documentation of processes and incident responses to support knowledge sharing and compliance.
  • Actively develop effective communication and relationship-building skills with stakeholders, clients, and the team.
Job Requirements:
  • Proficiency in programming languages such as Python, Golang, Java, or similar, focusing on operational efficiency.
  • Demonstrated experience in system architecture and design, prioritizing reliability and scalability.
  • Strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems.
  • Experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management.
  • Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization.
  • Strong expertise in Linux system administration.
  • Familiarity with networking concepts and effective troubleshooting techniques.
  • Familiarity with monitoring tools and performance optimization techniques.
  • Experience in scripting or automation for system administration tasks.
Apply online or feel free to contact me directly for more information about this opportunity. Due to the high volume of applicants, we regret to inform you that only shortlisted candidates will be notified. Thank you for your understanding. 
 

General Requirements

Minimum Experience Level Over 3 years
Career Level Mid Career
Minimum English Level Business Level
Minimum Japanese Level Business Level
Minimum Education Level Associate Degree/Diploma
Visa Status No permission to work in Japan required

Job Location

  • Malaysia

Work Conditions

Job Type Permanent Full-time
Salary Negotiable, based on experience
Industry Business Consulting

Job Category

  • Other > Other