Job ID : 1536741 Updated : April 30, 2025

PR/158845 | Site Reliability Engineering Lead

Location: Malaysia, Kuala Lumpur
Employment type: Full-time (permanent)
Salary: Negotiable, based on experience

Job Description

COMPANY OVERVIEW
A well-established client of ours in Kuala Lumpur is seeking a Site Reliability Engineering Lead.

 

JOB RESPONSIBILITIES

  1. Team Leadership:

○       Lead and mentor a team of SREs, fostering a culture of ownership, collaboration, and continuous improvement.

○       Define clear goals, performance metrics, and development plans for the team.

  2. System Reliability & Performance:

○       Design and implement strategies to improve system reliability, scalability, and performance.

○       Conduct root cause analysis of production incidents and develop preventive solutions.

  3. Infrastructure Management:

○       Oversee the deployment, monitoring, and management of production environments.

○       Collaborate with development teams to design cloud-native infrastructure and architecture.

  4. Automation & CI/CD:

○       Drive automation of operational processes, reducing manual intervention and response times.

○       Optimize CI/CD pipelines to ensure smooth and rapid deployments.

  5. Incident Management:

○       Establish incident response protocols and lead efforts during major incidents.

○       Ensure robust monitoring and alerting systems are in place to proactively detect issues.

  6. Collaboration & Communication:

○       Act as a liaison between engineering, operations, and other teams to align objectives.

○       Share insights and best practices with internal stakeholders to enhance overall system resilience.

 

JOB REQUIREMENTS

  1. Technical Expertise:

○       Strong experience with cloud platforms (AWS, Azure, Google Cloud) and infrastructure-as-code tools (Terraform, Ansible, etc.).

○       Proficiency in programming/scripting languages (Python, Go, Shell, etc.).

○       Deep knowledge of Kubernetes, containerization, and distributed systems.

  2. Leadership Skills:

○       Proven track record of leading SRE or DevOps teams and managing large-scale production environments.

○       Strong decision-making, prioritization, and problem-solving capabilities.

  3. Monitoring & Metrics:

○       Expertise in implementing and using monitoring tools (Prometheus, Grafana, Datadog, etc.) and logging systems.

○       Familiarity with service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.

  4. Soft Skills:

○       Excellent communication and collaboration skills to work across cross-functional teams.

○       Ability to mentor and upskill team members, fostering a learning-oriented culture.

  5. Experience:

○       At least 8 years of experience in SRE, DevOps, or related roles with a focus on reliability engineering.

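Application Requirements

Work experience: 3 years or more
Career level: Mid-career
English level: Business conversation level
Japanese level: Business conversation level
Minimum education: Associate degree (junior college)
Current visa: Work authorization in Japan is not required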

Location

  • Malaysia, Kuala Lumpur

Working Conditions

Employment type: Full-time (permanent)
Salary: Negotiable, based on experience
Industry: IT Consulting

Job Category

  • Other > Other