Senior Site Reliability Engineer

**Site Reliability Engineer**:

**About DKatalis**

DKatalis is a financial technology company with multiple offices in the APAC region. In our quest to build a better financial world, one of our key goals is to create an ecosystem-linked financial services business.

DKatalis is built and backed by experienced and successful entrepreneurs, bankers, and investors in Singapore and Indonesia. They come from top-tier schools such as Stanford, Cambridge, London Business School, and JNU, and bring more than 30 years of experience building financial services and banking businesses at Bank BTPN, Danamon, Citibank, McKinsey & Co, Northstar, Farallon Capital, and HSBC.

**About the role**

The SRE Team at DKatalis is a 24/7 operation in charge of maintaining the digital platform that serves Bank Jago's systems and services. You'll experience a mix of activities, from optimizing Kubernetes and maintaining system uptime to debugging production issues and running runbooks to mitigate potential production issues. One of the key objectives of the SRE team is to constantly improve and uphold the reliability of our digital platform and of our software release processes, such as deployment and the automation of recurring tasks. You should have a strong software engineering background, and you will have the opportunity to collaborate with various software squads in building a reliable digital platform.

**Technologies We Use**:

- Cloudflare, Google Cloud - GKE, Kubernetes, Tyk API Gateway
- GitLab, Terraform
- NodeJS, Java, Redis, Mongo, Kafka
- Dynatrace, PagerDuty

**Role and Responsibilities**:

- Participate in SRE software engineering, writing code to continually reduce human intervention in operational tasks and to automate processes.
- Balance doing things right with fixing things quickly: be flexible and pragmatic while working towards improving the long-term health of the system.
- Bring strong systems experience and good coding practices.
- As part of the team, analyze systems based on data points to identify workloads that are critical to the business.
- Work cross-functionally to ensure the success of the system's operation, collaborating closely with other engineering and product teams to ensure that expected system behavior is understood and that monitoring exists to detect anomalies.
- Lead in-depth technical and data analysis to gauge service trends and drive improvements.
- Take on-call responsibility and manage crises together with the broader team, communicating progress and challenges throughout.
- Participate in, and continuously improve, timely and high-quality root cause analysis of major incidents and blameless post-mortems, so that we take action to avoid similar problems in the future.
- Contribute to the prioritization of reliability features and to the design, development, and delivery of effective tooling, alerts, and automated responses that identify and address reliability risks.
- Contribute to proactive technical communication of reliability, stability, and efficiency results (based on Service Level Objectives), service health (via dashboards), and key reliability risks and issues to senior business and technology stakeholders, in order to prioritize activity (based on trend analysis) and direct investment and action.

**Requirements**:

- You are either a Software Engineer with a real interest, and ideally some experience, in Linux systems, networking, monitoring, and automation; or an experienced sysadmin or systems engineer with professional skills in Linux, preferably on distributed systems at scale, and demonstrable interest and experience in using software engineering to solve operational problems.
- Comfortable writing software to automate API-driven tasks at scale. Cloud Tooling engineers primarily use NodeJS, and Java and Go are also key languages in our environment.
- Experience automating the build and deployment of software products, and an understanding of the related challenges in distributed systems.
- 8+ years of experience in software development and/or SRE functions, with at least 3 years in a senior/lead capacity.
- Degree in Computer Science, Engineering, or equivalent experience.
- Experience with, and advanced understanding of, observability, CI/CD, and release management.
- Well-rounded, broad knowledge of OS platforms (Linux/UNIX), networking, web systems, and DevOps.
- Experience working with large-scale distributed systems and an understanding of microservices architecture concepts.
- Strong organizational skills and the ability to effectively manage multiple tasks simultaneously.
- Capable of working in a complex, fast-paced environment and able to remain calm during stressful situations.

Nominal Salary: To be agreed

Job Function:

Engineering
