At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including SoftBank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has solidified our place as a top player in the gaming industry. AccelByte's talent has decades of experience building and shipping some of the largest game and distribution platforms in the world.
We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best performing teams. As a company that values diversity, inclusion, and employee growth, our employees have opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!
**Position Summary**:
As a **Senior Site Reliability Engineer**, you design, implement, and maintain infrastructure and operational systems that accomplish a given goal. You discover requirements and guide other engineers collaborating in an area and do exemplary work on complicated problems. You optimize performance, drive efficiency, and ensure the reliability of critical infrastructure.
**Essential Functions/Responsibilities**:
The **Senior Site Reliability Engineer** is accountable for the following functions and responsibilities:
- Review, provide feedback, and mentor coworkers on changes to maintain reliability.
- Design and develop infrastructure and operational tasks with scalability and stability in mind.
- Contribute to automation solutions that optimize tasks, improve efficiency, and reduce manual effort.
- Design, implement, and maintain scalable infrastructure and deployment frameworks using K8s and CNCF projects.
- Direct a secure, cost-effective, and scalable cloud platform.
- Initiate and conduct thorough investigations of operational incidents, and use the resulting insights to design resilient long-term mitigations that prevent future issues.
- Collaborate with stakeholders to deliver cost-effective, excellent infrastructure solutions and identify areas for improvement.
- Communicate directly with clients, understanding their needs and providing exceptional support.
- Train and mentor less experienced engineers and set the direction for other engineers.
- Model standards for engineering excellence.
- Discover requirements by working with PMs and stakeholders.
- Perform other duties as assigned.
**Qualifications/Experience Required**:
- Specialization in operations and reliability automation.
- 5+ years of professional infrastructure and operational engineering experience with Linux administration.
- Proven track record of infrastructure as code, configuration management, and package management.
- Experience completing infrastructure or operational projects collaboratively.
- Familiarity with Nomad.
- Eagerness to learn new languages, technologies, and containerization principles (e.g., Docker, Kubernetes).
- Practical knowledge of networking, storage, and container technologies.
- Robust knowledge of and experience in cloud computing (AWS or GCP preferred).
- Proven experience with automation, CI/CD, and GitOps tools.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, ELK/EFK, Splunk, Datadog, OpsGenie, PagerDuty).
- Software development and scripting experience with Bash, Python, and/or Golang.
- Proficiency in written and verbal English for remote work.
- Flexibility to adjust work routines/schedules to meet company and customer needs.
- Previous professional infrastructure or operational experience preferred.
- Experience at a AAA game studio or software product company preferred.
- Experience working with cloud platforms or web products preferred.
- Experience in a multinational technology startup is a big plus.