At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable, AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world, including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including Softbank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding round has solidified our place as a top player in the gaming industry. Together, AccelByte's team brings decades of experience building and shipping platforms at this scale.
We believe that the best companies empower employees to make decisions, obsess over the user experience, and are not afraid to make mistakes and learn from them. Our culture is based on humility, openness to feedback, drive, and collaboration, which we believe produces the best-performing teams. As a company that values diversity, inclusion, and employee growth, we give our employees opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!
**Position Summary**
**Essential Functions/Responsibilities**
The Site Reliability Engineer (SRE) is accountable for the following functions and responsibilities:
- Review coworkers' changes, provide feedback, and mentor coworkers to maintain reliability.
- Investigate and resolve infrastructure and operational issues, identifying root causes and implementing effective fixes.
- Perform infrastructure and operational tasks with scalability and stability in mind.
- Develop automation solutions to optimize tasks, improve efficiency, and reduce manual effort.
- Conduct thorough investigations of operational incidents and proactively prevent future issues.
- Design, implement, and maintain scalable infrastructure and deployment frameworks using Kubernetes (K8s) and other CNCF projects.
- Establish a secure, cost-effective, and scalable cloud platform.
- Collaborate with stakeholders to deliver cost-effective, high-quality infrastructure solutions and identify areas for improvement.
- Communicate directly with clients, understanding their needs and providing exceptional support.
- Meet the company's standards for engineering excellence.
- Perform any other duties as required.
**Qualifications/Experience Required**
- Specialization in operations and reliability automation.
- 3+ years of professional infrastructure and operational engineering experience with Linux administration.
- Proven track record with infrastructure as code (IaC), configuration management, and package management.
- Experience completing infrastructure or operational projects collaboratively.
- Eagerness to learn new languages, technologies, and containerization principles (e.g., Docker, Kubernetes).
- Practical knowledge of networking, storage, and container technologies.
- Robust knowledge of and experience with cloud computing (AWS or GCP preferred).
- Proven experience with automation, CI/CD, and GitOps tools.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, ELK/EFK, Splunk, Datadog, OpsGenie, PagerDuty).
- Software development and scripting experience with Bash, Python, and/or Golang.
- Proficiency in written and spoken English for remote work.
- Flexibility to adjust work routines/schedules to meet company and customer needs.
- Previous professional infrastructure or operational experience preferred.
- Experience at a AAA game studio or software product company preferred.
- Experience working with cloud platforms or web products preferred.
- Experience in a multinational technology startup is a big plus.