Smart is growing its technology team and, as part of that growth, we are now looking to hire a Site Reliability Manager to own our platform as we dramatically scale our operations.
Our current DevOps team is part of a tech team of 190 people, and we are looking to grow it. Are you a passionate individual with prior SRE or DevOps experience looking for a great opportunity to help craft the future of Smart Pension for years to come?
As part of your responsibilities, you will be expected to own, and be accountable for, the reliability, security and deployment of various systems.
Manage, maintain & support our AWS and Heroku environments.
Build appropriate tooling for development teams & clients.
Reduce manual work (toil) for the technology team (yourself included!) by implementing appropriate solutions or processes as required.
Design & implement improved data security strategies for our platform.
Own & implement appropriate monitoring & alerting as we move to a 24 / 7 / 365 platform.
Assist the development team in using these metrics to ensure the reliability of our platform remains excellent across all deployments.
Define Service Level Objectives for the Smart Platform, including working with other departments to define appropriate availability targets.
Own & implement new continuous deployment & release improvements to support Smart's ambitious future plans, including deployment to secured third-party environments for a variety of clients.
Keep up to date with best practices in application hosting and continuous deployment.
Update or produce documents to describe changes to the platform.
This position includes some out-of-hours on-call work after your initial training period. You will be supported by the rest of the engineering team; you are not expected to be on call 24 / 7 / 365.
At Smart, we're a diverse team, made up of people with different backgrounds, experiences and skills. Our goal is to build great products to help people plan for their financial futures.
We’re constantly developing new ideas to help people look after their pension schemes, in the UK and abroad. We’ve grown to a team of over 350 talented people, all dedicated to creating the best experience for our customers.
Recently we made it onto Great Places to Work UK's Best Workplaces 2020 at the no. 70 spot for medium-sized companies! If you think you can help us build a smarter future, come and work with us.
Our Recruitment Data Policy is here. Please click on the link if you have any questions about how we store your data or want to know your rights.
Proven track record of supporting web platforms (e.g. websites or APIs) in production at scale.
Experience upgrading and fixing cloud platforms.
Experience defining appropriate metrics to monitor web platforms, and alerting on breaches of those metrics.
Ability to monitor and identify network policy violations and system breaches.
Experience with incident response in an on-call environment.
Experience with continuous delivery and zero-downtime deployments.
Experience with AWS, MySQL, and Redis (or similar platforms).
Comfortable with command-line tools and environments. Linux experience is essential.
Proficient with at least one server-side programming language such as Ruby or Python.
Experience with configuration management tools like Terraform or Ansible, and an understanding of their common use cases.
Enjoys complex problem solving and delivering results.
Familiarity with Heroku or Heroku-like PaaS.
Familiarity with Amazon Aurora.
Experience with container deployment platforms like Kubernetes.
Experience with Docker & containerisation.
Prior evidence of developing command-line tools.
Experience with serverless technologies, e.g. AWS Lambda.
PLN 2250 personal training budget to spend on books, courses, conferences or training materials to help you develop.
Health insurance, including dental care.