I really thought we had fixed this. I was thinking of retiring Next Gen DevOps because it was published 11 years ago and I assumed it was out of date. Sadly, it isn’t. If your development teams need to beg for favours from your ‘platform team’ in order to get things done, then you’ve recreated the nightmare of the Ops team. If the term ‘beg’ offends you, feel free to substitute ‘agree priorities’.
In this post I am going to give some advice on how to enable your development teams to get to production without any blockers AND minimise their toil AND ensure your infrastructure is built safely, securely and in compliance with all your company’s standards.
The key to enabling the development teams is to free them from things they don’t need to care about in order to ensure the quality, performance, availability and security of their applications. Equally, the key to having a platform team that can enable the development teams, while also ensuring infrastructure is built securely and in compliance with all company standards, is knowing where to draw the line. On AWS this looks something like this:
- Creating the VPC
- Creating all the subnets
- Creating the Internet Gateway
- Creating the NAT Gateway
- Mapping the public and the private subnets
- Creating all security groups
- Creating Policies
- Creating CloudWatch Log group
- Creating secrets using secrets manager
- Creating Parameter Store
- Creating Load Balancer
- Creating Target Groups
- Creating Listener
- Creating ECR Repositories
- Preparing the docker images
- Creating IAM Roles and Policies
- Creating ECS cluster
- Creating ECS task definitions
- Creating ECS services
The items above the line are global cloud configuration and cost management controls, and should be owned by the platform team. The items below the line are application-specific configuration and should be owned by each application team.
This split should be agreed and regularly reviewed by the engineering community as a whole. If the application teams see items below the line as toil, then they need training in how to reduce that toil for themselves.
As with security, we trust but we verify compliance with company standards. Compliance should be tested by running scripts and using AWS Anomaly detection.
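To make ‘running scripts’ a little more concrete, here is a minimal sketch of the kind of check a platform team might run. It assumes a hypothetical company standard (SSH must never be open to the whole internet) and a hypothetical function name, `find_open_ssh_groups`. The input is a list of dictionaries in the shape that boto3’s `describe_security_groups` returns under `"SecurityGroups"`, so the check itself can be tested without touching an AWS account.

```python
from typing import Iterable, List

# Hypothetical company standard: SSH must never be reachable from anywhere.
FORBIDDEN_CIDR = "0.0.0.0/0"
SSH_PORT = 22


def find_open_ssh_groups(security_groups: Iterable[dict]) -> List[str]:
    """Return the IDs of security groups that allow SSH from 0.0.0.0/0."""
    offenders = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            protocol = rule.get("IpProtocol")
            if protocol == "-1":
                # "-1" means all protocols and ports, so SSH is included.
                covers_ssh = True
            elif protocol == "tcp":
                covers_ssh = (
                    rule.get("FromPort", 0) <= SSH_PORT <= rule.get("ToPort", 65535)
                )
            else:
                covers_ssh = False
            open_to_world = any(
                ip_range.get("CidrIp") == FORBIDDEN_CIDR
                for ip_range in rule.get("IpRanges", [])
            )
            if covers_ssh and open_to_world:
                offenders.append(group["GroupId"])
                break  # one offending rule is enough to flag the group
    return offenders


# Illustrative data only. In a real account the list would come from
# boto3.client("ec2").describe_security_groups()["SecurityGroups"].
sample_groups = [
    {
        "GroupId": "sg-open",
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 22,
                "ToPort": 22,
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            }
        ],
    },
    {
        "GroupId": "sg-internal",
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 22,
                "ToPort": 22,
                "IpRanges": [{"CidrIp": "10.0.0.0/8"}],
            }
        ],
    },
]

non_compliant = find_open_ssh_groups(sample_groups)
print(non_compliant)
```

A check like this verifies without blocking: the application team still creates its own resources, and the platform team reports, or alerts on, anything that drifts from the agreed standard.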
One of the key points I make in Next Gen DevOps is how the concept of operations teams went wrong because they had different priorities from the teams building applications and generating revenue. That point is just as valid for Platform teams today as it was for operations teams ten years ago.
If you’re an Engineering Leader and you are considering creating a platform team, pay close attention to what their priorities and motivations will be. If they are ostensibly motivated by safety and cost control, and your application teams are motivated by revenue and experimentation, then you’re going to have problems. You have one team pulling and another pushing. If you want some help thinking through this and aligning the priorities, then get in touch or buy the book.