Blog

Mental Health in SRE, DevOps and Operations

The biggest problems facing the DevOps, Operations and SRE professions are also the root causes of the biggest mental health issues in those professions. I touch on this subject in the book but I wanted to write something a little more personal here.

Most DevOps, Operations and SRE teams work on their own in their technology departments. Increasingly, developers, testers, business analysts, scrum masters and product managers are aligned inside product teams. If they aren’t actually in a team together, they are tightly bound by shared goals and success criteria. This has not been without trauma: all of these groups have struggled with their identities and their relationships with each other, but they have generally had well-aligned goals. It’s becoming more common for all these groups to report to a single Chief Product and Technology Officer.

DevOps, Operations and SRE teams might also report into this person, or they might report into a separate CIO, but either way they are almost never given any attention as long as the core platform is working. When those systems aren’t working they are placed under tremendous pressure, as the whole business stops and all focus is on them.

If people are treated this way they inevitably become defensive. If this treatment continues, defensiveness will become belligerence in some people.

I was at AOL in the early part of my career. Our Director of Technology clearly wanted nothing to do with operations. His entire focus was on the development teams. On my better days I told myself that it was because we were capable and functional in operations and they needed more help than us. On my worst days I’d tell myself that he didn’t understand operations and didn’t want anything to do with us.

I had a great boss (Dave Williams, now CIO of Just Eat, because he complained that I didn’t namecheck him in the book :P), and he always kept me focused on our capabilities and our achievements and stopped me getting too wrapped up in departmental politics. This strategy worked well. Operations grew in capability and size but every interaction I had with the director of the department went wrong. During crises I didn’t look like I cared enough. My people were belligerent and I was inexperienced. I didn’t know it at the time but we were pushing the envelope of technology and it was never commented on. Had it not been for Dave giving me air-cover and support I would probably have performed some career-limiting manoeuvre. I certainly came close a couple of times.

It was Dave’s support that gave me the freedom to think about my profession and how things should work, and that eventually led to my realisation about product-focused DevOps, or SRE as it’s more commonly known. Focusing on the needs of the developers gives clear success criteria when they don’t come from the leadership. Skip forward a decade-and-a-bit and I put all these things into practice at Just Eat. This created an environment for the people in SRE where they knew exactly how to be successful. Further, it encouraged the people who developed software for the customers to discuss what shared success looks like with us. Some of the best work we did at Just Eat arose from having conversations with developers about what they were struggling with and designing and building solutions to help them out. Few things are more fun than making your friends’ lives better.

If you are unloved in your technology department and you aren’t getting the support you need from your boss then seek it from your peer group. Meet with your colleagues in development, testing, business analysis and product management. Work with them to make their lives better, build friendships with them and get your support from shared success.

NEXT GEN DEVOPS A Managers Guide to DevOps and SRE

I’m excited to announce a new edition of Next Gen DevOps, newly subtitled A Managers Guide to DevOps and SRE. This third edition has been comprehensively restructured and rewritten as a definitive guide for managers.

I wrote the first edition of the book in 2012. At that time, as hard as it is to believe now, many senior technology leaders remained unconvinced about the DevOps paradigm.

I felt this very keenly when I was working on my DevOps Transformation Framework. For any readers who don’t already know, it wasn’t my initial intention to write a book at all. I wanted to create a framework for DevOps transformations. As I worked on the framework I realised that many of the assertions I was basing the framework on were unproven to many in the industry. I ended up creating so much exposition to explain it all that it made more sense to collect it into a book.

I’ve had loads of great feedback in the 7 years that the book has been available. Over that time the main criticism I’ve received is that the book wasn’t instructional enough.

The book wasn’t originally intended to be instructional; that’s what the framework was for. The idea was that if executives could see where the old silos of development, testing and operations were failing, and could see a clearly presented alternative thoroughly proven from first principles and supported by anecdotes, they could work with their leadership team on their own transformation programme that suited their organisation. That’s where the Next Gen DevOps Transformation Framework would come in to help them to structure that transformation.

That isn’t what happened. Executives were convinced of the rightness of DevOps by a vast weight of evidence from multiple sources. Evidence like Puppet’s State of DevOps reports, various Gartner analyses, pitches from new specialist consultancies like Contino and the DevOps Group (who were eating the older consultancies’ lunches in the DevOps arena) and recruiters like Linuxrecruit and Esynergy, and a huge wealth of success stories of which my small book was merely a drop in the ocean. These executives were then challenging their teams to create transformation programmes. Managers were then looking to my book, and others like The Phoenix Project, to help them figure out what good looked like.

Unfortunately my book wasn’t hitting that spot, until now. This new edition is aimed squarely at those managers looking to migrate development teams to DevOps working practices or operations teams to SRE.

Since I published the second edition I’ve provided strategic leadership to the Department for Work and Pensions’ cloud and DevOps migration programme, helped Just Eat improve their resilience and performance, and transitioned their operations team to SRE. This new book adds experience from both of those initiatives.

I’ve learned a new technique for creating departmental strategy from business goals which I’ve included in chapter six. I moved the popular history chapter to an appendix as its inclusion confused some readers. The point I was trying to make (and if I have to explain it I’ve clearly failed) was that the separate development and operations silos were the anomaly, not DevOps. By moving that to an appendix I’ve been able to create a smoother flow from the problem through to the solution and on to a new chapter about building DevOps teams, which includes a lot of information about hiring engineers. I’ve changed the management theories chapter into a chapter specifically about managing DevOps and SRE teams. Following on from that, chapter five details how DevOps teams work with other teams. Each subsequent chapter has been re-focused away from transforming whole departments down to transforming individual teams. I haven’t removed any content, so if you are looking for guidance about changing an entire technology department that is all still there. In addition to that there is now a wealth of guidance to help managers transform their teams and support their peers as they transform theirs.

If you bought either of the first two editions I’ll send you a free PDF of the new edition, if you email me at grant@nextgendevops.com with a picture of you holding a copy of the book in paperback or showing on your e-reader and give me permission to use those pictures in a future blog post.

Site Reliability Engineering needs product managers

I first heard of Amazon’s Cloud Centre of Excellence (CCOE) concept back in 2016. To say I was cynical would be an understatement. There was a fad, back in the late nineties and early 2000s, to call your team a centre of excellence. As with all such things, many of the teams who adopted the moniker were anything but.

Earlier this year, I was at Just Eat, and our Amazon account manager mentioned the CCOE again. We had a good relationship and he fully understood the product-focussed SRE transformation that I was bringing to Just Eat. Again I scoffed at the idea. Just Eat have been in AWS for a while and they have all the fundamentals locked down. Just Eat’s cost management and infrastructure efficiency are the best I’ve ever seen. The security model is sound and well implemented. The deployment tooling is mature and capable and, unlike many who migrate to AWS, Just Eat are still innovating; they didn’t just stick with the services that were in place when they first migrated. What did we need a CCOE for?

I suspect Amazon have this problem all the time. We’re all busy, we’re trying to satisfy multiple different stakeholders who all have conflicting requirements. We’re trying to grow our team’s capabilities and improve the current solutions while also exploring new capabilities. It can be hard to look up from all that to see the opportunities that are being offered to you.

This is particularly embarrassing for me to admit. I maintain that SRE teams need technical directors because someone needs to be free from the details so they can focus on the longer term strategic goals. Unfortunately I had a bit of detail I couldn’t delegate at the time and I missed the opportunity that Amazon presented me with.

If you haven’t read my book, Next Gen DevOps, you might not know what I’m talking about when I talk about product-focussed SRE.

I maintain that the software we use (Ansible, Chef, Cloudformation, Docker, Dynamo, Kinesis, Kubernetes, Puppet, Packer and all the others) is not what’s important. What matters to the organisations we work in are the outcomes we deliver from the products we build with these tools. What matters to developers is that they have reliable, consistently configured environments. What matters to finance teams is that we are demonstrably using infrastructure efficiently. What matters to our engineers is that they can see how what they’ve built contributes to the business’ success and their personal growth.

These are the things that provide value to our customers, the developers, finance people, and security teams. Not the software, languages, configuration and infrastructure that go into making them. These products need vision, roadmaps, budgets, and objectives and key results.

Product-focussed SRE describes an SRE function that recognises that its products exist to serve their customers first and foremost, and that those products must be grounded in real customer needs before we even consider the technology.

Amazon describe the Cloud Centre of Excellence as:

[Image: Amazon’s description of the CCOE, with one passage highlighted]

(https://docs.aws.amazon.com/whitepapers/latest/cost-optimization-laying-the-foundation/cloud-center-of-excellence.html)

I’ve highlighted the bit that I missed.

I already knew Amazon shared my view about product-focus. It was obvious: EC2 exists because Amazon needed a fast way to provision servers for Amazon.com. Jeff Bezos’ insistence that all teams built their services with APIs made it possible for Amazon to sell EC2 as a service for the rest of us, and the cloud was born. However, the Cloud Centre of Excellence takes this idea a step further: they are now recommending that their customers adopt a product-focus too.

This article was co-written with Anthony Fairweather, Product Manager for SRE at Just Eat. When we were discussing it he gave me this lovely little snippet (that I suspect he wrote for a presentation):

“In the platform team @ Just Eat we’ve embedded a continuous cycle of user engagement (with our engineers) and a prioritisation methodology that ensures we’re always focussing on the outcomes that deliver the most value. Our definition of value is fluid, depending on the business context we are operating in and the risk profile we’re carrying. Think of this like a pendulum that swings between reliability and development velocity.” Anthony Fairweather

Product-focused DevOps mustn’t forget the customer

Synopsis

As an organisation becomes product-focused and teams begin to focus their attention on their suite of microservices the organisation must find a way to ensure there is still appropriate focus on the quality of the service being provided to customers.

The power of a product-focused approach

I became convinced of the power of a product-focused approach to DevOps when I was working at EA Playfish. Our games teams were product-focused, cross-functional and, to an extent, multi-discipline. They were incredibly successful. They had their problems but they were updating their games weekly with a good cadence between content updates and game changes. They reacted to changing situations very quickly and they even pulled together well on programme-wide initiatives. Playfish’s games teams were my inspiration for wanting to scrap the old development, test and ops silos and create teams aligned behind building and supporting products and services.

The insidious risk of having a product-focus

In Next Gen DevOps I examine the implications of taking a product-focused approach to a DevOps transformation. The approach has since been validated by some high profile success stories that I examine in the second edition of the book. Some more recent experiences have led me to conclude that, while the approach is still the right one, I, and several others, have missed a trick.

In breaking down a service into microservices or even starting out by considering the bounded contexts of a problem it’s easy to lose sight of the service from the customer perspective.

Background

One of the functions the Ops teams of the past assumed was maintaining a holistic view of the quality and performance of the service as a whole. It’s this trait more than any other that led to friction with development teams. Development teams were often tasked with making changes to specific aspects of a service. Ops would express their concerns about the impact of the proposed changes on performance or availability. The Developers were then caught between an Ops team expressing performance and reliability risks and a Product Manager pushing for the feature changes they needed to improve the service. None of the players had the whole picture and very few people in those teams had much experience of expressing non-functional requirements in a functional context, so the result was conflict. While this conflict was counter-productive, it ensured there was always someone concerned with the availability and performance of the service-as-a-whole.

Product-focused DevOps is now the way everyone builds their Technology organisations. There are no Operations teams. Engineers are expected to run the services they build. The most successful organisations hire multi-discipline teams so, while everyone codes, some people are coding tests, some are coding infrastructure and data structures, and some are coding business logic. The microservices movement drove this point home as it’s so much easier to deliver smaller, individual services when teams are aligned to those services.

Considering infrastructure configuration, monitoring, build, test execution, deployment and data lifecycle management as products is a logical extension of the microservices or domain-driven development pattern.

However here there be dragons!

When there’s clear ownership of individual microservices, who owns the service-as-a-whole? If the answer to that question is everyone, then it’s really no-one.

Key Performance Indicators

In many businesses Key Performance Indicators (KPIs) are defined to determine what success looks like. Common KPIs are sales figures or conversion rates, infrastructure costs as a percentage of revenue, Net Promoter Scores (NPS), Customer Retention Rate, Net Profit and so on.

A few years ago the DevOps community got into a discussion about the KPIs that could be used to measure the success of a DevOps transition. As with most such discussions we ended up with some agreement around some common sense measures and a lot of debate about some more esoteric ones.

The ones most people agreed with were:

  • Mean Time To Recovery.
  • Time taken to deploy new features measured from merge.
  • Deployment success (or failure) rate.
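
To make these three measures concrete, here is a minimal Python sketch of how they might be calculated from deployment and incident records. The `Deployment` and `Incident` record types, their field names and the sample data are all hypothetical, invented purely for illustration; in practice the data would come from whatever deployment and incident tooling an organisation already has, and each organisation will need to agree its own definitions (for example, whether failed deployments count towards deploy time).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Deployment:
    merged_at: datetime    # when the change was merged (hypothetical field)
    deployed_at: datetime  # when the change reached production
    succeeded: bool        # True if the deployment did not fail or roll back


@dataclass
class Incident:
    started_at: datetime    # when the service degraded
    recovered_at: datetime  # when normal service resumed


def mean_time_to_recovery(incidents):
    """Average time from the start of an incident to recovery."""
    seconds = [(i.recovered_at - i.started_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mean_merge_to_deploy(deployments):
    """Average time from merge to production, counting successful deployments only."""
    seconds = [
        (d.deployed_at - d.merged_at).total_seconds()
        for d in deployments
        if d.succeeded
    ]
    return timedelta(seconds=mean(seconds))


def deployment_success_rate(deployments):
    """Fraction of deployments that succeeded (0.0 to 1.0)."""
    return sum(d.succeeded for d in deployments) / len(deployments)


# Made-up sample data, for illustration only.
t0 = datetime(2020, 1, 6, 9, 0)
deployments = [
    Deployment(t0, t0 + timedelta(minutes=30), True),
    Deployment(t0, t0 + timedelta(minutes=90), True),
    Deployment(t0, t0 + timedelta(minutes=45), False),
    Deployment(t0, t0 + timedelta(minutes=60), True),
]
incidents = [
    Incident(t0, t0 + timedelta(minutes=20)),
    Incident(t0, t0 + timedelta(minutes=40)),
]

print("MTTR:", mean_time_to_recovery(incidents))
print("Merge to deploy:", mean_merge_to_deploy(deployments))
print("Success rate:", deployment_success_rate(deployments))
```

With this sample data the sketch reports a mean time to recovery of 30 minutes, a mean merge-to-deploy time of one hour and a success rate of 0.75.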

Due to the nature of internet debate most of the discussion focussed on what the KPIs should be and very little discussion was had about how the KPIs should be set and managed.

This takes us to the trick I missed when I wrote Next Gen DevOps and the trick many others have missed when they’ve tackled their DevOps transitions.

In our organisations we have a group of people who are already concerned with the whole business and are very focussed on the needs of the customers. These people are used to managing the business with metrics and are comfortable with setting and managing targets. They are the C-level executives. Our COOs are used to managing sales targets and conversion rates, our CFOs are used to managing EBITDA and Net Profit targets. CMOs are used to managing Net Promoter Scores and CTOs are used to managing infrastructure cost targets.

I think we’re making a mistake by not exposing the KPIs and real-time data we have access to so that our executives can actively help us manage risk, productivity and quality of service.

The real power of KPIs

In modern technology organisations we have access to a wealth of data in real-time. We have tools to instantly calculate means and standard deviations from these data points and correlate them to other metrics. We can trend them over time and we can set thresholds and alerts for them.
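
As a rough illustration of that kind of calculation, the sketch below takes a baseline of page load times, works out the mean and standard deviation, and flags any new sample that sits more than three standard deviations above the mean. The function name, the three-sigma threshold and the sample figures are assumptions made for this example, not a recommendation for any particular monitoring tool.

```python
from statistics import mean, stdev


def find_breaches(baseline, samples, sigmas=3.0):
    """Return (threshold, breaches) where breaches are the samples that
    exceed mean(baseline) + sigmas * stdev(baseline)."""
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return threshold, [s for s in samples if s > threshold]


# Made-up page load times in milliseconds, for illustration only.
baseline_ms = [200, 210, 190, 205, 195, 200, 198, 202]
latest_ms = [204, 199, 260, 201]

threshold, breaches = find_breaches(baseline_ms, latest_ms)
print(f"Alert threshold: {threshold:.1f} ms")
print("Samples over threshold:", breaches)
```

Here the threshold works out at a little over 218 ms, so only the 260 ms sample is flagged. The point is less the arithmetic than the fact that a threshold like this is something an executive can be shown, understand and help to set.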

Every organisation I’ve been in has struggled to manage prioritisation between new feature development and non-functional requirements. The only metrics available to most of the executives in those organisations have been availability and, if they’re lucky, page load times.

Yet it’s fairly logical that if we push new feature development and reduce the time spent on improving the performance of the service, service performance will degrade. It’s our job as engineers to identify, record and trend the metrics that expose that degradation. We then need to educate our executives on the meaning of these metrics and give them the levers to manage those metrics accordingly.

Some example KPIs

Let’s get right into the detail to show how executives can help us with even the most complex problems. Technical debt should manifest as a reduction in velocity. If we trend velocity then we can highlight the impact of technical debt as it manifests. If we need to reduce new feature production to resolve technical debt we should be able to demonstrate the impact of that technical debt on velocity and we should be able to see the increase in velocity having resolved the technical debt.
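
One simple way to trend velocity is to fit a straight line through the per-sprint figures and look at the slope. The sketch below does this with an ordinary least-squares fit; the function name and the velocity numbers are hypothetical, and a straight line is only one of several reasonable ways to summarise a trend.

```python
def trend_slope(values):
    """Least-squares slope of values against their position in the list,
    i.e. the average change per sprint."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Made-up story points per sprint, for illustration only.
velocity_while_debt_accumulates = [40, 38, 37, 35, 33, 30]
velocity_after_paying_down_debt = [30, 32, 35, 37]

print("Before:", round(trend_slope(velocity_while_debt_accumulates), 2), "points per sprint")
print("After:", round(trend_slope(velocity_after_paying_down_debt), 2), "points per sprint")
```

A negative slope in the first series and a positive slope in the second is exactly the before-and-after story described above, expressed as two numbers that can sit on a dashboard.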

For those organisations still struggling with inflexible infrastructure and software, consider the power of Mean Time Between Failures (MTBF). If you’re suffering reliability problems due to under-scale hardware or older software and are struggling to get budget for upgrades, MTBF is a powerful metric that can make the point for you.
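
MTBF is straightforward to calculate once outages are being recorded. The sketch below uses one common definition, total uptime in an observation window divided by the number of failures in that window. The function name and the sample outages are assumptions for illustration, and other definitions (such as the average gap between the start of consecutive failures) are equally valid as long as everyone agrees which one is in use.

```python
from datetime import datetime, timedelta


def mean_time_between_failures(window_start, window_end, outages):
    """Uptime in the window divided by the number of outages.

    outages is a list of (start, end) datetime pairs inside the window.
    """
    if not outages:
        raise ValueError("no failures recorded in the window")
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)


# Made-up 30-day window with three two-hour outages, for illustration only.
window_start = datetime(2020, 1, 1)
window_end = window_start + timedelta(days=30)
outages = [
    (window_start + timedelta(days=3), window_start + timedelta(days=3, hours=2)),
    (window_start + timedelta(days=12), window_start + timedelta(days=12, hours=2)),
    (window_start + timedelta(days=25), window_start + timedelta(days=25, hours=2)),
]

mtbf = mean_time_between_failures(window_start, window_end, outages)
print("MTBF:", mtbf)
```

For this sample the result is 238 hours, a little under ten days between failures. Tracked month on month, a figure like that makes the case for an upgrade budget far better than a list of individual incidents.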

A common stumbling block for many organisations in the midst of their DevOps transformations is the deployment pipeline. Two words that sum up a wealth of software and configuration complexity. Often building and configuring the deployment pipeline falls to a couple of people who have had some previous experience, but no two organisations are quite the same and so there are always some new stumbling blocks. If you trend the time taken to deploy new features, measured from merge, you can easily make the case for getting some help from other people around the organisation to build a better deployment pipeline.

The trick with all of this is to measure these metrics before you need them so you can demonstrate how they change with investment, prioritisation and other changes.

Implementation

Get your technical leadership team to meet with the executive team and discuss the KPIs that matter to you, and some that don’t matter yet but might. Educate everyone in the room about the way the KPIs are measured so the metrics have context and people can have confidence in them. Create a process for managing the KPIs and then start measuring them in real time and displaying them on dashboards. Set up sessions to discuss the inevitable blips and build a partnership to manage the business using the metrics that really matter.

Article image courtesy of: http://maxpixel.freegreatpicture.com/Stock-Finance-Monitor-Desk-Trading-Business-1863880

DevOps Journeys 2.0

Last year Linuxrecruit published an ebook called DevOps Journeys. Over its 30 pages various thought leaders and practitioners shared their thoughts and experiences of implementing DevOps in their organisations. It was a great read for anyone outside the movement to understand what it was all about. For people inside the movement it presented an opportunity to learn from the experiences of some of the UK’s foremost DevOps luminaries.

Linuxrecruit have recently published the follow-up: DevOps Journeys 2.0. This one’s even better because I’m in it!

A year on we’re in a different place. Most organisations now have DevOps initiatives, we’re in the midst of a critical hiring crisis, new technologies are on the hype train and large companies are jumping on last year’s bandwagons. More companies are now starting to encounter the next generation of problems that arise from taking a product-focussed approach to DevOps.

I’ve worked with several of the contributors to DevOps Journeys 2.0 and they are some seriously capable people. If you’re interested in the challenges facing organisations as they embark on or progress along their DevOps journeys, DevOps Journeys 2.0 is a great read.

Enterprise DevOps Lessons Learned: TDA

I’ve been working at the Department for Work and Pensions (DWP) for the best part of a year now. If you’re not aware, the DWP is the largest UK Government department, with around 85,000 full time staff augmented by a lot of contractors like myself.

DWP is responsible for:

  • Encouraging people into work and making work pay
  • Tackling the causes of poverty and making social justice a reality
  • Enabling disabled people to fulfil their potential
  • Providing a firm foundation, promoting saving for retirement and ensuring that saving for retirement pays
  • Recognising the importance of family in providing the foundation of every child’s life
  • Controlling costs
  • Improving services to the public by delivering value for money and reducing fraud and error

Taken from A Short Guide to the Department for Work & Pensions published by the National Audit Office June 2015.

The DWP paid more than 22 million customers around £164 billion in benefits and pensions in 2013-14.

After decades of outsourcing its Technology development and support the government decided that it should provide its own Technology capability. Transforming such a large Technology organisation from what was primarily an assurance role to a delivery role is no mean feat.

Having been a part of this journey for almost a year I thought it might be useful if I shared some of the things that have worked well and some of the challenges we haven’t yet overcome.

Today I want to talk about the Technical Design Authority (TDA). I’ve never worked anywhere with a TDA before and I didn’t know what to expect. Established by Greg Stewart, CTA / Digital CTO at DWP, not long after he joined, the TDA has a dual role.

The TDA hold an advisory meeting where people can introduce new projects or initiatives and discuss them with peers and the Domain Architects. In an organisation as large as the DWP this really helps find people with similar interests and requirements. It reduces the chance of accidental duplication of work and introduces people who are operating in similar spaces. Just finding out who else is working in a similar space has been tremendously valuable.

The TDA also hold a governance session where they review project designs. The template they provide for this session is really useful. It forces the architect or developers to consider the data types stored, data flows including security boundaries, and high-availability and scaling mechanisms. That’s not to say every project needs those things but the review ensures that a project that does need them has them.

I can’t list the number of projects I’ve been involved with over the years that would have benefitted from a little forethought about non-functional requirements.

A TDA is a must-have for an enterprise DevOps transformation. It makes sure Technology people working on similar projects in different parts of the organisation are aware of and can benefit from each other’s work. It ensures that projects pay adequate attention to the non-functional as well as the functional requirements, and it ensures that where standards are required they are promoted and where experiments are needed they are managed appropriately.

NEXT GEN DEVOPS Second Edition!

NEXT GEN DEVOPS: Creating the DevOps Organisation is getting a second edition!

I’ve been working on it for a while but it’s been my sole focus since I published the NEXT GEN DEVOPS TRANSFORMATION FRAMEWORK.

The first edition came out around a year ago and a lot has changed since then.

The conversation now seems to be how organisations should approach DevOps rather than whether they should consider it. Friends and I are now talking about dropping the term DevOps because we feel it’s just good software engineering practice. Patterns that I dimly glimpsed two years ago are now clearly defined and have several supporting case studies.

The core theme of the book hasn’t changed. In fact none of the existing content has changed at all. I’ve corrected a few formatting mistakes here and there and I’ve been able to add some great photos that I think really bring the history chapter to life. Everyone who’s spoken to me about the book has commented that it’s their favourite chapter and now it’s even better!

All new content!

Apart from a redesigned cover I’ve added several new chapters. The first is entitled The only successful DevOps model is product-centric. It looks at the four organisations that are most frequently held up as DevOps exemplars (Etsy, Netflix, Facebook and Amazon) to see what they have in common and what lessons other organisations can learn from their successes and failures. I wrote this chapter to address a comment I’ve heard from several readers that they wanted more explicit instructions about how to transform their teams and organisations.

That’s also the reason I added the next new chapter, The Next Gen DevOps Transformation Framework. This chapter provides explicit instructions describing how my DevOps Transformation Framework can be used to transition a business towards DevOps working practices. It’s impossible to re-format a framework designed to be used interactively on an HD screen to a 6×9″ paperback but I’ve been able to provide some supporting contextual information as well as some example implementations. This, combined with the Playfish case-studies I’ve published here on the blog, should provide people with everything they need to begin their journey to DevOps.

The final bit of new content is an appendix to the history chapter. I learned a lot while I was researching the history chapter, far more than I could include without completely losing the thread of the chapter. What interested me most of all was the enormous role played by women in the development of the IT profession. I’ve worked with some great men and women in my 20 years in IT but I’ve only met two female Operations Engineers. Where are the rest? At Playfish I worked with a lot of female developers but whenever I was hiring I never met any women interested in careers in Operations. I couldn’t shake the thought that something was wrong with this situation. Over the past year I’ve read a lot about the declining numbers of women in IT so I decided to share what I learned while writing the history chapter and do a little research of my own.

Reduced price!

I need to eat some humble pie now. I think I made a mistake when I initially priced the book. When I was writing the book my focus was not on book sales. I know a couple of people who have authored and co-authored books and read numerous articles about how writing will not make you rich so I was under no illusions about my future wealth. I chose the price because I felt that it would lend credibility.

I didn’t factor in that ebooks and self-publishing have changed the market. When I published the book Amazon displayed a little graph demonstrating that $9.99 was a sweet spot for pricing and that I’d make more money publishing at that price. I’m not doing this for the money so what do I care?

That’s where I made a mistake. I don’t care about the money but I do want my message to get out. I think my book is unique because very few authors have spent 17 years operating online services and very few authors had the unique opportunity to work on one of the first examples of continuous delivery.

So the 2nd edition will be priced at $9.99. I understand that people who have paid the higher price for the first edition will quite rightly feel a little put out by this, so I intend to publish the second edition as an update to the first. This means that those people who bought the book on Kindle can just update their copy to get the second edition.

I can’t update the paperback version and I can’t give them away for free but I do have a plan. I’m going to be publishing a PDF edition of the second edition and selling it through my own online store. I can’t get details of who bought my book so I’m going to do my best to operate an honour system. If you bought a paperback version of Next Gen DevOps and want a PDF copy of the second edition email grant@nextgendevops.com and I’ll send you the PDF version.

While I don’t know who has bought paperbacks I do know how many paperbacks I’ve sold, so once I’ve given away that many PDF copies the giveaway will be over. Email me as soon as you can to make sure you get your free copy.

The second edition will be published in the next couple of weeks and will be accompanied by a formal press-release.