Why Twenty Years of DevOps Has Failed to Do It

(honeycomb.io)

39 points | by mooreds 5 hours ago

17 comments

  • mosura 2 hours ago
    It failed because there is an ongoing denial that development and operations are two distinct skillsets.

    If you think 10x devs are unicorns consider how much harder it is to get someone 10x at the intersection of both domains. (Personally I have never met one). You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.

    • tbrownaw 18 minutes ago
      > You are far better off with people that can work together across the bridge, but that requires actual mutual trust and respect, and we’re not able to do that.

      Are you claiming it's fundamentally impossible for people to get along, or just that positive interpersonal relationships can't be reliably forced at scale?

    • vee-kay 1 hour ago
      From someone who has managed both Development and Operations teams for decades: trust me, they are different beasts and have to be handled/tackled differently.

      Expecting Devs or Ops to do both types of work, is usually asking for trouble, unless the organization is geared up from the ground up for such seamless work. It is more of a corporate problem, rather than a team working style or work expectations & behavior problem.

      The same goes for Agile vs Waterfall. Agile works well if the organization is inherently (or overhauled to be) agile, otherwise it doesn't.

    • ramoz 51 minutes ago
      I mean, look at Kubernetes though. You have to understand both the application and the infrastructure in order to get the deployment right. Especially in any instance of having to pin the runtime to any type of resource (certain disk writing, GPUs, etc).
      • verdverm 13 minutes ago
        That's not a Kubernetes-specific issue. If you run on VMs or Edge, devs also need to know the resource requirements. If anything, k8s makes that consistent and as easy as setting a config section (assuming you have the observability to know what good values are). The default behavior I've seen is to set requests without limits, so you get scheduled but not OOM-killed.
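As a rough illustration of the "config section" described above, here is a minimal Python sketch that builds the `resources` stanza of a Kubernetes container spec with requests set and limits left out. The helper name `container_resources` and the example values are assumptions made for illustration; they do not come from the thread, and sensible values would have to come from observed usage.

```python
import json


def container_resources(cpu_request, memory_request, memory_limit=None):
    """Build the `resources` stanza of a Kubernetes container spec.

    Hypothetical helper: requests are always set so the scheduler can
    place the pod; a memory limit is added only when one is passed in.
    """
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    if memory_limit is not None:
        resources["limits"] = {"memory": memory_limit}
    return resources


# Requests only, no limits: the pattern described in the comment above.
requests_only = container_resources("250m", "256Mi")
print(json.dumps(requests_only, indent=2))
```

With no limit set, the container is not killed for exceeding its own limit, although it can still be evicted or killed if the node itself runs short of memory.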
      • firesteelrain 49 minutes ago
        My experience has been that devs don’t understand their own app resource requirements
        • ramoz 48 minutes ago
          This would be considered a failure, or are you saying they don't need to?
          • firesteelrain 36 minutes ago
            I am saying that in my experience they get upset when the VM or container they provision blows up because it lacks enough resources or they do not place guardrails on their app and end up getting OOMKilled.
    • tsss 1 hour ago
      You don't need 10x developers. You just need to avoid the 1/10 multiplier of pitting separate development and operations teams against each other.
  • jhawk28 2 hours ago
    DevOps is dead because it's run by a bunch of ops people who don't know how to do dev and a bunch of dev people who don't know how to do ops. The only tooling problem is that a bunch of companies created "DevOps tools" that teams are then told to use: K8s, Terraform, etc. The only way this works is if you build the application to fit within those frameworks, for example by writing an indexer that is massively parallel and mainly constrained by CPU/memory. Instead, you have devs building something that gets thrown over the fence to a devops team that then containerizes it and throws it on K8s. What happens if the application requires lots of IOPS or network bandwidth? K8s doesn't schedule applications that way. "Oh you can customize the scheduler to take that into account". 2 years later, it's still not "customized" because they are ops people who don't know how to code. If you do customize it, the API is going to change in a few months, and your customization will break when you upgrade.
    • orsorna 1 hour ago
      Would you say it's truly dead or that it fails to meet the performance bar you've described?

      The reality is that most devs do not consider a holistic picture that includes the infrastructure they will be deploying to. In many cases, it's certainly a skill issue; good devs are hard to find. And to flip the coin, it's hard to find good ops people too.

      The reason DevOps continues to linger, however vague a discipline it is, is because it allows the business to differentiate between revenue generating roles and cost center roles. You want your dev resources to prioritize feature work, at the beck and call of PMs or upper management, and let your "DevOps" resources be responsible for actually getting the product deployed.

      In essence, it's a ploy to further commoditize engineering roles, because finding unicorns that understand the picture top-to-bottom is difficult (finding /top/ talent is difficult!). In this way, DevOps is well and alive, as a Romero zombie.

    • BanAntiVaxxers 54 minutes ago
      There are not very many ops people who cannot code. Especially these days. I spent at least the last 20 years doing ops. Ops people are HIGHLY motivated to create things that DON’T FAIL. However, ops teams are often blocked by MANAGERS from doing essentially development in the prod environment. I’m talking about tools and scripts. At the places I’ve worked with the highest uptime, it was because ops had an unlimited, unfettered free hand.

      Remove the handcuffs from your ops team and your reliability will SOAR.

      • verdverm 42 minutes ago
        Average ops have never been less capable of, and more averse to, programming than now. The problem is getting worse, not better. I know because I am in ops and one of the few who loves to code and accidentally entered the field
    • blutoot 2 hours ago
      I would say that pre-CC (pre-Claude Code), this might have seemed like a daunting task for average DevOps engineers. But post-CC, there is just no excuse to shy away from such challenges.

      EDIT: lol - I am getting downvoted for suggesting some DevOps engineers will actually be ready to take on tasks that were previously more intimidating. I really hope those folks are from the never-coding-agent camp. When I refer to reliance on CC or Codex, I mean being engaged at a wholesome level with AI -- not blindly one-shotting solutions. This means having the patience to understand the complexity of the system, the criticality of its downtime in the overall architecture (in this case it's the k8s controller), the ability to learn the codebase, using the right MCPs to delve into all the details needed for testing changes locally, etc. These are system-level skills and barely overlap with just coding skills.

      • gmane 2 hours ago
        Spoken like someone who has never had to deal with business critical production environments.
      • pryelluw 1 hour ago
        It’s like saying that in a post-Viagra world there shouldn’t be men who have trouble getting laid.
        • blutoot 1 hour ago
          Don't want to get too deep into your analogy. I was addressing the "DevOps cannot code" part. To me it is a leadership failure if a DevOps team is still afraid of tackling bigger challenges (like the example given by the OP). That, of course, depends on whether DevOps teams will exist in the long run.
          • prmoustache 1 hour ago
            The very fact that we are talking about "DevOps" teams (that do not include dev) is wrong from the very start.

            DevOps is a methodology, not a role.

            • blutoot 1 hour ago
              I've always felt that DevOps became a function/team partly because companies and especially SWEs started complaining that they were spending too much time "doing Ops work" and product/business started demanding more features for which they were running out of cycles. And add to that the burnout from being on-call (especially if the dev team is relatively small and you have to go on-call every 2-3 weekends).
              • verdverm 47 minutes ago
                When I still did on call ops, devs got notified before us if their apps were the problem. We got notified first if it was our infra

                Having an ops team does not mean devs get to throw on-call over the wall to someone else. That's a sure recipe for resentment and turnover

          • verdverm 49 minutes ago
            > the "DevOps cannot code" part. To me it is a leadership failure

            Have you done devops yourself? It sounds like a resounding No. Like you complained ops doesn't like to code (not a core skill for the job), ops complains that devs can't understand basic concepts of how their software runs. Is this also a failure of leadership? Is everyone supposed to know parts of everyone else's jobs?

  • mgilroy 36 minutes ago
    I'd argue that it has failed in some organisations. DevOps for me is embedding the operations with the development team. I still have operations specialists; however, they attend the development team stand-ups and help articulate the problems to the developers. They may have separate operations stand-ups and meetings to ensure the operations teams know what they are doing and share best practices. Developers learn about the operations side from those that understand it well and the operations experts learn the limitations and needs of the developers. Occasionally I am fortunate to discover someone who can understand both areas incredibly well. Either way, this results in increased trust and closer working. You don't care about helping some random person on a ticket from a team you don't know. You do care about the person you work with daily and understand the problems they have.

    If you can't account for someone spending x% of their time working with a team but for budgetary purposes belonging to a different team then sack your accountants.

    DevOps, like agile, when done correctly should help to create teams that understand complete systems or areas of a business and work more efficiently than stand-alone teams. The other part of the puzzle is to include the QA team too, to ensure that the impact of full system, performance and integration tests is understood by all and that everyone understands how their changes impact everything else.

    Having the dev team build code that makes the test and ops teams' lives easier benefits everyone. Having the ops team provide solutions that support test and dev helps everyone. Having test teams build systems that work best with the Dev and ops teams helps everyone.

    Agile development should enable teams to work at a higher level of performance by granting them the agency to make the right decisions at the right time to deliver a better product by building what is needed in the correct timeline.

    DevOps and agile fail where companies try to follow waterfall models whilst claiming agile processes. The goal with all these business and operating models is to improve efficiency. When that isn't happening then either you aren't applying the model correctly or you need to change the model.

  • mjr00 2 hours ago
    > most orgs are used to responding to a daytime alert by calling out, “Who just shipped that change?” assuming that whoever merged the diff surely understands how it works and can fix it post-haste. What happens when nobody wrote the code you just deployed, and nobody really understands it?

    I assume the first time this happens at any given company will be the moment they realize fully autonomous code changes made on production systems by agents is a terrible idea and every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM.

    • hippo22 2 hours ago
      What happens if the person who wrote the code went on vacation? What happens if the code is many years old and no current team member has touched the code?

      Understanding code you didn't personally write is part of the job.

    • blutoot 2 hours ago
      I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage".

      Teams will figure out how to mitigate such situations in future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g., invest more in a production-like env for test coverage).

      Software engineering purists have to get out of some of these religious beliefs

      • verdverm 37 minutes ago
        > Software engineering purists have to get out of some of these religious belief

        To me, the Claude superfans like yourself are the religious ones, like how you run around proffering unsubstantiated claims like this and believe in / anthropomorphize way too much. Is it because Anthrop'ic is an abbreviation of Anthropomorphic?

        • throwaway7783 10 minutes ago
          In my own anecdotal experience, Claude Code found a bug in production faster than I could. I was the author of said code, which was written by hand 4 years ago. GP's claim is perhaps not all that unsubstantiated. My role is moving more towards QA/PM nowadays.
    • tarxvf 2 hours ago
      If companies were generally capable of that level of awareness they would not operate the way that they do.
  • bmitch3020 1 hour ago
    DevOps only failed in that so many don't know what it is.

    DevOps isn't a tool, but there are lots of tools that make it easier to implement.

    DevOps isn't how management can eliminate half the org and have one person do two roles, specialization is still valuable.

    DevOps isn't an organization structure, though the wrong org structure can make it fail.

    DevOps is collaboration. It's getting two distinct roles to better interoperate. The dev team that wants to push features fast. And the ops team that wants stability and uptime.

    From the management side, if you aren't focused on building teams that work well together, eliminating conflicts, rewarding the team collectively for features and uptime, and giving them the resources to deliver, that's not a DevOps failure, that's a management failure.

  • verdverm 56 minutes ago
    Yaml is my #1 failure in devops. That so many have resigned themselves to this limit and no longer seek to improve, it's disappointing. Our job is to make things run better and easier, yet so many won't recognize the biggest pains in their own work. Seriously, is text templating an invisibly scoped language really where you think the field has reached maturity?
    • firesteelrain 48 minutes ago
      JSON is so much easier in my experience and less prone to error
      • verdverm 45 minutes ago
        JSON does not have comments, and no, JSON5 is not the answer either

        Think bigger, it's not something you are using today. The next config language should have schemas built in and support for modules/imports so we can do sharing/caring. It should look and feel like config languages and interoperate with all of those that we currently use. It will be a single configuration fabric across the SDLC.

        This exists today for you to try, with CUE

        I've been cooking up something the last few weeks for those interested, CUE + Dagger

        https://github.com/hofstadter-io/hof/tree/_next/examples/env

        • throwaway7783 8 minutes ago
          Like XML? :)
        • lijok 38 minutes ago
          Like Python?
        • firesteelrain 35 minutes ago
          I genuinely despise the indentation requirements of YAML.

          For comments, I use a _comment field for my custom JSON reading apps
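A minimal sketch of how a `_comment` convention like this could be handled when reading JSON, assuming the reader simply discards the field. The function name `load_without_comments` and the sample document are hypothetical and not taken from the commenter's apps.

```python
import json


def load_without_comments(text, comment_key="_comment"):
    """Parse JSON and drop every `comment_key` entry at any nesting depth."""

    def strip(pairs):
        # json.loads calls this hook once per JSON object with its (key, value) pairs.
        return {key: value for key, value in pairs if key != comment_key}

    return json.loads(text, object_pairs_hook=strip)


raw = """
{
  "_comment": "retry settings for the uploader",
  "retries": 3,
  "backoff": {"_comment": "seconds", "initial": 0.5, "max": 8}
}
"""

config = load_without_comments(raw)
print(config)
```

Using `object_pairs_hook` strips the comment keys during parsing, so nested objects need no second pass over the result.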

          • verdverm 19 minutes ago
            yeah, this is what I'm talking about, innovation has stopped and we do dirty hacks like `imports: [...]` in yaml and `_comment` in json

            How are people not embarrassed by this complete lack of quality in their work?

  • anonymars 1 hour ago
    Am I the only one who remembers when DevOps meant "developers are responsible for dealing with the operational part of their software too, so that they don't just throw stuff over the wall for another team to deal with the 3AM pages"?

    It seems to have become: "we turned ops into coding too, so now the ops team needs to be good at software engineering"

    • vee-kay 1 hour ago
      DevOps was (and is) merely an excuse for companies to replace Developers with cheaper Ops resources, and yet expecting better services and better products from them.

      My personal experience says that the best way is that the Ops team should not be repurposed as Developers; rather, put the experienced Developers into Production Support (incident management, that's intense Ops, working in shifts and weekends, etc.). And rotate them whenever needed. Over a period of time, you'll invariably see fewer defects and issues percolating down from the Devs, and then after both sides are stable and working well together with less friction and fewer open tickets, then some more tech savvy Ops members can be rotated into Development teams as rookie devs to help reduce costs a bit (as there'll invariably be some natural attrition among the Devs and Ops, so this gives an alternative career path to the Ops team (who are usually less paid, and more stressed), and pushes the Devs not to become complacent). Such an approach is doable and productive.

      • Uvix 28 minutes ago
        We tried this, but we just got more defects, because the Devs lost what little Ops knowledge they had. Where previously Ops would have to involve Devs, now that Production Support has some Dev knowledge, suddenly they get the blame for everything. Devs no longer have interest in things like "reading log files"; they just ship any problems over to Production Support.
        • verdverm 8 minutes ago
          You can find examples that go both ways for both endeavors, anecdata...

          The problem in your case is not the dev vs ops split, it's a company culture thing which I'm sure you see play out in more places than this current focus

    • verdverm 11 minutes ago
      That was an ambition of devops at one point; it has not borne the fruit it promised. Dev teams are not positioned to do ops well. We have specializations for a reason
    • prmoustache 1 hour ago
      I am with you.

      DevOps is a methodology. DevOps as a role or team name is a fantasy from people who do not understand the methodology.

      If you want DevOps to work, your Ops must be members of the development team, take part in the sprints, etc. But many companies do not want to do that because they want to separate ops and dev budget/accounting and do not want to hire enough people with ops skills.

      • verdverm 7 minutes ago
        This is not true, you can make it work well either way. It's about people and processes, not about some specific setup or way of grouping people
  • skybrian 1 hour ago
    I don't understand these graphs. Why do the lines go back in time?
  • politelemon 2 hours ago
    If your developers weren't looking at dashboards before, they won't use a chat interface to interrogate it either. That doesn't really bring it to them any more than their existing capabilities. There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.
    • verdverm 5 minutes ago
      My underlying assumption is that this is a content marketing piece to show managers / investors that "we are doing/thinking something in ai as a company"
    • amtamt 2 hours ago
      > There's also a worrying underlying assumption being made here that the answers your LLM will give you are accurate and trustworthy.

      I saw first-hand, at AWS DevDays, an AI giving SIGWINCH as the "root cause" of an Apache error in a containerized process in EKS, when the actual issue was a backend FCGI process connection error. It has been extremely hard since that demo to trust any AI for system level debugging.

      • verdverm 4 minutes ago
        (1) When was that? If it was less than 6 months ago, the current gen of models is noticeably better

        (2) AWS is not a leader, if even a contender, in the AI space. I would not evaluate the potential based on a demo they produced

      • temp0826 2 hours ago
        If we were smart we'd use AI to grok a system in order to help us reduce its complexity. I don't think we're anywhere close to even being able to provide all the necessary context to solve problems like this.
  • jbreckmckye 1 hour ago
    Because the idea you can have all aspects of maintaining a complex piece of technology, maintained by a single cross-skilled team of interchangeable cogs, is utopian and unworkable past any reasonable level of scale

    DevOps, shift left, full stack dev, all reminds me of the Futurama episode where Hermes Conrad successfully reorgs the slave camp he's sent to, so that all physical labour is done by a single Australian man

    Speaking darker, there is a kind of - well, perhaps not misanthropy, but certainly a not-so-well-meaning dismissiveness, to the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple, I don't know why you're making all these siloes, man" - assuming that ops specialists, sysadmins, programmers, DBAs, frontend devs, mobile devs, data engineers and testers have just invented the breadth and depth and subtleties of their entire fields, only as a way of keeping everybody else out

    But modern systems are complex, they are only getting more so, and the further you buy into the shift-left everyone-is-everything computer-jobs-are-all-the-same philosophy the harder and harder it will get to find employees who can straddle the exhausting range of knowledge to master

    • lll-o-lll 40 minutes ago
      > the "silo breaking" philosophy that looks at complex fields and says "well these should all just be lumped together as one thing, the important stuff is simple,

      I don’t think this is the right take. “Silos” is an ill-defined term, but let’s look at a couple of the negative aspects: “lack of communication” and “lack of shared understanding” (or different models of the world). I’m going to use a different industry example, as I think it helps think about the problem more abstractly.

      In the world of biomedical engineering, the types of products you are making require the expertise of two very different groups of people: engineers and doctors. Members of each group have an in-group language, and there is an inherent power differential between them. Doctors are more “important” than engineers. But to get anything made, you need the expertise of both.

      One way to handle this is to keep the engineers and doctors separate and to communicate primarily via documents. The doctor will attempt to detail exactly how a certain component should work. The engineer will attempt to detail the constraints and request clarifications.

      The problem with this approach is that the engineer cannot speak “doctorese” nor can the doctor speak “engineerese”; and the consequence is a model in each person’s head that differs significantly from the other. There is no shared model; and the real world product suffers as a result.

      The alternative is to attempt to “break the silos”; force the engineers and doctors to sit with each other, learn each other’s language, and build a shared mental model of what is being created. This creates a far better product; one that is much closer to the “physical reality” it must inhabit.

      The same is true across all kinds of business groups. If different groups of people are required to collaborate, in order to do something, those people are well served by learning each other’s languages and building a shared mental model. That’s what breaking silos is about. It is not “everyone is the same”, it’s “breaking down the communication barriers”.

      • jbreckmckye 29 minutes ago
        I don't think that's like DevOps, though. A closer analogy would be a business that only hired EngDocs, doctors who had to be accredited engineers as well as vascular surgeons.

        I don't think anyone thinks siloes are themselves a good thing, but they might be a necessary consequence of having specialists. Shift-left is mostly designed to reduce conversations between groups, by having individuals straddle across tasks. It's actually kind of anti-collaboration, or at least pessimistic that collaboration can happen

        • lll-o-lll 21 minutes ago
          Oh, I completely agree! We created “EngDocs”, as you say, and simply made the situation worse. An EngDoc is an obviously ludicrous concept, on its face. But by breaking down the silo in the biomedical example, each engineer becomes a bit knowledgeable about an aspect of medicine and each doctor gains some knowledge about aspects of engineering.

          I am arguing that all such people, whether developers or ops or UX designers or product managers, need to engage in this learning as they collaborate. This doesn’t mean that we want the DevPM as a resultant title, just that siloing these different groups will lead to perverse outcomes.

          Dev and ops have been traditionally siloed. DevOps was a silly attempt to address it.

  • blutoot 1 hour ago
    My message to the CTO of Honeycomb.io (who apparently wrote this post): please avoid getting philosophical and controversial to gin up curiosity about your AI platform. If you want to highlight the benefits of your platform then do so earnestly and objectively. Please don't mask marketing with an excoriation of a profession that has never been well-defined (or has always been defined to fit into an organization's political landscape for the most part). And you guys (like every other SRE/Ops platform) capitalized on that structural divide and deservedly got rich by selling licenses to these teams. I don't think you can come in now with this holier-than-thou best practice messaging just because platforms like yours have zero moat in this post-CC/Codex world.

    Hence my vitriol: https://news.ycombinator.com/item?id=46662287.

    • TacticalCoder 1 hour ago
      > please avoid getting philosophical and controversial to gin up curiosity about your AI platform

      Also: could he please avoid doing it by illustrating his nonsense with graphs that are both childish and nonsensical?

      • maccard 1 hour ago
        The CTO is a she.
  • zug_zug 1 hour ago
    "I think the entire DevOps movement was a mighty, ... it failed."

    I'm so sick of this nonsense. "Devops" isn't failing, isn't an issue, you can rename it whatever you want, but throughout my career the devops engineers (the ones you don't skimp on) are the best, highest paid professionals at the company.

    I don't know why I keep reading these completely crazy think-pieces hemming and hawing about a system (having a few engineers who master performance/backups/deployments/oncall/retros) that seems to be wildly successful. It would be nice if more engineers understood under-the-hood, but most companies choose not to exclusively hire at that caliber.

  • GiorgioG 1 hour ago
    In my experience DevOps has little interest in doing actual DevOps - they just want to run ops. They want to advise (or tell us we’re holding it wrong) but not actually get their hands dirty. On the flip side, devs don’t want to spend a ton of time learning k8s or how to manage servers, cloud services, etc.

    DevOps is a mess of our own making - embracing K8s created complexity for little gain for nearly all companies.

  • bravetraveler 1 hour ago
    Scratching neck: come on... just one more vendor, bro
  • gardenhedge 2 hours ago
    In my company, instead of relying on an ops team.. we rely on a devops team.
  • blutoot 2 hours ago
    I can't wait for indie developers to build super-agents that commoditize providers like Honeycomb.io and more importantly clone all their features and offer them up for free as OSS.
    • verdverm 1 minute ago
      Sounds like you don't know what a nightmare of version compat and bespokeness ops/obv is. This is going to be one of the harder things for LLMs to do because everyone is running on some snowflake held together with duct tape
  • alphazard 1 hour ago
    DevOps only works when the developers are always right. What usually happens is the DevOps team thinks they know best (they are developers too, just not the ones using the tools), and they build a lot of garbage that no one wants to use, often making things more complicated than they were before.

    Eventually a bureaucrat becomes the manager of the team, and seeks to expand the set of things under DevOps' control. This makes the team a single point of failure for more and more things, while driving more and more developer processes towards mediocrity. Velocity slows, while the DevOps bottlenecks are used as a reason to hire.

    It's an organizational problem, not a talent or knowledge problem. Allowing a group to hire and grow within an organization, which is not directly accountable for the success of the other parts of the organization that it was intended to support, is creating a cancer, definitionally.