An edited version of the following post appeared initially on Diginomica.com on July 29, 2019.
Observability is an increasingly important concept for enterprise technology teams. That’s because even as new technologies and approaches (cloud, DevOps, microservices, containers, serverless, and more) increase velocity and reduce friction in getting from code to production, these innovations also introduce complex new challenges. “It’s amazing how much stuff has to work perfectly for things to work in production,” New Relic Founder and CEO Lew Cirne told the audience at our FutureStack18 conference last September.
True observability—not merely tactical monitoring—is key to mastering this complexity and fully understanding what’s happening in your company’s software and systems. But what does that mean in the real world? How do we define a modern observability platform?
We think it’s a good idea to start with 10 core principles:
The 10 principles of observability
1. Curation vs. participation
A modern observability platform excels at curation: cutting complexity down to size, and selecting and presenting relevant insights for users. But such a platform should also support participation—for example, making it easy for users to work with custom metrics and data sources.
Curation and participation are equally important in a modern observability platform. Curation gives teams a critical productivity and efficiency edge: the smaller the haystack, the easier it gets to find the needle. (New Relic customers might recognize our distributed tracing anomaly detection or Kubernetes cluster explorer as examples of how curation helps to achieve observability.)
Participation, on the other hand, puts a premium on versatility—capturing and manipulating data in valuable ways, even when the platform doesn’t know how to shape or present that data. Participation also relies on programmability: giving users the tools, and especially the APIs, to help them help themselves.
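To make the participation idea concrete, here is a minimal sketch of what "working with custom metrics" can look like from a user's point of view. Everything in it is an assumption for illustration: `MetricStore`, `record`, `average`, and the `checkout.cart_size` metric are hypothetical names, and the in-memory store stands in for whatever ingestion API a real platform exposes. It is not New Relic's API.

```python
from collections import defaultdict
from statistics import mean


class MetricStore:
    """Hypothetical in-memory metric store (illustration only, not a real platform API)."""

    def __init__(self):
        # Maps a metric name to the list of data points recorded under it.
        self._points = defaultdict(list)

    def record(self, name, value, **attributes):
        """Accept any user-defined metric, plus free-form attributes for later filtering."""
        self._points[name].append({"value": float(value), "attributes": attributes})

    def average(self, name, **filters):
        """Average a metric, optionally restricted to points whose attributes match."""
        values = [
            point["value"]
            for point in self._points.get(name, [])
            if all(point["attributes"].get(k) == v for k, v in filters.items())
        ]
        return mean(values) if values else None


# The platform knows nothing about "cart size" in advance; the user defines it.
store = MetricStore()
store.record("checkout.cart_size", 3, region="eu")
store.record("checkout.cart_size", 5, region="eu")
store.record("checkout.cart_size", 10, region="us")

eu_average = store.average("checkout.cart_size", region="eu")
overall_average = store.average("checkout.cart_size")
```

The point of the sketch is that the store accepts a metric name and attributes it has never seen before, and still lets the user slice the data afterward. That is the versatility participation depends on.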
2. Support power users
Power users are an important segment of any product’s user base. These are the users most likely to access—and to appreciate—the deeper capabilities that set a product apart from its competitors. And power users are often a product’s most respected and effective champions.
When it comes to application monitoring and observability, power users tend to have very tough and demanding jobs; many of them, for example, practically live in their integrated development environments (IDEs). These users want to automate everything, and they stand to benefit the most from a programmable and extensible observability platform. The New Relic platform, for example, addresses this goal via APIs that allow power users to consume data (such as creating custom metrics) in addition to injecting data for the New Relic platform to use.
3. Applications rule
When we speak with New Relic customers, many of them deliver a similar message: “What matters to us is whether our application is healthy or not.” And when an application experiences problems, customers want to pinpoint the source of the issue as quickly and accurately as possible.
The lesson we learned from these customers is loud and clear: An observability platform is most valuable when it focuses on measuring application performance and on surfacing application-performance roadblocks.
4. Embracing change
The pace of change in the observability space is breathtaking, and observability solutions must make tough decisions about capabilities and priorities. The plans and features that made sense six months ago may no longer be relevant, and while product roadmaps remain important, observability solutions must adapt readily to the realities of fast-moving technology innovation.
5. Full transparency
Sometimes observability requires a comprehensive, high-level view of application performance. Other times, it’s all about drilling down into very granular details—with no surprises, and full context.
A good observability platform delivers both of these capabilities. It also provides a consistent, intuitive, and transparent path for moving between high-level and lower-level views.
For example, let’s say that you’re looking at a summary view of performance in a time-series chart. You notice a spike in errors, and you want to know more about what’s happening. You should be able to drill down from that summary view into the underlying data—to view unhandled exceptions, perhaps, or even to view the stack frame or lines of code that introduced the error.
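The drill-down described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular platform implements it: `ErrorEvent`, `summarize`, `drill_down`, and the sample events are all hypothetical, and timestamps are plain integers in seconds.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    timestamp: int   # seconds since some epoch (illustrative values below)
    exception: str   # exception type name
    frame: str       # top stack frame, e.g. "cart.py:17 in total"


def summarize(events, bucket_seconds=60):
    """High-level view: error count per time bucket, as a time-series chart would plot."""
    return Counter(e.timestamp // bucket_seconds * bucket_seconds for e in events)


def drill_down(events, bucket_start, bucket_seconds=60):
    """Low-level view: the raw events behind one bucket of the summary."""
    bucket_end = bucket_start + bucket_seconds
    return [e for e in events if bucket_start <= e.timestamp < bucket_end]


# Hypothetical sample data.
EVENTS = [
    ErrorEvent(5, "TimeoutError", "client.py:88 in fetch"),
    ErrorEvent(65, "KeyError", "cart.py:17 in total"),
    ErrorEvent(70, "KeyError", "cart.py:17 in total"),
    ErrorEvent(110, "ValueError", "tax.py:9 in rate"),
]

summary = summarize(EVENTS)                       # the chart
spike_bucket = max(summary, key=summary.get)      # the bucket where errors spike
spike_events = drill_down(EVENTS, spike_bucket)   # the underlying data
top_frame = Counter(e.frame for e in spike_events).most_common(1)[0]
```

Because the summary and the detail view are computed from the same events with the same bucket boundaries, moving from the chart to the stack frames involves no surprises: every error counted in the spike appears in the drill-down, and nothing else does.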
Just as important, such a view should show the useful metrics you expect to see, along with the context required to understand what’s really going on. This type of transparency is especially important in high-stress, high-urgency situations where dev and ops teams want to focus on fixing the problem, not on finding…