Datadog, OpenTelemetry, & A History of Observability

Principal Developer Advocate at Honeycomb and Former SRE Leader at Google

Executive Bio

Liz Fong-Jones

Liz is a developer advocate at Honeycomb, a SaaS observability company that competes with Datadog, New Relic, and others. Liz previously spent over 11 years at Google Cloud as a reliability engineer and leading developer advocate across SRE, devops, and infraops.

Interview Transcript

Disclaimer: This interview is for informational purposes only and should not be relied upon as a basis for investment decisions. In Practise is an independent publisher, and all opinions expressed by guests are their own opinions, and do not reflect the opinion of In Practise.

Observability, it’s a big word. Can you define observability, or at least tell us how you would define it?

The way I define observability is that it is a capability of a team to interact with its systems; to understand its systems in real time, answering questions that you didn't anticipate you would need to ask until you ask them. That's the theory of what observability is. Then in practice, what we use to accomplish observability is a combination of telemetry signals, such as tracing, logging, metrics, or profiling, together with a storage engine to query them, as well as the ability of humans to interact with that system.

That’s how we achieve observability in practice. It is a combination of things. People believe that observability is about the signals, and the signals are important, but it's what you do with them that matters.

Taking a step back, how did this all work when everything was on-prem?

I think it's hard to disentangle the history of how we got here from what the practices are today, because today you can achieve observability regardless of where applications are located. What matters is instrumenting your applications so that they produce the relevant telemetry. There's no difference today between the observability that I would do in the cloud versus on-prem, with the minor exception of what host-level metrics you keep to help you out if there is a problem with an individual host.

In the olden days, we struggled to achieve observability by doing log searches. You would scrape logs from individual hosts, and then you would try to make sense of what was happening with the tools you had at the time. For on-prem 10 years ago, that was logging.

Were all these different products that the big players offer, all separate? Did you have to run them separately and then try to find the problem?

Yes. The way we got to where we are today with the market is that you have all the legacy players coming from their individual positions of strength. Splunk started as a logging-first company, and indeed, they were innovative when they introduced the idea that you don’t need to SSH into individual hosts, and you don't need to grep through the logs; Splunk will collect them and centrally index them. That was revolutionary at the time, and since then, they have kept themselves relevant by saying they are now on board with this observability thing: they are now contributing to open-source projects and developing an APM tool through a combination of building and buying. They acquired Omnition.

Basically, Splunk came from the logs world, Datadog came from the metrics world, and then you have providers like Dynatrace or AppDynamics coming from the APM world, all of which are different, interesting routes to arrive at trying to develop an observability solution for modern systems, as we understand them today.

Does it matter which product, or where you come from, in determining who has an advantage today?

My view, and our view at Honeycomb certainly – we have our biases – is that starting from a clean slate, and thinking about all the various signal types as special cases of wide events, of key-value pairs that you index as events, is the universal approach. It doesn’t require you to have a bias towards one or another of the signal types; they all end up being wide events.
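
To make the "wide events" idea concrete, here is a minimal sketch. The field names and values are purely illustrative (not any vendor's actual schema): each unit of work emits one event of key-value pairs, and a metric, a log search, or a trace lookup are all just different queries over the same events.

```python
from collections import Counter

# Each "wide event" is one record per unit of work, with many key-value
# pairs attached. Field names here are illustrative only.
events = [
    {"service": "checkout", "trace_id": "abc1", "duration_ms": 42,
     "status": 200, "user_tier": "free", "region": "us-east-1"},
    {"service": "checkout", "trace_id": "abc2", "duration_ms": 950,
     "status": 500, "user_tier": "pro", "region": "eu-west-1"},
    {"service": "cart", "trace_id": "abc3", "duration_ms": 12,
     "status": 200, "user_tier": "pro", "region": "us-east-1"},
]

# A metric is an aggregation over events.
error_rate = sum(e["status"] >= 500 for e in events) / len(events)

# A log search is a filter over events.
slow_checkouts = [e for e in events
                  if e["service"] == "checkout" and e["duration_ms"] > 500]

# An unanticipated question: group errors by an arbitrary dimension
# that happened to be recorded on the event.
errors_by_region = Counter(e["region"] for e in events if e["status"] >= 500)

print(error_rate)                         # 1 error out of 3 events
print([e["trace_id"] for e in slow_checkouts])
print(errors_by_region)
```

The point of the sketch is the last query: because every dimension is just another key on the event, a question you never planned for (errors by region, by user tier, by anything recorded) needs no new instrumentation or pre-aggregation.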

In contrast, I feel like if your company's bread and butter is logs, or your company's bread and butter is metrics, it can hinder you in some respects, because there is the danger of cannibalizing your previous business. If the new modern thing is not high-volume log indexing, has the perspective of high-volume log indexing biased the way your company thinks about the problem?

I don’t know. I think it's an interesting set of circumstances. I think it’s an interesting thing for people to explore and look at. At the end of the day, people should choose vendors based on what capabilities they offer and how their software developers can interact with the system. So, it may be that in circumstances where you're used to using a logging solution and that logging solution pivots to observability, maybe that’s the best thing for you.

Going back to that earlier question about approaches, I think there is lots of interesting stuff to learn from companies like my employer Honeycomb, or from our competitors like Lightstep and Aspecto, these newer generations of tools that are not biased by having a legacy install base of a very traditional product.

Datadog, OpenTelemetry, & A History of Observability (October 8, 2022)
