Clint Sharp @clintsharp
13 tweets, 5 min read
When determining the health of your system, storing and querying terabytes or petabytes of data is wasteful. A sample is totally sufficient. @dritan walks you through how to do this with @cribl_io while keeping the full-fidelity data in cheap storage. blog.cribl.io/2018/10/24/rou…
Thread 1/?
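A minimal sketch of the idea in the first tweet: every event is kept at full fidelity in cheap storage, and only a random sample reaches the expensive analytics tier. This is not Cribl's implementation. The function names, the `sample_rate` field, and the use of two plain lists as stand-ins for an archive (e.g. S3) and an indexed store (e.g. Splunk or Elasticsearch) are all assumptions made for illustration.

```python
import random
from collections.abc import Iterable


def route_with_sampling(events: Iterable[dict], rate: int, seed: int = 0):
    """Archive every event; forward roughly 1 in `rate` to the analytics tier.

    The two returned lists are hypothetical stand-ins for cheap storage
    (e.g. S3) and an expensive indexed store (e.g. Splunk or Elasticsearch).
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    archive: list[dict] = []
    analytics: list[dict] = []
    for event in events:
        archive.append(event)  # full fidelity is always kept
        if rng.random() < 1.0 / rate:
            # Tag a copy with its sampling rate so aggregates can be rescaled.
            analytics.append({**event, "sample_rate": rate})
    return archive, analytics


def estimate_count(sampled: list[dict]) -> int:
    """Estimate how many events the unsampled stream contained."""
    return sum(e["sample_rate"] for e in sampled)


if __name__ == "__main__":
    events = [{"host": "web01", "status": 200} for _ in range(100_000)]
    archive, analytics = route_with_sampling(events, rate=10)
    print(len(archive), len(analytics), estimate_count(analytics))
```

Recording the rate on each sampled event is what keeps health metrics usable: a count over the sample, multiplied back up, approximates the count over the full stream, while the archive still holds every original event if you need it later.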
The advent of cheap storage and distributed systems like @Splunk, @elastic or Hadoop has led to an approach in which we defer decisions about the value of data until later. This assumes the cost of this lazy approach to data management is affordable.
However, I hear over and over again from our prospects how expensive their systems for analyzing machine data are becoming. It's not just software license costs; I have personally spoken to prospects spending $1m+/yr on infrastructure alone to query and analyze logs.
In bygone years, system capacity and scalability limits forced this decision making up front. This made data warehouse projects slow and brittle, because teams had to plan in advance for all the data they wanted to bring in.
The shift to lazy data management was definitely a pendulum swing in the opposite direction: "Now we can just onboard everything!" For business-critical data, like application and transaction logs, this totally makes sense. That data is likely to be valuable.
Applied to high-volume, low-value data, however, this approach is disastrous from a cost perspective. Onboarding terabytes a day of data that is rarely queried is wasteful. In many cases, this data is kept only because of security and compliance concerns. It's an insurance policy.
This requires a shift in thinking. We are moving towards a multi-system future. Log and machine data is not going to be stored in just one place. Customers are already opting to lay this data to rest in multiple systems: @Splunk, @Elastic, Hadoop, S3, NFS filers, etc.
Data which is being kept for compliance and never read SHOULD NOT GO TO AN EXPENSIVE STORE. Put that data somewhere cheap, like S3. If you're breached, it's trivial to suck that data in for analysis. See how AWS does this at massive scale with @Splunk: static.rainfocus.com/splunk/splunkc…
Even for threat hunting, I'd posit that if you're smart about what data you onboard (say, bringing in flow logs from trusted to untrusted zones while only sampling trusted-to-trusted flows), you can get all the data coverage you need for a fraction of the cost.
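One way the zone-aware policy above might look, as a sketch only. The field names (`src_zone`, `dst_zone`, `src_ip`, `dst_ip`, `dst_port`), the zone labels, and the 1-in-20 rate are hypothetical. The thread does not say what to do with other zone combinations (e.g. untrusted to trusted); this sketch keeps them in full as an assumption.

```python
import hashlib

SAMPLE_RATE = 20  # hypothetical: keep roughly 1 in 20 trusted-to-trusted flows


def keep_flow(flow: dict, rate: int = SAMPLE_RATE) -> bool:
    """Return True if this flow record should be onboarded for analysis.

    Field names (src_zone, dst_zone, src_ip, dst_ip, dst_port) are
    hypothetical; map them onto your own flow log schema.
    """
    if flow["src_zone"] == "trusted" and flow["dst_zone"] == "trusted":
        # Hash the connection key so every record for the same conversation
        # receives the same keep/drop decision.
        key = f'{flow["src_ip"]}|{flow["dst_ip"]}|{flow["dst_port"]}'.encode()
        bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        return bucket % rate == 0
    # Trusted-to-untrusted flows, and (in this sketch) every other zone
    # combination, are kept at full fidelity.
    return True
```

Hashing the connection key instead of drawing a random number means a sampled conversation is kept in its entirety, which tends to be more useful to a hunter than scattered individual records.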
This is why we call Cribl "Real-Time Log Management." We believe you should be able to decide, programmatically, where data is best stored, and do that while the data is in motion. In the future, we will even be able to do this responsively.
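Deciding "programmatically, where data is best stored" can be pictured as an ordered list of routing rules evaluated against each event as it streams through. This is a hypothetical sketch, not Cribl's configuration model: the `sourcetype` values and destination names are invented, loosely following the earlier tweets (business-critical application and transaction logs go to the analytics store plus the archive; compliance-only data goes straight to cheap storage).

```python
from typing import Callable, NamedTuple


class Route(NamedTuple):
    name: str
    matches: Callable[[dict], bool]
    destinations: tuple[str, ...]


# Hypothetical rules: the sourcetype values and destination names are
# illustrative, not taken from any real configuration.
ROUTES = [
    Route("compliance-only",
          lambda e: e.get("sourcetype") == "audit",
          ("s3_archive",)),
    Route("business-critical",
          lambda e: e.get("sourcetype") in {"app", "transaction"},
          ("splunk", "s3_archive")),
]
DEFAULT_DESTINATIONS = ("s3_archive",)


def destinations_for(event: dict, routes=ROUTES,
                     default=DEFAULT_DESTINATIONS) -> tuple[str, ...]:
    """Return the destinations of the first matching route, else the default."""
    for route in routes:
        if route.matches(event):
            return route.destinations
    return default
```

First match wins, so rule order matters. Defaulting to the archive means anything no rule anticipated still lands somewhere cheap instead of being dropped or indexed at full cost.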
Let's say you have an alert that gets sent to @TryPhantom or @VictorOps, and your hunters and troubleshooters need more data. No problem: based on the data in that alert, Cribl can go suck data out of your archive and get it back into @Splunk before the user even logs in.
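The thread presents this replay as a future capability, so the following only sketches the filtering step under stated assumptions: alerts and archived events both carry a `host` field and an epoch-seconds `time` field, and "relevant" means the same host within a hypothetical 15-minute window around the alert.

```python
from collections.abc import Iterable, Iterator


def replay_for_alert(alert: dict, archive: Iterable[dict],
                     window_s: int = 900) -> Iterator[dict]:
    """Yield archived events from the alerting host within +/- window_s seconds.

    Assumes (hypothetically) that alerts and archived events both carry a
    "host" field and an epoch-seconds "time" field.
    """
    start, end = alert["time"] - window_s, alert["time"] + window_s
    for event in archive:
        if event["host"] == alert["host"] and start <= event["time"] <= end:
            yield event


if __name__ == "__main__":
    archive = [
        {"host": "db01", "time": 1_000, "msg": "login failed"},
        {"host": "db01", "time": 5_000, "msg": "too late"},
        {"host": "web01", "time": 1_100, "msg": "other host"},
    ]
    alert = {"host": "db01", "time": 1_200}
    for event in replay_for_alert(alert, archive):
        print(event["msg"])  # prints "login failed"
```

A real archive in object storage would likely be partitioned by time (and possibly host) so that only the matching objects need to be read, rather than scanning everything as this in-memory sketch does.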
The next generation of scale is going to require us to be smarter about data management. We can't afford to just lazily onboard everything. We need better tooling to help us manage our data cost-effectively and put it in the best possible destination for performance and cost.
Cribl is doing this today for our customers. If this sounds interesting to you, we'd love to talk to you! Grab the bits at cribl.io/download.