- 5-700K data points per sec
- avg of 200K data points per second (terabytes per day)
- why an OpenTSDB approach failed (DDoS attack mitigation required more granular telemetry and per-packet inspection)
- why ELK made sense
They used the Beats ecosystem (Beats is the OSS data shipper project from Elastic)
On where ELK shines over standard time series databases. #velocityconf
- downsampling
- monotonic counters and operations related to counters
#velocityconf
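For anyone unfamiliar with the two operations in the bullets above, here is a minimal sketch of what they usually mean for time series data: turning a monotonic counter into a per-second rate, and downsampling points into coarser time buckets. This is not NS1's implementation and not Elasticsearch code. The function names and the reset rule (a drop in value means the counter restarted from zero) are my own assumptions for illustration.

```python
from collections import defaultdict


def counter_rates(samples):
    """Turn (timestamp_sec, counter_value) samples into per-second rates.

    Assumption: a drop in value means the counter was reset to zero,
    so the new value itself is taken as the increase since the reset.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, delta / (t1 - t0)))
    return rates


def downsample(points, bucket_sec):
    """Average (timestamp_sec, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for t, v in points:
        # Align each point to the start of its bucket.
        buckets[t - t % bucket_sec].append(v)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


# Hypothetical samples: the counter resets between t=20 and t=30.
samples = [(0, 100), (10, 200), (20, 350), (30, 50), (40, 150)]
rates = counter_rates(samples)
coarse = downsample(rates, bucket_sec=20)
```

Averaging is only one possible downsampling function; max, min, or sum per bucket are equally common choices depending on the metric.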
- high cardinality is a requirement
- need a system that combines the best of metrics and logs
- operational simplicity and community
- ELK has a steep learning curve which discouraged NS1 at first, but community support helped overcome these barriers
Also - kudos to the speaker for keeping the talk buzzword-free, and using terms like “Operations engineer” and “monitoring” 👏
Q: Why did NS1 pick Kibana over Grafana?
A: the devs (backend engineers) like Grafana better and that’s what they use. The operations team requires the ability to make certain Elasticsearch queries which Grafana doesn’t support, so that team uses Kibana.
A: NS1 needed clustering support which isn’t available in the OSS InfluxDB.