Centralized Reporting with ELK Stack

There are a lot of third-party services out there that provide analytics on your app. That's just about always the best instant solution, but eventually you're limited by whatever features they do or don't provide. A custom reporting system (or at least a collection of hacks) becomes necessary after a certain point, and it usually gets pretty ugly. If you have a solid enough dev team, the better bet is to roll your own central reporting with the ELK (Elasticsearch/Logstash/Kibana) stack from the get-go!

Elasticsearch (ES) scales hugely, so you can store the finest-grained events, and it pairs that with the most flexible query API of any database or search engine I've used. As long as you stored the raw events as they occurred, you can retroactively generate virtually any report with ES's nested aggregations. I must have written hundreds of tiny Node.js scripts to generate every report we've ever wanted to see. That simply isn't possible with any other open source tech I know of. And the schema? Make it up as you go; no redeploys when adding new fields or events. If you're at all a specialized business, you'll eventually need custom reports based on a custom schema. No proprietary service I know of provides this level of flexible event reporting, so bite the bullet now and figure out how to script against ES.
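To make "nested aggregations" concrete, here's a minimal sketch of the kind of retroactive report those tiny scripts produce. It builds an ES `_search` body that buckets raw events by country and then by week; the index name and the `type`, `country`, and `timestamp` fields are hypothetical placeholders, not a real schema from the post.

```python
# Sketch of a retroactive report over raw events, assuming a hypothetical
# "events" index with "type" (keyword), "country" (keyword), and a
# "timestamp" date field -- all placeholder names for illustration.

def weekly_signups_by_country(event_type="signup"):
    """Build an ES query body: filter raw events, then nest a weekly
    date_histogram aggregation inside a per-country terms aggregation."""
    return {
        "size": 0,  # only the aggregations matter, skip returning hits
        "query": {"term": {"type": event_type}},
        "aggs": {
            "by_country": {
                "terms": {"field": "country", "size": 10},
                "aggs": {
                    "per_week": {
                        "date_histogram": {
                            "field": "timestamp",
                            "calendar_interval": "week",
                        }
                    }
                },
            }
        },
    }
```

You'd POST this body to `/events/_search`; because the raw events are all still there, the same data answers a question nobody thought to ask when the events were logged.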

Debugging Distributed Systems

No matter how great your unit testing intuition, there's always that one rare production bug that just can't be figured out with TDD. If I had to manually dig through each of our 11 servers' log files to figure out what happened, I'd be toast. Instead, I added correlation metadata to related event logs and pretty quickly figured out that the bug only occurred when a user connected to two different servers in sequence. Not the kind of issue you often anticipate! Suffice it to say that in the cloud, central logging is crucial.
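The trick above boils down to tagging every log event with the same correlation IDs before it ships to the central store. Here's a minimal sketch; the `session_id` and `server` field names are illustrative choices, not the actual metadata from the incident.

```python
import json
import logging

logger = logging.getLogger("app")

# Sketch of attaching correlation metadata to every log event so the
# central store (Logstash -> ES -> Kibana) can stitch one user's journey
# back together across servers. Field names are illustrative.

def format_event(message, session_id, server, **fields):
    """Serialize one log event as a JSON line with correlation IDs attached."""
    record = {"msg": message, "session_id": session_id, "server": server}
    record.update(fields)
    return json.dumps(record)

# each server tags its events with the user's session_id, so one Kibana
# search (session_id:abc123) reveals the cross-server sequence of events
logger.info(format_event("socket opened", session_id="abc123", server="web-01"))
logger.info(format_event("socket opened", session_id="abc123", server="web-02"))
```

Once every server emits structured lines like these, "which servers did this user touch, in what order" becomes a single query instead of an eleven-file archaeology dig.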

Understanding your Users

Here’s a biz-specific dashboard I put together for an ecommerce client. Each visualization took about two minutes to create in Kibana; then I glued them all together into a realtime KPI dashboard. I’ve made similar dashboards to compare A/B tests, sniff out snooping competitors, and compare marketing campaign results. Even if you don’t have big data to aggregate, the click-level scoping of events lets you learn a lot about users after just a few visits. See that search bar? You may have to train your UX and marketing folks on Lucene queries and your biz’s custom field schema for maximum effect.
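For a feel of what non-engineers would type into that search bar, here's a tiny helper that composes the kind of Lucene query strings Kibana accepts. The field names (`campaign`, `country`) are hypothetical stand-ins for whatever your custom schema actually uses.

```python
# Hedged sketch: compose Lucene query strings like the ones typed into
# Kibana's search bar. Field names are hypothetical examples of a
# custom event schema, not real fields from the dashboard above.

def lucene_and(**terms):
    """Join field:value pairs into a Lucene AND query (sorted for
    deterministic output)."""
    return " AND ".join(f"{field}:{value}" for field, value in sorted(terms.items()))

# e.g. scope the whole dashboard to one marketing campaign in one country
query = lucene_and(campaign="spring_sale", country="US")
```

Every visualization on the dashboard respects the query, so a marketer who learns a handful of `field:value` patterns can slice KPIs without ever filing a report request.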

Platform for Data Science!

Recently, we wrote an ES query to dump certain events to a CSV, which then served as input for logistic regression! With that, we built a small Python webapp that could predict a user’s chance of sticking around to week 2 based on how much they interacted with certain features.
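A toy version of that retention model, to show the shape of the pipeline: rows dumped from ES become (feature-interaction count, retained) pairs, and a single-feature logistic regression is fit by plain gradient descent. The data below is fabricated for illustration, and a real version would use a proper library rather than this hand-rolled fit.

```python
import math

# Toy sketch of the retention model: events dumped from ES to CSV become
# (week-1 interaction count, retained-to-week-2) rows, then a one-feature
# logistic regression is fit by gradient descent. Data is fabricated.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit y ~ sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for this row
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# interactions with a feature in week 1 vs. whether the user came back
usage = [0, 1, 2, 3, 5, 8, 9, 10]
retained = [0, 0, 0, 1, 1, 1, 1, 1]
w, b = fit_logistic(usage, retained)
p = sigmoid(w * 6 + b)  # predicted chance a 6-interaction user returns
```

The webapp part is then just serving `sigmoid(w * x + b)` behind an HTTP endpoint; the interesting work is the ES query that decides which events count as "interaction."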
