So this is exciting…
We run Graphite at work for (relatively) real-time metric collection of our environment. That’s cool. It’s a powerful tool that allows great insight into what is actually happening in our environment.
Graphite relies (by default) on a flat-file, RRD-like database called whisper, which keeps a separate file for every metric. That’s fine when you’re not getting that many metrics (or when you can spring for SSDs to write your data), but when you’re getting over 2.5 million metrics a minute, all those small writes to all those files turn disk I/O into the bottleneck.
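To put that number in perspective, here is a quick back-of-the-envelope calculation. It assumes each of those 2.5 million metrics reports one data point per minute and ignores any write batching carbon does in memory, so treat it as a rough upper bound rather than a measurement.

```python
# Rough write rate implied by the load quoted above.
# Assumption: one data point per metric per minute, no batching.
DATAPOINTS_PER_MINUTE = 2_500_000  # figure from the text


def writes_per_second(datapoints_per_minute: int) -> float:
    """Convert a per-minute data point count into a per-second rate."""
    return datapoints_per_minute / 60


rate = writes_per_second(DATAPOINTS_PER_MINUTE)
print(f"{rate:,.0f} data point writes per second")  # prints "41,667 data point writes per second"
```

With one whisper file per metric, each of those writes lands in a different file, which is why spinning disks struggle.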
Digging around GitHub, I found a couple of people who were working on an HBase plugin for the Graphite backend (here: https://github.com/graphite-project/carbon/pull/216 and here: https://github.com/posix4e/graphite-data). Those two projects, together, theoretically allow us to connect to HBase through Thrift to send and fetch Graphite metrics. This means we get distributed storage and no longer have to shard metrics across different boxes in a Graphite cluster.
My co-worker Brandon Bell and I then started fixing all the bugs that crop up when you try to add new code to an existing code base, and along the way we improved the overall speed of the Thrift client. Brandon knows HBase far better than I do. He added Snappy compression to the HBase tables, set up bloom filters and caching to improve fetch times, added Thrift connection pools (from here: https://github.com/wbolster/happybase), and added batch processing to both the send and fetch functions (which also improved the speed!).
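For a feel of what pooled, batched writes look like with happybase, here is a minimal sketch. It is not the code from our repos: the function name `write_datapoints`, the table name, the `d:` column family, and the row key layout (metric name as row key, timestamp as column qualifier) are all made up for illustration. The only happybase calls it relies on are `ConnectionPool.connection()`, `Connection.table()`, `Table.batch(batch_size=...)`, and `Batch.put()`.

```python
import struct


def write_datapoints(pool, table_name, datapoints, batch_size=1000):
    """Write (metric, timestamp, value) tuples through a pooled connection.

    `pool` is expected to behave like happybase.ConnectionPool. The schema
    used here (metric name as row key, "d:<timestamp>" as column) is a
    hypothetical layout, not the one our plugin uses.
    """
    count = 0
    # Borrow a Thrift connection from the pool; it is returned on exit.
    with pool.connection() as connection:
        table = connection.table(table_name)
        # The batch buffers puts and sends them every `batch_size` mutations,
        # plus once more when the block exits.
        with table.batch(batch_size=batch_size) as batch:
            for metric, timestamp, value in datapoints:
                row_key = metric.encode("utf-8")
                column = b"d:" + str(int(timestamp)).encode("ascii")
                batch.put(row_key, {column: struct.pack(">d", value)})
                count += 1
    return count


# Usage against a real cluster (host name is a placeholder):
#
#   import happybase
#   pool = happybase.ConnectionPool(size=4, host="hbase-thrift.example.com")
#   write_datapoints(pool, "graphite", [("servers.web01.load", 1400000000, 0.42)])
```

The win comes from two places: the pool avoids opening a new Thrift connection per write, and the batch turns thousands of single-row round trips into a handful of larger ones.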
We also backported a number of pull requests from the parent git repos that haven’t been merged yet, plus some other code I’ve added over the last year or so. We’re pretty far off master at this point, so making pull requests of our work is going to be tricky. For now, here are the two repos that represent our finished work: