8 November 2016 / Dorothy

Essentia, Elasticsearch, and Kibana

You’ve probably heard of them, but if you haven’t: Elasticsearch and Kibana are data management products created by Elastic. Elasticsearch is a search engine based on Lucene, and Kibana provides a user interface for exploring and visualizing data. Both are part of the Elastic Stack, a collection of open source tools for collecting and analyzing data that has enjoyed a rise in popularity in recent years and now boasts a number of notable users, including Facebook and Netflix.

Recently, I’ve spent some time fiddling around with open data sets in Elasticsearch and Kibana. I’ve found that they work well with Essentia, creating a seamless workflow for data organization and analysis. Continue reading to see how it all comes together.

The process:

  1. Categorize and preprocess your data in Essentia.
  2. Stream your data into an Elasticsearch index.
  3. In Kibana, configure an index pattern that captures your index (or indices).
  4. Explore your data: see how many hits there are in a time span, or which values are the most frequent. Get a feel for what your data looks like and what you want to do with it.
  5. Visualize your data by creating a tile map, bar chart, or line chart, whatever fits the bill. You can select the fields you want to look at, add any averages or other aggregations you need, and limit your selection by number of records or time span.
  6. Optional, but useful: Kibana offers Dashboards, where you can arrange different visualizations together for comparison or overview.
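
The post does not show how Essentia performs step 2, so the following is only a rough sketch of what streaming rows into an index can look like at the Elasticsearch end. It builds a request body in the newline-delimited format used by Elasticsearch's `_bulk` endpoint. The function name, index name, column names, and sample values are all made up for illustration, and older Elasticsearch releases also expect a `_type` alongside `_index`.

```python
import csv
import io
import json


def rows_to_bulk(csv_text, index_name):
    """Turn CSV text (with a header row) into an Elasticsearch _bulk body."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: tells Elasticsearch which index receives the document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, one JSON object per line.
        lines.append(json.dumps(row))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Placeholder rows and a hypothetical index name, not real data.
sample = "state,year,recipients\nCalifornia,2001,100\nTexas,2001,80\n"
payload = rows_to_bulk(sample, "disability-demo")
print(payload)
```

The resulting text would be sent as the body of a POST to the cluster's `_bulk` endpoint. Note that every CSV value arrives as a string here, so numeric fields would need converting (or an explicit mapping) before they can be averaged or summed in Kibana.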

Let’s look at some examples:

[Image: tile map of Tornado Vortex Signatures (TVS)]

Data source: National Centers for Environmental Information (NCEI) – Asheville NC

NOAA Severe Weather database:

These are records from NOAA’s Severe Weather Events database, specifically Tornado Vortex Signatures (TVS), which indicate an increased likelihood of tornadoes. I wrote more about this data set in my last blog post, where I cleaned the raw data in Essentia and then displayed it in Google Earth. That process was lengthier: I streamed the data from Essentia into a CSV, then used an online tool to generate a KML file for use in Google Earth, which I then downloaded and installed. Kibana’s tile map makes things easier. I created my Essentia category and Elasticsearch index in the same script, and converted the latitude and longitude coordinates into geopoints (Elasticsearch’s geo_point type). Then, after doing some exploration in the “Discover” tab, I created the visualization above.
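
As a minimal sketch of the geopoint conversion, assuming the latitude and longitude arrive as separate columns: Elasticsearch's `geo_point` type accepts an object with `lat` and `lon` keys, and the field has to be mapped as `geo_point` before documents are indexed so that Kibana can place it on a tile map. The field name `location`, the helper name, and the sample coordinates are hypothetical, and where the `properties` fragment sits in the index-creation request depends on the Elasticsearch version.

```python
import json

# Hypothetical field name; the original script's field names are not shown.
mapping = {"properties": {"location": {"type": "geo_point"}}}


def to_geo_point(lat, lon):
    """Validate a coordinate pair and return it in geo_point object form."""
    lat, lon = float(lat), float(lon)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"lat": lat, "lon": lon}


# Illustrative coordinates only, not a record from the data set.
doc = {"location": to_geo_point("35.6", "-82.55")}
print(json.dumps(mapping))
print(json.dumps(doc))
```

Validating the ranges up front is a design choice for this sketch: a swapped or malformed coordinate is easier to catch here than after indexing has been rejected.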

Size of adult population (ages 18-64) in each state receiving adult disability benefits from the Social Security Administration from 2001-2015

Size of adult population in California (ages 18-64) (right) and percent receiving adult disability benefits (left) from the Social Security Administration from 2001-2015

Social Security Administration disability claims database:

This is another familiar public data set: here, I used a Dashboard to place my two visualizations side by side. It’s interesting, to say the least. We can see a fairly similar upward trend in both, with a slight spike around 2005 in the percent of adults receiving disability benefits. Kibana also makes it easy to search your index: for example, the charts above originally showed overall figures for the United States. I entered “state:California” in the search bar, and Kibana instantly adjusted both charts to reflect only California data, which is what you see here.
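
For readers who want to run the same filter outside Kibana, here is a small sketch. It assumes the search bar is interpreting the text as Lucene query syntax, in which case a roughly equivalent Elasticsearch search body uses the `query_string` query. The helper name and the `size` default are illustrative choices, not something taken from the original scripts.

```python
import json


def lucene_search_body(query_text, size=10):
    """Build a search body using Elasticsearch's query_string query."""
    return {"size": size, "query": {"query_string": {"query": query_text}}}


# Same text that was typed into the Kibana search bar.
body = lucene_search_body("state:California")
print(json.dumps(body, indent=2))
```

The printed JSON could be posted to an index's `_search` endpoint to retrieve the matching documents directly.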

Border crossings to/from Mexico during 1995-2007

US Transportation Border Crossings:

This data comes from an AWS Public Data Set (read more about it here). I created an Essentia category, streamed the contents into an Elasticsearch index, and pulled up Kibana. The visualization shown here plots the number of border crossings to/from Mexico from 1995-2007.
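
To make concrete what a chart like this computes, the sketch below groups records by year and sums a count field, which is roughly the work a per-year sum aggregation does behind a Kibana bar chart. It runs locally in plain Python rather than in Elasticsearch, and the field names `year` and `crossings` and the sample rows are placeholders, not the data set's real schema or values.

```python
from collections import defaultdict


def total_by_year(records):
    """Sum the 'crossings' field of each record, grouped by its 'year'."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["year"]] += int(rec["crossings"])
    # Sort by year so the result reads like the x-axis of a bar chart.
    return dict(sorted(totals.items()))


# Made-up placeholder rows, not real border crossing counts.
rows = [
    {"year": 1996, "crossings": "7"},
    {"year": 1995, "crossings": "10"},
    {"year": 1995, "crossings": "5"},
]
print(total_by_year(rows))
```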

 

Overall, I really enjoyed using Elasticsearch and Kibana in conjunction with Essentia. Essentia is well suited to cleaning and categorizing raw data, and its output can be streamed directly into an Elasticsearch index, making for an easy transition from organization to analysis. Kibana plays well with all kinds of data and summarizes it automatically for easy exploration. As is often the case with open data, you don’t really know what it looks like until you begin to analyze it. The “Discover” feature in Kibana helps here: it lists the available fields, gives an overview of popular values for each field, and allows filtering by time period, which makes it extremely convenient for exploring data before you know what you want to do with it. “Visualize” is very user-friendly as well, with an aesthetically pleasing and intuitive interface that allows for numerous types of analysis. As we accumulate more data than we know what to do with, it’s important that data tools are flexible and powerful and integrate smoothly into different workflows. Essentia, Elasticsearch, and Kibana all play well together: try it out!

Sample scripts and more information can be found in the Git repository.
