You’ve probably heard of them, but if you haven’t: Elasticsearch and Kibana are data management products created by Elastic. Elasticsearch is a search engine based on Lucene, and Kibana provides a user interface for exploring and visualizing data. Both are part of the Elastic Stack, a collection of open source tools for collecting and analyzing data that has enjoyed a rise in popularity in recent years and now boasts a number of notable users, including Facebook and Netflix.
Recently, I’ve spent some time fiddling around with open data sets in Elasticsearch and Kibana. I’ve found that they work well with Essentia, creating a seamless workflow for data organization and analysis. Continue reading to see how it all comes together.
A few weeks ago, I wrote about the recent rise in popularity of open data and how these public data sets can be easily processed with Essentia. All the examples in that post are based on Amazon's AWS Public Data Sets, which are (for the most part) large databases put together by organizations for public access and use. However, because the AWS data sets are voluntarily published by each organization, many are not regularly updated. The US Transportation database available on AWS, for instance, provides aviation records and statistics from 1988 to 2008. More recent data (through April 2016) can be found on the US Department of Transportation's website, but in a format different from that of the data on AWS. Other open data sets are not prepackaged at all: the US Census Bureau, for example, has information on state tax collections from 1992 to 2014, but on its website, recent data is separated from historical data, and visitors can only view one year at a time. Furthermore, while tables for recent years can be downloaded as CSV files or Excel workbooks, older tables are only available as Excel files. How do these issues affect people seeking to work with open data? On top of the complexity of processing large amounts of data comes the challenge of first collecting all of the available files, then processing each one (separately, if they come in different file types and data formats) before putting everything together. Read on to see how Essentia rises to the occasion.
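To make the pain concrete, here is a minimal sketch (in Python with pandas, not Essentia) of what manually stitching together such a mixed collection of yearly tables might look like. The helper names and the provenance column are hypothetical, purely for illustration:

```python
import pandas as pd
from pathlib import Path

def load_table(path: Path) -> pd.DataFrame:
    """Load one table, dispatching on file type (CSV vs. Excel)."""
    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix in (".xls", ".xlsx"):
        return pd.read_excel(path)  # requires an Excel engine (e.g. openpyxl)
    raise ValueError(f"unsupported file type: {path.suffix}")

def combine(paths) -> pd.DataFrame:
    """Concatenate a mixed batch of yearly tables into one DataFrame."""
    frames = []
    for p in paths:
        df = load_table(p)
        df["source_file"] = p.name  # keep provenance for sanity checks
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Even this assumes the tables share column names across years; in practice, each file may also need its own cleanup step before concatenation, which is exactly the busywork Essentia is meant to absorb.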
Open data as a term is relatively new, but the concept is not. The idea that information should be freely available for unrestricted use has been around for a while, but it didn't really take off until the rise of the Internet made it feasible to share data quickly and globally. Add in the recent popularity of big data, and it makes a lot of sense that public data sets are on the rise as well. Enormous amounts of valuable data, covering everything from climate projections to genome sequences, have been made available by the organizations that own them and are now free to download on Amazon Public Data Sets, Data.gov, and elsewhere. The possibilities are endless: researchers, businesses, and citizens from around the world now have access to data that would otherwise be extremely expensive and time-consuming to collect. The challenge that follows is how these researchers and businesses will handle such large data sets, from storage to organization to analysis.
According to my mobile device, I took about 5,300 steps yesterday, slightly more than the day before. While tracking my activity levels is interesting to me, I doubt most other people would care, except perhaps my doctor or health insurance provider. But it does put me in a growing group that may be leading the way to a revolution in healthcare.
According to a recent Washington Post article, one in ten American adults currently owns a fitness tracking device. In addition, there has been an “explosion in extreme tracking,” with more and more people using wearable technology not just as a step counter, but to monitor heart rate, stress, sleep cycles, and just about every other measurable biometric. In fact, a couple of people here in our own offices fit this description. All of this is creating tons of data – big data – for doctors and insurance carriers to synthesize.
A common “big-data” workflow requires reducing a vast amount of complex data into a form that can be easily visualized. With AuriQ Essentia, your data from the cloud can be efficiently parsed, cleaned, and reduced directly into Tableau Extract files (.tde) to generate interactive and compelling visualizations.
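Essentia performs the parsing step natively, but as a rough illustration of what “parsing web server logs” involves, here is a hedged Python sketch that extracts fields from Apache/Nginx combined-format access logs and writes them to a CSV that a visualization tool could import. The function names are illustrative, and the final .tde conversion is left to Essentia or the Tableau SDK:

```python
import csv
import re

# Regex for the standard Apache/Nginx "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str):
    """Return a dict of fields for one access-log line, or None if unparseable."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def logs_to_csv(lines, out_path):
    """Write parsed log records to a CSV file for import into Tableau."""
    fields = ["ip", "time", "method", "path", "status", "size", "referrer", "agent"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for line in lines:
            rec = parse_line(line)
            if rec:  # silently skip malformed lines in this sketch
                writer.writerow(rec)
```

At scale, this single-threaded loop is exactly what you would not want to hand-roll; the point of the Essentia workflow is that the parse/clean/reduce stages run efficiently against cloud data before anything reaches Tableau.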
Take, for example, this Tableau dashboard created from web server logs processed with Essentia. The visualization presents data on visits to specific pages and contains two tabs: “Visitor Analysis” and “Pages by Time”. In the “Visitor Analysis” tab, you can see which browsers visitors ran, how long their visits lasted, and which operating systems they used. In the “Pages by Time” tab, you can analyze traffic trends by day, day of week, and hour.
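Views like “Pages by Time” boil down to simple time rollups. As an illustrative sketch (not the dashboard's actual logic), counting hits per hour of day from Apache-style timestamps might look like:

```python
from collections import Counter
from datetime import datetime

def hits_by_hour(timestamps):
    """Count log hits per hour of day from Apache-format timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Apache access-log timestamp format, e.g. "10/Oct/2016:13:55:36 -0700"
        dt = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        counts[dt.hour] += 1
    return dict(counts)
```

The same pattern, keyed on `dt.weekday()` or `dt.date()` instead of `dt.hour`, yields the day-of-week and per-day breakdowns shown in the dashboard.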