You’ve probably heard of them, but if you haven’t: Elasticsearch and Kibana are data management products created by Elastic. Elasticsearch is a search engine based on Lucene, and Kibana provides a user interface for exploring and visualizing data. Both are part of the Elastic Stack, a collection of open-source tools for collecting and analyzing data. The stack has grown in popularity in recent years and now counts Facebook and Netflix among its notable users.
Recently, I’ve spent some time fiddling around with open data sets in Elasticsearch and Kibana. I’ve found that they work well with Essentia, creating a seamless workflow for data organization and analysis. Continue reading to see how it all comes together.
Essentia on AWS Marketplace
There’s a new version of Essentia available on the AWS Marketplace! The upgrade from version 2.0.21 to 2.1.7 includes a few key changes.
First, Essentia now works on HVM instances instead of PV instances. This allows users to take advantage of Amazon’s newer generation of instance types, and makes security management much easier. Second, the new version of Essentia provides many new features, including the ability to stream, clean, and move massive amounts of data from S3 data stores directly into a Redshift data warehouse. This streaming capability works regardless of compression type, with no need to generate intermediate files or write complicated code.
The AWS community site contains a free version of the AMI that limits use to a single instance at a time.
When it comes to big data, compression is key.
However, many popular analysis tools can’t handle Zip-compressed files. Each Zip file must be converted to another compression type (such as Gzip) before these tools can analyze it. That is costly: the extra files have to be stored, and the conversion itself takes a large amount of time. That is, if you’re not using Essentia.
Essentia dramatically improves on this with its native support for Zip and Gzip compression, and with its ability to unzip Zip files as a stream and recompress them into Gzip format. You can select exactly the Zip files you need from wherever your data is stored using the Essentia Scanner, convert them to Gzip format on the fly, and then output them wherever you want. They can be sent directly into Redshift or other analysis tools, saved to file, or sent to S3 for later loading.
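To make the idea of converting as a stream concrete, here is a minimal Python sketch using only the standard library. It is not Essentia’s implementation or its command syntax; the function name zip_member_to_gzip and its parameters are made up for illustration. It reads one file out of a Zip archive in chunks and writes it straight back out as Gzip, so no uncompressed intermediate file ever touches the disk. One caveat: Python’s zipfile module needs a seekable source, because a Zip archive keeps its index at the end of the file.

```python
import gzip
import shutil
import zipfile


def zip_member_to_gzip(zip_source, member_name, gzip_sink, chunk_size=1024 * 1024):
    """Recompress one Zip member as Gzip, chunk by chunk.

    zip_source  -- path or seekable binary file object holding the Zip archive
    member_name -- name of the file inside the archive (hypothetical example input)
    gzip_sink   -- writable binary file object that receives the Gzip bytes
    """
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() returns a file-like reader that decompresses lazily.
        with archive.open(member_name) as member:
            # GzipFile compresses whatever is written to it and forwards the
            # result to gzip_sink; closing it does not close gzip_sink.
            with gzip.GzipFile(fileobj=gzip_sink, mode="wb") as gz_out:
                # Copy in fixed-size chunks so memory use stays bounded
                # no matter how large the member is.
                shutil.copyfileobj(member, gz_out, chunk_size)
```

The sink can be any writable binary object: an open local file, a pipe to a loader process, or an in-memory buffer. For example, calling zip_member_to_gzip("archive.zip", "logs.csv", open("logs.csv.gz", "wb")) would write a Gzip copy of logs.csv next to the archive; both file names here are placeholders.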