AqTools, aq_pp, AWS, Blog Archive, Open Data, Public Data Sets, Use Case / July 15, 2016

Open Data and Essentia (Part 2)

A few weeks ago, I wrote about the recent rise in popularity of open data, and how these public data sets can be easily processed with Essentia. All the examples in that post are based on Amazon’s AWS Public Data Sets, which are (for the most part) large databases put together by organizations for public access and use. However, because the AWS data sets are voluntarily published by each organization, many are not regularly updated. In the US Transportation database available on AWS, aviation records and statistics are provided from 1988 to 2008. More recent data (through April 2016) can be found on the US Department of Transportation’s website, but in a format different from that of the data provided on AWS. Other open data are not prepackaged at all: for example, the US Census Bureau has information on state tax collections from 1992 to 2014, but on the website, recent data is separated from historical data, and from there, visitors can only view data for one year at a time. Furthermore, while tables for recent years can be downloaded as CSV or Excel workbooks, older tables are only available as Excel files. How do these issues affect people seeking to work with open data? Added to the complexity of processing large amounts of data is the challenge of first collecting all of the available files, then processing each one (separately, if they come in different file types and data formats) before putting everything together. Read on to see how Essentia rises to the occasion.
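The collection problem described above — files published in drifting formats that must be normalized before they can be combined — can be sketched outside Essentia too. As a rough, hypothetical illustration (the column names and values are invented, and this is plain Python rather than Essentia's aq_pp), mapping two yearly extracts whose headers differ onto one canonical schema might look like:

```python
import csv
import io

# Hypothetical yearly extracts: column names drift between releases,
# as they often do on government data portals.
tax_2013 = "State,Total Taxes\nCA,100\nTX,80\n"
tax_2014 = "state,total_tax\nCA,110\nTX,85\n"

# Map each file's header variant onto one canonical schema.
CANON = {"state": "state", "total taxes": "total_tax", "total_tax": "total_tax"}

def normalize(text):
    """Read one CSV extract and rename its columns to the canonical names."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({CANON[k.strip().lower()]: v for k, v in row.items()})
    return rows

# Once every file speaks the same schema, combining them is trivial.
combined = normalize(tax_2013) + normalize(tax_2014)
```

The hard part in practice is building the `CANON` mapping: someone has to inspect each year's header before the files can be merged.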

Read more
aq_pp, AWS, Big Data Trends, Blog Archive, Open Data, Public Data Sets, Use Case / June 20, 2016

Open Data and Essentia

Open data as a term is relatively new, but the concept is not. The idea that information should be freely available for unrestricted use has been around for a while, but didn't really take off before the rise of the Internet made it feasible to share data quickly and globally. Add in the recent popularity of big data, and it makes a lot of sense that public datasets are on the rise as well. Enormous amounts of valuable data, including everything from climate projections to genome sequences, have been made available by the organizations that own them and are now free for download on Amazon Public Data Sets and elsewhere. The possibilities are endless, as researchers, businesses, and citizens from around the world have access to data that would otherwise be extremely expensive and time-consuming to collect. The challenge that follows is how these researchers and businesses are going to handle these large datasets, including storage, organization, and analysis.

Read more
Big Data News, Big Data Trends, Blog Archive / October 6, 2015

Wearable Technology and Healthcare

According to my mobile device, I took about 5,300 steps yesterday, slightly more than the day before. While it's interesting for me to track my activity levels, I doubt most other people would care, except perhaps my doctor or health insurance provider. But this does put me in a growing group that is potentially leading the way to a revolution in healthcare.

According to a recent Washington Post article, one in 10 American adults currently own fitness tracking devices. In addition, there has been an "explosion in extreme tracking," where more and more people use them not just as step counters, but also to monitor heart rate, stress, sleep cycles, and just about every other measurable biometric via wearable technology. In fact, we have a couple of people who fit this description here in our own offices. This is creating tons of data – big data – for doctors and insurance carriers to synthesize.

Read more
Apache, Blog Archive, Tableau, Use Case / July 23, 2015

Tableau and Essentia

A common “big-data” workflow requires reducing a vast amount of complex data into a form that can be easily visualized. With AuriQ Essentia, your data from the cloud can be efficiently parsed, cleaned, and reduced directly into Tableau Extract files (.tde) to generate interactive and compelling visualizations.

Take, for example, this Tableau Dashboard created from web server logs processed with Essentia. This visualization presents data from individuals visiting specific pages and contains two dashboards: “Visitor Analysis” and “Pages by Time”. In the “Visitor Analysis” tab, you can see what browsers were being run, how long visits lasted, and what operating systems were used. In the “Pages by Time” dashboard, you can analyze trends over time by day, day of week, and hour.

Read more
Blog Archive, Query, Scanner, Use Case / May 27, 2015

Essentia Query

Great, you have a lot of data, but how do you know what to do with it? Being able to explore your data without having to load everything into a database beforehand is a very necessary and useful step in deciding which data should be analyzed and how to analyze it.

With Essentia you can take a subset of your data and run SQL queries on it without loading anything into a database table. By streaming the data from S3 or a local directory directly into the query, you can quickly explore a subset of the files you plan to process before you commit a large amount of your resources to analyzing the full set of data.
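Essentia's query engine does this natively. As a rough stand-in for the idea — running SQL over a streamed sample without committing anything to a persistent database — here is a minimal Python sketch using an in-memory SQLite table (the sample data and column names are invented for illustration):

```python
import csv
import io
import sqlite3

# A small sample of the data we might stream from S3 or a local directory.
sample = "url,status,bytes\n/home,200,512\n/about,404,0\n/home,200,640\n"

# Load the sample into an in-memory table that vanishes when we're done --
# nothing is written to a persistent database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (url TEXT, status INT, bytes INT)")
rows = list(csv.reader(io.StringIO(sample)))[1:]  # skip the header row
con.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# Explore the subset before committing resources to the full data set.
hits = con.execute(
    "SELECT url, COUNT(*), SUM(bytes) FROM logs GROUP BY url ORDER BY url"
).fetchall()
```

The point of the pattern is the throwaway table: you learn the shape of your data from a sample, then design the real analysis around what you find.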

Read more
Blog Archive, Marketing Analytics, Uncategorized, Use Case / May 12, 2015

Digital Marketing Log Analytics

Although Essentia is available as a standalone software product, we also use it daily in our marketing analytics service offered under a SaaS model.
Like a lot of other people, we like the machine learning and data mining libraries available for R and Python to help explore and analyze data. But like everyone else, we face the burden of having to first clean, parse, and reduce large amounts of data before we can use those analytic tools.
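That "reduce before you analyze" step can be illustrated with a tiny, hypothetical example (plain Python, not our SaaS pipeline; the event fields are made up): collapsing a raw event stream into one feature row per user — the compact shape that R and Python analysis libraries actually want as input:

```python
from collections import defaultdict

# Hypothetical raw click events: (user_id, page, seconds_on_page).
events = [
    ("u1", "/pricing", 30),
    ("u1", "/docs", 120),
    ("u2", "/pricing", 5),
    ("u1", "/pricing", 45),
]

# Reduce the event stream to one feature row per user.
features = defaultdict(lambda: {"visits": 0, "seconds": 0})
for user, page, secs in events:
    features[user]["visits"] += 1
    features[user]["seconds"] += secs
```

At real scale the reduction itself is the expensive part, which is exactly the stage Essentia is built to handle before the data ever reaches R or Python.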

Read more
AWS, Blog Archive, News, Redshift, Streamer / April 7, 2015

Essentia’s New AMI: What You Need to Know

Link to Essentia 2.1.7

Essentia on AWS Marketplace

There’s a new version of Essentia available on the AWS Marketplace! The upgrade from version 2.0.21 to 2.1.7 includes a few key changes.

First, Essentia now works on HVM instances instead of PV instances. This allows users to take advantage of Amazon’s newer generation of instance types, and makes security management much easier. Second, the new version of Essentia provides many new features including the ability to stream, clean, and move massive amounts of data from S3 data stores directly into a Redshift data warehouse. This streaming capability works effortlessly, regardless of compression type, and without the need to generate intermediate files or write complicated code.

The AWS community site contains a free version of the AMI that limits use to a single instance at a time.

Read more
Blog Archive, Redshift, Scanner, Streamer, Use Case / March 24, 2015

One File or Thousands: Changing Compression Type Has Never Been Easier

When it comes to big data, compression is key. However, many popular analysis tools can't handle Zip-compressed files. Each Zip file must be converted to another compression type (such as Gzip) before being analyzed by these tools. This accrues a high cost due to the need to store the extra files as well as the large amount of time it takes to carry out the conversion. That is, if you're not using Essentia.

The parts of Essentia that dramatically improve on this are its native support for Zip and Gzip compression and its ability to unzip Zip files in a streaming fashion and recompress them into Gzip format. Thus you can select exactly the Zip files you need from wherever your data is stored using the Essentia Scanner, convert them to Gzip on the fly, and output them wherever you want. They can be sent directly into Redshift or other analysis tools, saved to file, or sent to S3 for later loading.
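For readers curious what streaming Zip-to-Gzip conversion looks like mechanically, here is a minimal standard-library Python sketch (an illustration of the technique, not Essentia's implementation — the archive contents are invented):

```python
import gzip
import io
import shutil
import zipfile

# Build a small Zip archive in memory to stand in for a file from a datastore.
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("logs/access.log", "GET /home 200\nGET /about 404\n")

# Convert each member to Gzip without extracting to an intermediate file:
# copyfileobj moves the data in chunks, so only a small buffer is in memory.
outputs = {}
with zipfile.ZipFile(src) as zf:
    for name in zf.namelist():
        buf = io.BytesIO()
        with zf.open(name) as member, gzip.GzipFile(fileobj=buf, mode="wb") as gz:
            shutil.copyfileobj(member, gz)
        outputs[name] = buf.getvalue()
```

The win over unzip-then-gzip on disk is the absence of intermediate files: data flows decompressor-to-compressor, which is the same idea Essentia applies at scale.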

Read more
Apache, aq_pp, Blog Archive, Modules, Scanner / March 24, 2015

Using a custom module to process Apache Logs

The wealth of information in Apache logs is astounding, but that information can be buried in hard-to-find files, riddled with errors, and difficult to extract. Essentia easily handles the first two issues using the Essentia Scanner and Essentia Preprocessor. However, to extract more specialized data from Apache logs and other forms of data, Essentia allows easy creation and integration of custom modules to supplement its analysis.
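A custom extraction module often starts life as a small parser. As an illustration (this is not Essentia's module API, and the helper name `parse_line` is ours), here is a Python sketch that pulls structured fields out of Apache Common Log Format lines:

```python
import re

# Common Log Format: host, identity, user, [time], "request", status, size.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Extract the fields a downstream module might care about."""
    m = CLF.match(line)
    if m is None:
        return None  # skip malformed lines instead of crashing
    d = m.groupdict()
    d["status"] = int(d["status"])
    d["size"] = 0 if d["size"] == "-" else int(d["size"])
    return d

rec = parse_line(
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
)
```

Returning `None` for malformed lines (rather than raising) matters in practice: real Apache logs contain truncated and corrupt entries, and a module that dies on the first bad line is useless at scale.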

Read more
Blog Archive, Scanner / March 17, 2015

Needle in a Haystack: Using Essentia to Organize Log Files

It’s simple: big data means lots of logs, and these logs tend to be disordered and very hard to distinguish. If the desired log files are in different directories, and particularly if there are other log files alongside them in those directories, it can sometimes be necessary to specify these files and their paths by name. This is an incredibly messy and time-consuming process that Essentia remedies.

With the Essentia Scanner you simply point to your datastore (where your files are stored), and the list of filenames is stored in a database file. You can then easily explore how these files are organized and categorize your files into the segments you want them to be in.
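The scan-then-categorize idea can be sketched in a few lines of Python (an illustration only; Essentia's Scanner uses its own rule syntax, and these paths are invented): collect the filenames once, then slice them into segments by pattern instead of listing paths by hand.

```python
import fnmatch

# A hypothetical file listing, as a scanner might collect from a datastore.
files = [
    "logs/2015/03/access.log.gz",
    "logs/2015/03/error.log.gz",
    "backup/old/access.log.gz",
    "logs/2015/02/access.log.gz",
]

# Name each segment with a glob pattern rather than enumerating paths.
categories = {
    "access": "logs/*/*/access.log.gz",
    "error": "logs/*/*/error.log.gz",
}

segments = {
    name: sorted(f for f in files if fnmatch.fnmatch(f, pattern))
    for name, pattern in categories.items()
}
```

Note that the stray `backup/old/` copy is excluded automatically because it doesn't match the pattern — exactly the "other log files in those directories" problem described above.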

Read more