AqTools, aq_pp, AWS, Blog Archive, Open Data, Public Data Sets, Use Case / 15 July 2016

Open Data and Essentia (Part 2)

A few weeks ago, I wrote about the recent rise in popularity of open data, and how these public data sets can be easily processed with Essentia. All the examples in that post are based on Amazon’s AWS Public Data Sets, which are (for the most part) large databases put together by organizations for public access and use. However, because the AWS data sets are voluntarily published by each organization, many are not regularly updated. In the US Transportation database available on AWS, aviation records and statistics are provided from 1988 to 2008. More recent data (through April 2016) can be found on the US Department of Transportation’s website, but in a format different from that of the data provided on AWS. Other open data are not prepackaged at all: for example, the US Census Bureau has information on state tax collections from 1992 to 2014, but on the website, recent data is separated from historical data, and from there, visitors can only view data for one year at a time. Furthermore, while tables for recent years can be downloaded as CSV or Excel workbooks, older tables are only available as Excel files. How do these issues affect people seeking to work with open data? Added to the complexity of processing large amounts of data is the challenge of first collecting all of the available files, then processing each one (separately, if they come in different file types and data formats) before putting everything together. Read on to see how Essentia rises to the occasion.
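
As a rough illustration of the "collect, process each file, then combine" problem, here is a plain-Python sketch. It is not Essentia code. It assumes each year's table has already been downloaded and saved as CSV text (Excel-only years would need converting first), and the function name `combine_yearly` and the sample columns are hypothetical. Header names are normalized so that tables whose labels differ only in case or spacing line up, and every row is tagged with its year.

```python
import csv
import io


def combine_yearly(tables: dict[int, str]) -> list[dict[str, str]]:
    """Merge per-year CSV tables into one list of rows tagged with their year.

    Hypothetical helper: `tables` maps a year to that year's CSV text.
    """
    combined = []
    for year in sorted(tables):
        reader = csv.reader(io.StringIO(tables[year]))
        # Normalize labels such as " State " and "STATE" to "state".
        header = [h.strip().lower() for h in next(reader)]
        for values in reader:
            row = dict(zip(header, (v.strip() for v in values)))
            row["year"] = str(year)
            combined.append(row)
    return combined
```

For example, `combine_yearly({2014: "State,Total\nCA,100\n", 1992: " state , TOTAL \nCA,50\n"})` returns the 1992 row first, with both rows using the keys `state`, `total`, and `year`.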

aq_pp, AWS, Big Data Trends, Blog Archive, Open Data, Public Data Sets, Use Case / 20 June 2016

Open Data and Essentia

Open data as a term is relatively new, but the concept is not. The idea that information should be freely available for unrestricted use has been around for a while, but it didn’t really take off until the rise of the Internet made it feasible to share data quickly and globally. Add in the recent popularity of big data, and it makes a lot of sense that public datasets are on the rise as well. Enormous amounts of valuable data, including everything from climate projections to genome sequences, have been made available by the organizations that own them and are now free to download on Amazon Public Data Sets and other sites. The possibilities are endless, as researchers, businesses, and citizens around the world now have access to data that would otherwise be extremely expensive and time-consuming to collect. The challenge that follows is how these researchers and businesses will handle such large datasets, including their storage, organization, and analysis.

Apache, Blog Archive, Tableau, Use Case / 23 July 2015

Tableau and Essentia

A common “big-data” workflow requires reducing a vast amount of complex data into a form that can be easily visualized. With AuriQ Essentia, your data from the cloud can be efficiently parsed, cleaned, and reduced directly into Tableau Extract files (.tde) to generate interactive and compelling visualizations.

Take, for example, this Tableau Dashboard created from web server logs processed with Essentia. This visualization presents data from individuals visiting specific pages and contains two dashboards: “Visitor Analysis” and “Pages by Time”. In the “Visitor Analysis” tab, you can see what browsers were being run, how long visits lasted, and what operating systems were used. In the “Pages by Time” dashboard, you can analyze trends over time by day, day of week, and hour.

Blog Archive, Query, Scanner, Use Case / 27 May 2015

Essentia Query

Great, you have a lot of data, but how do you know what to do with it? Being able to explore your data without first loading everything into a database is an essential step in deciding which data to analyze and how to analyze it.

With Essentia you can take a subset of your data and run SQL queries on it without loading anything into a database table. By streaming the data from S3 or a local directory directly into the query, you can quickly explore a subset of the files you plan to process before you commit a large amount of your resources to analyzing the full set of data.

Blog Archive, Marketing Analytics, Uncategorized, Use Case / 12 May 2015

Digital Marketing Log Analytics

Although Essentia is available as a standalone software product, we also use it daily in our marketing analytics service offered under a SaaS model.
Like a lot of other people, we like the machine learning and data mining libraries available for R and Python to help explore and analyze data. But like everyone else, we face the burden of first having to clean, parse, and reduce large amounts of data before we can use those analytic tools.

Blog Archive, Redshift, Scanner, Streamer, Use Case / 24 March 2015

One File or Thousands: changing compression type has never been easier

When it comes to big data, compression is key. However, many popular analysis tools can’t read Zip-compressed files, so each Zip file must first be converted to another compression type (such as Gzip). That conversion is costly: the extra files have to be stored, and carrying out the conversion takes a long time. That is, if you’re not using Essentia.

Essentia improves on this dramatically through its native support for Zip and Gzip compression and its ability to unzip Zip files as a stream and recompress them into Gzip format. Using the Essentia Scanner, you can select exactly the Zip files you need from wherever your data is stored, convert them to Gzip as they stream through, and output them wherever you want: directly into Redshift or other analysis tools, to local files, or to S3 for later loading.
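
To show what "streaming" conversion means in general, here is a standard-library Python sketch (`zipfile`, `gzip`, `shutil`). It is not how Essentia implements the feature. Each Zip member is copied into its own Gzip file in fixed-size chunks, so no member is fully decompressed to disk or memory first. The function name `zip_to_gzip` and the output naming scheme are assumptions for illustration.

```python
import gzip
import os
import shutil
import zipfile


def zip_to_gzip(zip_path: str, out_dir: str) -> list[str]:
    """Recompress every file in a Zip archive as a separate .gz file.

    Hypothetical helper. Each member is streamed through shutil.copyfileobj,
    which copies in fixed-size chunks rather than reading it all at once.
    """
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries have no data to convert
            # Flatten the member path (e.g. "logs/a.log" -> "logs_a.log.gz")
            # so same-named files in different folders don't overwrite each other.
            name = info.filename.replace("/", "_") + ".gz"
            out_path = os.path.join(out_dir, name)
            with zf.open(info) as src, gzip.open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(out_path)
    return written
```

Writing one `.gz` per member keeps the output line-for-line identical to the input files, which suits tools that expect one compressed file per log.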

Apache, Blog Archive, R, Use Case / 23 January 2015

Exploring Apache Logs with Essentia and R

Apache logs are some of the most troublesome logs being used today. They can come in all shapes and sizes and often come riddled with errors and missing data. While the benefits of analyzing these logs are enormous, most powerful analysis tools lack the necessary versatility. Essentia fixes that.

With the Essentia Log Converter, you can transform any Apache log into a format ready for analysis. You can then stream the converted logs directly into the Essentia Preprocessor to drop irrelevant columns, clean the data, and perform simple analysis on it.
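
The Log Converter's own syntax isn't shown in this excerpt, so here is a generic Python sketch of the same idea. It parses lines in Apache's standard "combined" log format with a regular expression, keeps a chosen subset of columns, counts and skips lines that don't parse, and writes CSV. The column subset `KEEP` and the function name `apache_to_csv` are hypothetical. Logs written with a custom `LogFormat`, or with escaped quotes inside the request, would need a different pattern.

```python
import csv
import re

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical choice of columns to keep; ident, user, and referer are dropped.
KEEP = ["host", "time", "request", "status", "size", "agent"]


def apache_to_csv(lines, out) -> int:
    """Write parsed log lines to `out` as CSV; return the number of bad lines."""
    writer = csv.writer(out)
    writer.writerow(KEEP)
    bad = 0
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            bad += 1  # malformed or truncated line: skip it, but count it
            continue
        row = m.groupdict()
        if row["size"] == "-":
            row["size"] = "0"  # Apache logs "-" when no body bytes were sent
        writer.writerow([row[k] for k in KEEP])
    return bad
```

Returning the count of skipped lines, instead of silently dropping them, makes it easy to notice when a log's format doesn't match the pattern.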

AWS, Blog Archive, Redshift, Use Case / 9 January 2015

Data warehousing with AWS Redshift and Essentia

Amazon developed its Redshift service to accommodate data warehousing needs on a reliable, scalable platform.  With a highly efficient SQL engine that executes queries in parallel, users can gain insight into their data quickly.  But to access that power, the data first needs to be loaded into the service.

Converting raw data into a form that an application (in this case Redshift) can use is one of Essentia’s main strengths. Let’s focus on one of the most common scenarios: data is stored in its raw form on S3, where it can be accessed by Redshift or any other relevant service. Typically, the data is ‘dirty’, containing missing, irrelevant, or otherwise unneeded values.
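
Here is a minimal Python sketch of that cleaning step, independent of Essentia. It reads CSV rows, keeps only the columns you need, drops rows that are missing required fields, and writes the result as gzip-compressed CSV. That format can be loaded with Redshift's `COPY` command using its `CSV` and `GZIP` options. The column names and the helpers `clean_rows` and `write_gzip_csv` are hypothetical.

```python
import csv
import gzip
from typing import Iterable, Iterator


def clean_rows(rows: Iterable[dict], keep: list[str], required: list[str]) -> Iterator[list[str]]:
    """Project each row onto `keep`, skipping rows with a blank `required` field."""
    for row in rows:
        # csv.DictReader fills missing trailing fields with None, so treat None as blank.
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        yield [(row.get(col) or "").strip() for col in keep]


def write_gzip_csv(rows: Iterable[list[str]], path: str) -> None:
    """Write rows as gzip-compressed CSV with no header row."""
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows(rows)
```

Because `clean_rows` is a generator, rows flow from the reader to the compressed output one at a time, so large inputs never have to be held in memory.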
