Needle in a Haystack: Using Essentia to organize log files

It’s simple: big data means lots of logs, and these logs tend to be disordered and hard to tell apart. When the log files you want are spread across different directories, and especially when other log files sit alongside them, you may have to specify each file and its path by name. This is a messy, time-consuming process, and Essentia remedies it.

With the Essentia Scanner, you simply point to your datastore (where your files are stored), and the list of filenames is saved to a database file. You can then easily explore how these files are organized and categorize them into the segments you want.
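
The scan-then-categorize idea can be illustrated with a rough Python sketch. It is not Essentia's interface: a local directory stands in for the datastore, a SQLite file stands in for the database file, and `scan_datastore`, `categorize`, and the glob patterns are hypothetical names chosen for illustration.

```python
# Illustrative sketch only, NOT Essentia's API: a local directory stands in for
# the datastore and SQLite for the database file; all names are hypothetical.
import fnmatch
import os
import sqlite3


def scan_datastore(root: str, db_path: str) -> None:
    """Record the relative path of every file under `root` in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits the inserts as one transaction
            conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    rel = os.path.relpath(os.path.join(dirpath, name), root)
                    conn.execute(
                        "INSERT OR IGNORE INTO files (path) VALUES (?)",
                        (rel.replace(os.sep, "/"),),
                    )
    finally:
        conn.close()


def categorize(db_path: str, categories: dict[str, str]) -> dict[str, list[str]]:
    """Group the scanned paths into named categories using glob patterns."""
    conn = sqlite3.connect(db_path)
    try:
        paths = [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]
    finally:
        conn.close()
    # A file can fall into several categories, or into none.
    return {
        name: [p for p in paths if fnmatch.fnmatch(p, pattern)]
        for name, pattern in categories.items()
    }
```

Because the file list lives in a database rather than being re-listed each time, categories can be added or refined without rescanning the datastore.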

Processing Apache Logs with Essentia

Apache server logs hold a wealth of valuable insights, but they are typically buried in S3 directories alongside many other logs in entirely different formats. Not only must the correct logs be extracted from their datastore, they must also be converted into a format that can be properly analyzed.

This is where Essentia comes in. First, we scan the S3 directory to make sure we select exactly the access logs we want to analyze. Then we use the Essentia Log Converter to convert these access logs, on the fly, into a form our Preprocessor can read (i.e., a format in which fields are separated by a single delimiter).
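
The conversion step can be illustrated with a minimal Python sketch. It assumes the access logs use Apache's Combined Log Format and writes comma-separated rows; the `convert` function and the comma delimiter are assumptions for illustration, not the Log Converter's actual behavior or options. Because it processes one line at a time, it works on a stream as well as on a file.

```python
# Sketch standing in for a log converter; NOT the Essentia Log Converter's options.
# Assumes Apache Combined Log Format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
import csv
import re
from typing import Iterable, TextIO

COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
FIELDS = ("host", "ident", "user", "time", "request",
          "status", "bytes", "referer", "agent")


def convert(lines: Iterable[str], out: TextIO, delimiter: str = ",") -> int:
    """Rewrite combined-format access-log lines as single-delimiter rows.

    Lines that do not match the pattern are skipped. Returns the number of
    rows written.
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    written = 0
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue
        writer.writerow(m.group(*FIELDS))  # one tuple of all named fields
        written += 1
    return written
```

Lines that do not match the pattern (for example, requests containing escaped quotes) are skipped rather than repaired; a production converter would need to handle them.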

In one step, we ignore the irrelevant columns in the Apache logs so we can focus on processing only the most relevant data. Then we use a custom C module to extend Essentia’s analysis and extract location and system information from the users’ IP addresses.
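
The column-selection and IP-lookup step might look like the following Python sketch. Essentia's own enrichment uses a custom C module; here a tiny hypothetical lookup table built from documentation-only address ranges stands in for a real geolocation database, and `KEEP`, `locate`, and `project_and_enrich` are illustrative names.

```python
import ipaddress

# Hypothetical lookup table: documentation-only address ranges (RFC 5737)
# mapped to made-up region labels. A real deployment would query a GeoIP database.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "region-A"),
    (ipaddress.ip_network("198.51.100.0/24"), "region-B"),
]

# Hypothetical choice of "relevant" columns; everything else is dropped.
KEEP = ("host", "time", "request", "status")


def locate(ip: str) -> str:
    """Map an IP address string to a region label, or 'unknown'."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:  # host field held a hostname or junk, not an IP
        return "unknown"
    for network, region in GEO_TABLE:
        if addr in network:  # an IPv6 address is never "in" an IPv4 network
            return region
    return "unknown"


def project_and_enrich(record: dict) -> dict:
    """Keep only the relevant columns and append a derived 'region' column."""
    row = {col: record.get(col, "") for col in KEEP}
    row["region"] = locate(row["host"])
    return row
```

Dropping columns before enrichment keeps each row small, so the lookup work is done only on data that will actually be analyzed.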

Merging different log sources with Essentia

Information is everywhere, and people are starting to realize the benefits of using it. Unfortunately, this information is often spread across many different sets of files and stored in a variety of places. Finding all of this data and merging it into one complete dataset that’s ready for analysis is a difficult task. We rose to this challenge and created Essentia to make this process quick, easy, and efficient.

By simply telling the Essentia Scanner where your data is located, you can immediately start categorizing your files so that you select exactly the data you need. You can then stream this data into the Essentia Preprocessor, where it can be combined in a variety of ways so that you get the complete set of data you’re looking for.
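
One common way to combine sources is to normalize each into a shared record type and then merge them by time. The Python sketch below assumes two hypothetical sources (a comma-separated one with UTC timestamps and a tab-separated one with Unix epoch seconds) that are each already sorted by time; the formats and function names are invented for illustration and do not describe the Preprocessor.

```python
import heapq
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts: datetime
    source: str
    message: str


def from_source_a(lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical source A: 'YYYY-MM-DD HH:MM:SS,message', assumed to be UTC."""
    for line in lines:
        stamp, message = line.split(",", 1)
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        yield Event(ts, "A", message)


def from_source_b(lines: Iterable[str]) -> Iterator[Event]:
    """Hypothetical source B: '<Unix epoch seconds><TAB>message'."""
    for line in lines:
        epoch, message = line.split("\t", 1)
        yield Event(datetime.fromtimestamp(int(epoch), tz=timezone.utc), "B", message)


def merge_sources(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge streams that are each already time-sorted into one ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.ts)
```

`heapq.merge` holds only one pending record per input, so the merge streams rather than loading every source into memory; the trade-off is that each input must already be sorted.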

Exploring Apache Logs with Essentia and R

Apache logs are some of the most troublesome logs in use today. They come in all shapes and sizes and are often riddled with errors and missing data. While the benefits of analyzing these logs are enormous, most analysis tools, however powerful, lack the versatility needed to handle them. Essentia fixes that.

With the Essentia Log Converter, you can transform any Apache log into a format ready for analysis. You can convert your logs and then stream them directly into the Essentia Preprocessor to drop irrelevant columns, clean the data, and perform simple analysis on it.
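
The cleaning and simple-analysis steps can be sketched in Python on records that have already been converted into dictionaries of string fields. The rules used here (drop rows whose status code is not three digits, treat the `-` that Apache's `%b` field logs for an empty response body as missing) and the names `clean` and `summarize` are illustrative assumptions, not Essentia's behavior.

```python
from collections import Counter
from typing import Iterable, Optional


def clean(record: dict) -> Optional[dict]:
    """Return a cleaned copy of a converted access-log record, or None to drop it."""
    status = record.get("status", "")
    if len(status) != 3 or not status.isdecimal():
        return None  # malformed or missing status code: drop the row
    size = record.get("bytes", "-")
    return {
        "host": record.get("host", ""),
        "status": int(status),
        # Apache's %b field logs "-" when no body was sent; keep that as missing.
        "bytes": int(size) if size.isdecimal() else None,
    }


def summarize(records: Iterable[dict]) -> tuple[Counter, int]:
    """Count responses per status code and total the known response sizes."""
    by_status: Counter = Counter()
    total_bytes = 0
    for record in records:
        row = clean(record)
        if row is None:
            continue
        by_status[row["status"]] += 1
        if row["bytes"] is not None:
            total_bytes += row["bytes"]
    return by_status, total_bytes
```

Keeping a missing size as `None` rather than `0` lets later analysis distinguish "no body sent" from a genuinely empty response.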

Data warehousing with AWS Redshift and Essentia

Amazon developed its Redshift service to meet data warehousing needs on a reliable, scalable platform. Its highly efficient SQL engine executes queries in parallel, so users can gain insight into their data quickly. But to access that power, the data must first be loaded into the service.

Converting raw data into a form that an application (in this case, Redshift) can use is one of Essentia’s main strengths. Let’s focus on one of the most common scenarios: data is stored in its raw form on S3, where Redshift or any other relevant service can access it. Typically, the data is ‘dirty’: it contains missing, irrelevant, or otherwise unneeded values.
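
Redshift's COPY command reads pipe-delimited text by default, so one plausible preparation step is to keep only the needed columns, skip incomplete rows, and write pipe-delimited lines. The Python sketch below does that and backslash-escapes special characters on the assumption that the file will be loaded with COPY's ESCAPE option; the function names are hypothetical, and this is not how Essentia itself performs the step.

```python
from typing import Iterable, Sequence, TextIO


def escape(value: str) -> str:
    """Backslash-escape the escape character, the pipe delimiter, and newlines.

    Assumes the output is loaded with Redshift COPY ... ESCAPE.
    """
    return value.replace("\\", "\\\\").replace("|", "\\|").replace("\n", "\\\n")


def to_pipe_delimited(records: Iterable[dict], columns: Sequence[str], out: TextIO) -> int:
    """Write only `columns` of each record as one pipe-delimited line.

    Rows with a missing or empty value in any requested column are treated as
    dirty and skipped. Returns the number of rows written.
    """
    written = 0
    for record in records:
        values = [record.get(col) for col in columns]
        if any(v is None or v == "" for v in values):
            continue
        out.write("|".join(escape(str(v)) for v in values) + "\n")
        written += 1
    return written
```

Uploading the output to S3 and running COPY are left out; the order of `columns` should match the target table's column order (or the column list given to COPY).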