Apache server logs hold a wealth of valuable insights, but they are typically buried in S3 directories alongside many other logs in entirely different formats. Not only must the correct logs be extracted from their datastore, they must also be converted into a format that can be properly analyzed.

This is where Essentia comes in. First we scan the S3 directory to select exactly the access logs we want to analyze. Then we use the Essentia Log Converter to convert these access logs on the fly into a form readable by our Preprocessor (i.e., a single-delimiter format).
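To make the two steps above concrete, here is a minimal Python sketch of the idea. It is not the Essentia Log Converter itself: the function names, the `*access_log*` file pattern, and the assumption that the logs use Apache's standard "combined" format are all illustrative choices.

```python
import csv
import fnmatch
import io
import re

# Assumed input: Apache "combined" log format. Other LogFormat settings
# would need a different pattern.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
FIELDS = ["host", "ident", "user", "time", "request",
          "status", "size", "referrer", "agent"]


def select_logs(keys, pattern="*access_log*"):
    """Keep only the object keys that look like access logs.

    The pattern is a hypothetical naming convention, not one taken
    from any real bucket.
    """
    return [k for k in keys if fnmatch.fnmatchcase(k, pattern)]


def to_delimited(lines, delimiter=","):
    """Convert combined-format lines into single-delimiter text with a header.

    Lines that do not match the pattern are skipped. Requests containing
    escaped double quotes are not handled by this simple pattern.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue
        writer.writerow(match.group(*FIELDS))
    return out.getvalue()
```

Feeding `to_delimited` the lines of each file returned by `select_logs` yields one header row followed by one delimited row per request, which is the kind of input a column-oriented preprocessor can read directly.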

In one step we ignore the irrelevant columns in the Apache logs so we can focus on processing only the most relevant data. Then we utilize a custom C module to bolster Essentia’s analysis and extract location and system information from the users’ IP addresses.
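The column selection and enrichment described above can be sketched in Python as follows. This is only an illustration of the pattern, not the custom C module: the kept column names and the `toy_locate` function are hypothetical, and `toy_locate` merely classifies an address with the standard `ipaddress` module, where a real pipeline would consult a geolocation database.

```python
import ipaddress

# Hypothetical choice of "relevant" columns for this sketch.
KEEP = ("host", "time", "status")


def toy_locate(ip):
    """Stand-in for a real IP lookup: only classifies the address."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # The host field may hold a hostname instead of an IP address.
        return "unknown"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


def project_and_enrich(records, keep=KEEP, locate=toy_locate):
    """Drop unwanted columns, then add a column derived from the client IP."""
    for rec in records:
        slim = {name: rec[name] for name in keep}
        slim["location"] = locate(rec["host"])
        yield slim
```

Passing a different `locate` callable swaps in another lookup without touching the projection step, which keeps the column filtering and the enrichment independent of each other.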