The wealth of information in Apache logs is astounding, but that information can be buried in hard-to-find files, riddled with errors, and difficult to extract. Essentia handles the first two issues with the Essentia Scanner and Essentia Preprocessor. To extract more specialized data from Apache logs and other data sources, Essentia also lets you create and integrate custom modules that supplement its analysis.
Using Python or a similar scripting language, it's easy to create customized modules that provide additional functions for analyzing your data. For the Apache logs, we created a module called RT to extract information from the IP addresses in the data. This module adds functions that extract the country, region, OS, and browser. Information that would have been hard to extract with the default Essentia Preprocessor was thus extracted quickly and efficiently, simply by telling the Preprocessor to apply the functions in the RT module to the data’s IP addresses.
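To give a feel for what such a module might contain, here is a minimal Python sketch of two lookup functions. It is not the RT module's actual code, and it does not show how Essentia loads or calls modules. The function names (`ip_to_location`, `ip_to_country`, `ip_to_region`) and the tiny lookup table are hypothetical, and the table uses reserved documentation address ranges as stand-ins for a real GeoIP database.

```python
import ipaddress

# Hypothetical lookup table. A real module would load a full GeoIP database;
# these ranges are reserved documentation networks used only as placeholders.
_GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("US", "California")),
    (ipaddress.ip_network("198.51.100.0/24"), ("JP", "Tokyo")),
]

_UNKNOWN = ("unknown", "unknown")


def ip_to_location(ip):
    """Return a (country, region) tuple for an IP address string."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are common in raw logs; do not raise on them.
        return _UNKNOWN
    for network, location in _GEO_TABLE:
        if addr in network:
            return location
    return _UNKNOWN


def ip_to_country(ip):
    """Return only the country code for an IP address string."""
    return ip_to_location(ip)[0]


def ip_to_region(ip):
    """Return only the region name for an IP address string."""
    return ip_to_location(ip)[1]
```

Each function takes one string and returns one value, so it can be applied to a log field one record at a time. OS and browser lookups are left out of this sketch; in a typical Apache log those values would usually be derived from the user-agent field.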
The Essentia Scanner and Preprocessor are already powerful, versatile tools for selecting and analyzing data. However, if you find that they don’t include the functionality you need out of the box, creating a simple module lets them provide more advanced analysis geared specifically to your data.