Essentia Overview

Essentia is an efficient, scalable solution for managing, processing and analyzing large volumes of unstructured, semi-structured and structured data stored in cloud data lakes, whether that data is best described as big data, complex data or both.

Essentia has two main components: a data pre-processing engine and an analytic engine. It uses parallel processing, so operations can be distributed across multiple virtual machines, which lets Essentia scale out as data processing workloads grow. Data analysis is performed in memory for fast results.
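To illustrate the general idea of distributing work, the sketch below splits records into shares, has each worker compute a partial result, and merges the partials. It is a generic scatter/gather pattern, not Essentia's scheduler or API. Threads stand in for separate virtual machines, and the function names (partition, count_events, scatter_gather) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Deal records round-robin into n_workers roughly equal shares."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_events(share):
    """Per-worker step: count records by their first comma-separated field."""
    counts = Counter()
    for record in share:
        counts[record.split(",", 1)[0]] += 1
    return counts


def scatter_gather(records, n_workers=4):
    """Scatter shares to workers, then merge their partial counts."""
    shares = partition(records, n_workers)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_events, shares))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    # Hypothetical sample records: "event,user"
    records = ["click,u1", "view,u2", "click,u3", "buy,u1", "view,u1"]
    print(scatter_gather(records, n_workers=2))
    # Counter({'click': 2, 'view': 2, 'buy': 1})
```

Because each partial result is a Counter, merging is plain addition, so the totals are the same no matter how many workers share the load.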



Support for Popular Tools and Languages

Essentia supports many high-level languages for programmatically executing combinations of commands. It also works with popular data analysis tools and databases such as Redshift, MySQL, Tableau, QlikView, RStudio, R Shiny and Excel, among others, through APIs or direct integrations.


Data Collection & Ingestion

Essentia eliminates the need to perform ETL on the original data at ingest time. Once data is transferred to Amazon S3 or Azure Blob Storage, it can be left in place and as-is for all subsequent data processing and analysis workflows.


Data Virtualization

Data virtualization is a key component of Essentia. Rules-based data categorization tools let users create any number of virtual groupings of their source data.


Because these groupings are virtual, Essentia's data virtualization eliminates concerns about overwriting or corrupting the original source data, which encourages experimentation, sharing and collaboration.
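Rules-based categorization can be pictured as matching file paths against patterns: each category is only a named list of references to files, so creating one never touches the files themselves. The sketch below assumes glob-style rules and uses Python's fnmatch module. The paths, category names and the categorize helper are hypothetical, and this is not Essentia's own category syntax.

```python
from fnmatch import fnmatchcase


def categorize(keys, rules):
    """Group file paths into named virtual categories.

    keys:  storage paths; only the names are inspected, no file is opened.
    rules: mapping of category name -> glob pattern.
    A path may land in several categories; nothing is copied or modified.
    """
    categories = {name: [] for name in rules}
    for key in keys:
        for name, pattern in rules.items():
            # In fnmatch, '*' also matches '/', so one pattern can span folders.
            if fnmatchcase(key, pattern):
                categories[name].append(key)
    return categories


if __name__ == "__main__":
    # Hypothetical object keys and category rules, for illustration only.
    keys = [
        "logs/2024/01/web-01.csv.gz",
        "logs/2024/01/app-01.json.gz",
        "logs/2024/02/web-02.csv.gz",
        "exports/users.csv",
    ]
    rules = {
        "web_logs": "logs/*/web-*.csv.gz",
        "jan_2024": "logs/2024/01/*",
        "csv_files": "*.csv*",
    }
    for name, members in categorize(keys, rules).items():
        print(name, members)
```

Overlapping categories (a January web log belongs to both web_logs and jan_2024) cost nothing extra, since each category stores only path names.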

Data Exploration

Scan files to view their underlying structure, view sample data, run SQL-like queries and test pre-processing rules. All of this can be done on the raw data, even while it is still compressed, on a single node.


You don’t have to be an expert on the source data to begin to explore and get meaningful insights with Essentia.
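Peeking into compressed raw data can be approximated with the Python standard library: stream only the first few lines out of a gzip file, guess the delimiter, and split out a header and sample rows. This is a minimal sketch assuming gzip-compressed, delimited text files with a header line; peek is a hypothetical helper, not part of Essentia, and the candidate delimiters are an assumption.

```python
import csv
import gzip
from itertools import islice


def peek(path, n=5, delimiters=",\t|;"):
    """Return (delimiter, header, sample_rows) from a gzip-compressed text file.

    Only the first n lines are decompressed, as a stream; the file is never
    expanded on disk. Assumes the first line is a header.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        lines = list(islice(fh, n))
    # Guess the delimiter from the sample, limited to the candidate characters.
    dialect = csv.Sniffer().sniff("".join(lines), delimiters=delimiters)
    rows = list(csv.reader(lines, dialect))
    return dialect.delimiter, rows[0], rows[1:]
```

For a hypothetical file events.csv.gz, calling peek("events.csv.gz") would return its delimiter, column names and up to four sample rows without decompressing the rest of the file.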

Advanced Analysis

Essentia scripts can be run from the web UI or from the command line interface (CLI). Scripts let users apply more powerful and complex queries to one or more categories.


An integrated in-memory, parallelized database enables fast, iterative analysis of your data. Data in memory can be integrated with machine learning algorithms for even more complex analyses.