Data has evolved over the years. Complex data structures, unstructured data, real-time processing, growing data volumes, and new varieties of data are all part of the evolution. Platforms have changed as well. “Schema-less,” real-time events, “schema-on-read,” and extract/load/discover/transform (ELDT) are now part of our vernacular.
Despite these changes, many businesses still rely on the same data warehouse infrastructure they have used for years. Many have also turned to data lakes, built on platforms such as Apache Hadoop, NoSQL databases, and Apache Kafka, or on cloud storage technologies like Amazon S3, as a cost-effective way to manage large volumes of disparate data sets. Unfortunately, the success rate of these data lakes has been disappointing: many have failed to deliver value to the business any faster or better than before.