Observers have raised some criticisms of the Data Lake concept. For example, in July 2014 Gartner, in
“The Data Lake Fallacy: All Water and Little Substance”, warned enterprises to “Beware of the Data Lake Fallacy”.
Looking into the details, the concerns raised by Gartner centre mainly around:
- The description of the source data
- The level of skill of the users of the Data Lake
These concerns have been addressed in many government implementations: because only government data scientists will interact directly with the lake, Gartner’s main concern does not apply. In fact, Gartner specifically points to data scientists as benefiting from the Data Lake concept.
A further potential challenge that Gartner raises is performance: “Tools and data interfaces simply cannot perform at the same level against a general-purpose store as they can against optimized and purpose-built infrastructure.” This statement could be challenged on its generality and its disregard for certain modern tools, such as in-memory computing. However, even taking it at face value, the “performance” it ignores is the end-to-end time to deliver data to the user: the time it takes to get data to where it is needed, day in and day out, with new data streams coming online regularly. In this respect, the architecture of the Data Lake, where data is taken in whatever its format, can in many cases speed up the overall ingestion and delivery process.
The aim of the Data Lake is not to replace operational systems such as CRM or ERP; Gartner’s statement would indeed be correct if a lake attempted to emulate their functionality. Rather, the concept of the Data Lake is to sacrifice some efficiency in certain areas, such as single operational transactions, in order to gain sizeable increases in performance in others, namely data orchestration and the delivery of actionable information from across varied data sets.
The raison d’être of the Data Lake is that previous operational and business intelligence systems could not achieve this goal in a timely, resource-efficient and cost-efficient manner. Otherwise the concept would never have been conceived.
Lastly, even comparing the Data Lake to traditional data warehouse technologies may not be accurate. The Data Lake’s principal benefit is the aggregation of data. Some of the traditional tasks can then be performed on the amalgamated data, or the data can be used in its raw format, especially when it comes from business intelligence systems. The Data Lake therefore cuts across operational, archive and business intelligence systems, adding value to existing ones rather than necessarily replacing them.
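To make the idea of taking data in whatever its format and interpreting it only when it is used more concrete, the following is a minimal, illustrative sketch in Python. It is not drawn from Gartner or from any particular Data Lake product; the names `TinyLake`, `RawRecord`, `ingest` and `read` are hypothetical, and the "lake" is just an in-memory list standing in for real storage.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RawRecord:
    """One ingested item, stored exactly as received (hypothetical structure)."""
    source: str    # originating system, e.g. "crm"
    fmt: str       # declared format, e.g. "json" or "csv"
    payload: bytes # untouched raw bytes
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class TinyLake:
    """In-memory stand-in for a lake: store first, interpret on read."""

    def __init__(self):
        self._records = []

    def ingest(self, source, fmt, payload):
        # No parsing or validation at write time, so ingestion stays cheap
        # and new data streams can be added without upfront modelling.
        self._records.append(RawRecord(source, fmt, payload))

    def read(self, source):
        # Structure is applied only when a consumer asks for the data.
        rows = []
        for rec in self._records:
            if rec.source != source:
                continue
            text = rec.payload.decode("utf-8")
            if rec.fmt == "json":
                rows.append(json.loads(text))
            elif rec.fmt == "csv":
                rows.extend(csv.DictReader(io.StringIO(text)))
            else:
                # Unknown formats remain available in raw form.
                rows.append({"raw": text})
        return rows


lake = TinyLake()
lake.ingest("crm", "json", b'{"customer": "A", "spend": 10}')
lake.ingest("crm", "csv", b"customer,spend\nB,20\n")
crm_rows = lake.read("crm")
```

In this sketch the cost of understanding each format is deferred to `read`, which mirrors the trade-off discussed above: ingestion is fast and format-agnostic, while interpretation is left to the skilled users who consume the data.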
Gartner goes on to applaud the Data Lake concept in several ways:
- “it is certainly true that Data Lakes can provide value to various parts of the organization”
- “a Data Lake certainly benefits IT in the short term in that IT no longer has to spend time understanding how information is used”