Over the years, data has continued to explode in volume. With the advent of the Internet of Things (IoT), the avalanche of data keeps growing, and at a much faster pace. Given this scenario, organizations have two options: ignore the data or leverage it to enhance their competitiveness.
In general, today’s business users are much savvier and are on the lookout for ways to leverage existing data. Bring Your Own Device (BYOD) is evolving into Bring Your Own Tool (BYOT). The tools available to users help them analyze data, see patterns, generate permutations and make predictions.
Traditionally, the only source of data was computer applications. Today, data hits organizations in varied shapes, sizes and forms; examples include emails, spreadsheets, video and audio files, machine-generated data and input from voice-activated devices like Alexa. With this bombardment of data, business users are not willing to wait for IT to provide them with what they are looking for. In any case, traditional data storage methodologies are incapable of ingesting such a wide variety of data and catering to these users.
Traditional Data Storage Challenges
Timeliness. Adding data to a database or a data warehouse is time-consuming and, in many cases, convoluted. Users do not have the time it takes to extract data from traditional data storage systems such as data warehouses.
Flexibility. Users want not only immediate access to data but also the freedom to use their own tools. Gone are the days when IT could limit the tools that users could use. Today, users have both the wherewithal to get these tools and the ability to use them.
Data Types. Traditionally, there was one type of data – raw numbers and words. Today’s data is varied and emanates from sources ranging from applications to mobile devices. Traditional data storage systems are ill-equipped to handle this variety.
Retrieval. Getting to the data is as important as the data itself. Traditional data storage solutions simply assumed that the person trying to get access was an IT professional with knowledge of software development and related tools. In the present context, this no longer applies.
The solution must address all of the above shortcomings and more. It should:
- Support multiple reporting tools
- Enable rapid ingestion of different types of data with no data manipulation or mapping
- Be flexible enough to support data analytics
- Offer easy searchability
The solution that can embody all of the above is the Data Lake. It is the latest in a set of tools designed to meet today’s data storage and access challenges.
The Data Lake Architecture
A Data Lake is a data-centered architecture capable of storing virtually unlimited volumes of data in a wide variety of formats. Ingestion can happen in any form, ranging from traditional data, i.e. user entry, to audio and video files. Curation happens by extracting the meta-data and storing it separately.
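The ingestion and curation idea can be illustrated with a minimal Python sketch. It is not a description of any particular Data Lake product: the `ingest` function, the `raw` and `metadata` folder names and the metadata fields are all assumptions made for illustration. The point it shows is that the payload is stored byte for byte, whatever its format, while a small meta-data record is written to a separate location.

```python
import hashlib
import json
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest(lake_root: Path, name: str, payload: bytes, source: str) -> dict:
    """Store the payload unchanged and record its meta-data separately."""
    raw_dir = lake_root / "raw"        # hypothetical layout: original bytes
    meta_dir = lake_root / "metadata"  # hypothetical layout: curated meta-data
    raw_dir.mkdir(parents=True, exist_ok=True)
    meta_dir.mkdir(parents=True, exist_ok=True)

    # A content hash gives an identifier without parsing the data itself.
    object_id = hashlib.sha256(payload).hexdigest()[:16]
    (raw_dir / object_id).write_bytes(payload)  # no manipulation or mapping

    guessed_type, _ = mimetypes.guess_type(name)
    metadata = {
        "id": object_id,
        "name": name,
        "source": source,
        "media_type": guessed_type or "application/octet-stream",
        "size_bytes": len(payload),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    (meta_dir / f"{object_id}.json").write_text(json.dumps(metadata, indent=2))
    return metadata


# Two very different payloads go through the same path.
with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    csv_meta = ingest(lake, "orders.csv", b"id,total\n1,9.50\n", source="erp")
    wav_meta = ingest(lake, "call.wav", b"RIFF\x00\x00\x00\x00WAVE", source="call-center")
    print(csv_meta["name"], csv_meta["size_bytes"])
    print(wav_meta["name"], wav_meta["size_bytes"])
```

Keeping the meta-data apart from the raw bytes is what would later make the data searchable without opening every file.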
Data is ingested into the Data Lake in real time. Since there is no schema that data must follow, it lives in its original form in the Data Lake. Sitting on top of all of this are the analytical options. IT provides users with tools through which data can be analyzed, and users also have the option to pick their own tools. The role of IT is increasingly changing from Information Technology to Data Technology. It now encompasses building infrastructure and capturing and storing data for users who act as analysts and consume this data without the support or intervention of IT.
How do users interact with a Data Lake?
- The user browses through a Data Lake Service List.
- They pick and choose the data they are interested in and add it to their virtual shopping cart.
- They then use the tool of their choice to analyze the data.
- Users garner the required information, which they may publish or use to make decisions.
- Users also have the option to store their results in the Data Lake for the organization to use.
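The steps above can be sketched in a few lines of Python. Everything here is a hypothetical in-memory stand-in: the `DataLake` and `Cart` classes, their method names and the sample datasets are assumptions for illustration, not the interface of a real Data Lake service.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


@dataclass
class DataLake:
    # Hypothetical stand-in: dataset name -> list of records.
    datasets: dict[str, list[dict]] = field(default_factory=dict)

    def service_list(self) -> list[str]:
        """What the user browses: names of the available datasets."""
        return sorted(self.datasets)

    def publish(self, name: str, records: list[dict]) -> None:
        """Store results back so the rest of the organization can use them."""
        self.datasets[name] = records


@dataclass
class Cart:
    lake: DataLake
    items: list[str] = field(default_factory=list)

    def add(self, name: str) -> None:
        if name not in self.lake.datasets:
            raise KeyError(f"{name!r} is not in the service list")
        self.items.append(name)

    def analyze(self, tool: Callable[[list[dict]], dict]) -> dict:
        """Run the user's tool of choice over each selected dataset."""
        return {name: tool(self.lake.datasets[name]) for name in self.items}


def average_total(records: list[dict]) -> dict:
    # Example "tool": any callable that accepts records would do.
    return {"average_total": mean(r["total"] for r in records)}


lake = DataLake({
    "orders_2023": [{"total": 10.0}, {"total": 30.0}],
    "orders_2024": [{"total": 20.0}],
})
print(lake.service_list())                 # browse
cart = Cart(lake)
cart.add("orders_2023")                    # pick and choose
results = cart.analyze(average_total)      # analyze with a chosen tool
print(results)
lake.publish("orders_2023_summary", [results["orders_2023"]])  # store results
```

Passing the analysis in as a plain callable mirrors the idea that users bring their own tool; the lake only supplies the data.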
Characteristics of a user-friendly Data Lake
- IT-independent.
- Domain specific.
- Quality meta-data.
- Easy to expand.
- Data lineage.
- Analytical tools.
Weaknesses in Data Lake
- Incomplete data.
- Fixation on data ingestion.
- Over-governance.
The Data Lake is an efficient and effective solution to the data avalanche that organizations face today. It is of interest to savvy organizations that want to channel the power of data to rise above the competition.
Deepak Sharma is the Global IT Director with Agility. His portfolio includes the leadership of digital initiatives such as Data Lake, IoT and Artificial Intelligence. Deepak has over 3 decades of IT experience spread across multiple countries in establishing technology centers, product development and implementation of ERP systems.