Is it really that difficult to collect big data?
Since the processing and storage of big data are continuously improving, it may appear that the obstacles on the path to big data's supremacy are getting fewer. But the reality is a little different: collection involves many steps, and although a huge amount of work has been done at various levels, a complete solution has yet to be achieved. Data collection should be treated as a problem and broken down into steps and subsets; only then can its full magnitude be understood. One should begin at the root and then traverse the whole body of a big data project to understand where the scars have yet to heal.
The different data sources
Data is collected from many sources. There are traditional sources, such as the usual transactional systems; this data can be deposited into the data lake by straightforward copying, or by embracing a virtual data architecture. Another kind is structured data coming from sensors. Because of its highly standardized nature, little transformation is required before it is sent upstream; hence, despite its volume, this type of data is very useful. Then there is, of course, unstructured data: textual data and media files. Collecting it is relatively easy, since you simply dump the data into the lake and no schema is required. It is the combination of these variants that makes data collection a challenging task.
Because big data comes in so many types, its platforms are meant to store all kinds of data. From the simplest file storage to relational databases normalized up to fifth normal form, from columnar-style access to direct reads, everything is possible in big data storage. If cloud storage is available, elasticity increases further. Hence, in theory, storage hardly poses a big problem. But reality is a difficult business, and there are issues to tackle. For example, there are core platforms such as Hadoop, commercial distributions, and others; so many options, each with its own advantages and price point, make the choice difficult. So while storing data is easy, you need to choose the option best suited to your purpose, your demands, and your team's skill level.
Usability of data
After all the data arrives at the data lake, the main task is to transform it so that it becomes usable and consistent, and maintains quality. This is where the story gets complicated, since no fully automated solution exists. Of course, for specific sources there are existing tools that automate parts of the process. But if you are handling multiple sources with little correlation between them, you are in for trouble and have only your wits to manage the situation. So beware of heterogeneous data while exploring it. This is exactly where technology must intervene more and more to make big data easier. While there have been plenty of speed improvements on other fronts, usability still demands human intervention and remains a challenge to both man and machine.
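To make the "multiple sources with little correlation" problem concrete, here is a minimal sketch of the kind of per-source mapping and quality gate the paragraph describes. The source names (`crm`, `web`), field names, and validation rules are all invented for illustration; each new source typically needs its own hand-written mapper, which is precisely the human intervention the article points to.

```python
# Canonical shape every record must be transformed into.
CANONICAL_FIELDS = ("customer_id", "amount", "currency")

def from_crm(record: dict) -> dict:
    # Hypothetical CRM export: different field names, amounts in cents.
    return {
        "customer_id": str(record["cust"]),
        "amount": record["amount_cents"] / 100,
        "currency": record.get("ccy", "USD"),
    }

def from_web(record: dict) -> dict:
    # Hypothetical web-analytics feed: amounts as strings, lowercase currency.
    return {
        "customer_id": str(record["user_id"]),
        "amount": float(record["total"]),
        "currency": record["currency"].upper(),
    }

def normalize(source: str, record: dict) -> dict:
    """Map a raw record to the canonical shape and enforce basic quality rules."""
    mappers = {"crm": from_crm, "web": from_web}
    out = mappers[source](record)
    # Quality gate: reject records missing an ID or with a negative amount.
    if not out["customer_id"] or out["amount"] < 0:
        raise ValueError(f"bad record from {source}: {record}")
    return out
```

Note that nothing here generalizes automatically: adding a third source means writing a third mapper, and the quality rules must be revisited each time, which is why heterogeneous data keeps usability expensive.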
About the Author
DataFactZ is a professional services company that provides consulting and implementation expertise to solve the complex data issues facing many organizations in the modern business environment. As a highly specialized system and data integration company, we are uniquely focused on solving complex data issues in the data warehousing and business intelligence markets.