Hadoop Best Practices for Data Ingestion

Hadoop data ingestion is the first stage of a data pipeline in a data lake. It means taking data from various siloed databases and files and loading it into Hadoop. Sounds arduous? For many companies, it does turn out to be an intricate task. That is why they take more than a year to ingest all the……

What is a Data Lake? Its Architecture

What is a Data Lake? A Data Lake is a storage repository that can store large amounts of structured, semi-structured, and unstructured data. It is a place to store every type of data in its native format, with no fixed limits on account or file size. It offers high data quantity to increase analytic p……

The Most Complete Guide Ever! The Best Big Data Tools and How to Use Them

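There are thousands of big data tools, and every one of them promises to save time and money while helping you uncover business value that has never been discovered. Their promises may all be true, but in actual use you may feel overwhelmed by the sheer number of options. Which one do you really need? Which one is best suited to your project? To help you save time and pick the right tool the first time, we have collected and organized data extraction, data……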

Battle of ETL tools – SAP vs. Informatica vs. DataStage vs. Microsoft vs. MuleSoft vs. Talend

What’s happening in the data integration space? In this digital era, enterprise data is exploding and the need for data-intensive computing grows each day, especially in data mining and analytics. Also, the consolidated EIM/ETL tool offerings from different vendors have becom……