Automate loading data into HDFS
- Labels: Apache Hadoop
Created 07-08-2017 10:35 AM
Hi, I have been asked to investigate setting up HDFS for our company. Can someone please explain, in basic points, what we would use to automatically import CSV data on a monthly basis, carry out transformations on this data (similar to SSIS in SQL Server), and store the result to feed our BI tools?
Any help is much appreciated!
Created 07-10-2017 02:49 PM
I highly recommend you look into HDF/NiFi as a possible solution to this problem. You can easily use NiFi to pull from source systems, do basic transformations and then store the data in HDFS, Hive, HBase, etc.
In terms of feeding the data to BI tools, you may want to consider storing the data in Hive for the best performance. Take a look at this article for one way to solve some of what you are trying to do: https://community.hortonworks.com/articles/52856/stream-data-into-hive-like-a-king-using-nifi.html
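To make the pull, transform, store steps a bit more concrete, here is a minimal Python sketch of the same idea. It is not NiFi (NiFi flows are built in its UI rather than written as code), just a plain-script stand-in for a monthly CSV load. The function names, the `/data/sales` base directory and the `year=/month=` folder layout are all hypothetical choices for illustration, and the `hdfs dfs -put` command is only built here, not executed.

```python
import csv
import io
from datetime import date


def clean_csv(raw_text):
    """Basic transformation: trim cells, drop blank rows, normalise the header."""
    reader = csv.reader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows that are empty after trimming
        if not header_written:
            # Assumption: the first non-blank row is the header.
            cells = [c.lower().replace(" ", "_") for c in cells]
            header_written = True
        writer.writerow(cells)
    return out.getvalue()


def monthly_target_dir(base_dir, run_date):
    """Hypothetical layout: one HDFS directory per year and month."""
    return f"{base_dir}/year={run_date.year}/month={run_date.month:02d}"


def hdfs_put_command(local_path, target_dir):
    """Build (but do not run) the HDFS shell command that uploads the file.

    -f overwrites an existing file with the same name in the target directory.
    """
    return ["hdfs", "dfs", "-put", "-f", local_path, target_dir]


if __name__ == "__main__":
    sample = "Order ID, Amount \n\n 1 , 9.50\n"
    print(clean_csv(sample), end="")
    target = monthly_target_dir("/data/sales", date(2017, 7, 1))
    print(" ".join(hdfs_put_command("/tmp/sales.csv", target)))
```

On a machine with the Hadoop client installed, the command list could be passed to `subprocess.run`, and a scheduler such as cron could trigger the script once a month. A Hive external table pointed at the base directory would then be one way to expose the files to BI tools, as suggested above.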
