What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

I'm new to Hadoop and NiFi, but I have checked the available processors that should suit this case. I used the ExecuteSQL processor to get data out of the RDBMS, but I'm not sure which processor(s) should be used to transfer the data into a Hive table. Can you recommend a best practice for this scenario?

Artur

1 ACCEPTED SOLUTION

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

@Artur Bukowski

You can land the data in HDFS using the PutHDFS processor and build Hive tables on top of that location.

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

Hi Neeraj, thanks a lot for the fast answer!

I have an additional question: is there a NiFi processor I can use to "build Hive tables on top of that location"? Or can this step only be done outside NiFi?

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

@Artur Bukowski

Outside, as far as I know. You land the data in HDFS and then run CREATE EXTERNAL TABLE ... LOCATION 'hdfslocation';
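
For anyone who wants to script that step, here is a minimal sketch of generating the statement. It only builds the HiveQL text; running it (for example through beeline) is left out. The table name, columns and HDFS path are hypothetical, and STORED AS AVRO is an assumption based on ExecuteSQL emitting Avro; change it if your flow converts the data to another format before PutHDFS.

```python
def build_external_table_ddl(table, columns, location, stored_as="AVRO"):
    """Return a HiveQL CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, hive_type) pairs. All values passed in
    below are illustrative placeholders, not taken from the thread.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # One backtick-quoted column definition per line.
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table and landing directory written by PutHDFS.
ddl = build_external_table_ddl(
    "customers",
    [("id", "INT"), ("name", "STRING")],
    "/data/landing/customers",
)
print(ddl)
```

Because the table is external, Hive reads whatever files are in that directory, so new files that NiFi lands there show up in later queries without reloading the table.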

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

Thanks a lot for your answer!

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

@Artur Bukowski - with large volumes of data in this classic RDBMS->HDFS setup, you might be better off with Sqoop, especially if your goal is to move those large datasets in parallel.
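
As a rough illustration of the parallel part, the sketch below assembles a `sqoop import` command line in which the split column and mapper count control how the table is divided among parallel tasks. It only builds the argument list and does not run Sqoop. The JDBC URL, table, target directory and split column are hypothetical placeholders, and credentials are omitted.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, split_by, num_mappers=4):
    """Build the argument list for a parallel `sqoop import`.

    Sqoop divides the range of the split_by column among num_mappers
    map tasks, each importing its own slice of the table.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import(
    "jdbc:mysql://dbhost/sales",
    "customers",
    "/data/landing/customers",
    "id",
    num_mappers=8,
)
print(shlex.join(cmd))  # shlex.join needs Python 3.8 or later
```

An evenly distributed numeric key is usually the safest choice for the split column, since a skewed column leaves some mappers with far more rows than others.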

If you are after data provenance of those datasets, then NiFi will be a better fit.

Re: What is the most recommended way (best practice) to load large amounts of data into hive table out of RDBMS using NIFI?

@Andrew Grande - thanks for your input. It seems that for such a scenario Sqoop would be the better choice. However, I like the NiFi approach, and it would be great to have parallel RDBMS export/import operations available out of the box. It would also be great to be able to import data into Hive and export data from Hive directly from NiFi.