Nifi PutHiveQL from SelectHiveQL
Labels: Apache NiFi
Created 07-31-2018 08:14 PM
Hello,
I have a Hive external table, rawdata, and need to populate the swap data from select * from rawdata.
The process is currently executed daily by crontab, and I want to migrate it to NiFi.
What is the best practice for this?
Thanks.
Created 07-31-2018 11:41 PM
If you want to move data between two Hive tables, you don't need the SelectHiveQL processor at all.
Create a Hive statement like the one below:
insert into <db_name>.<final_table> select * from <db_name>.<rawdata>
Then execute the statement using the PutHiveQL processor.
To run this process incrementally, you need to store the state, i.e. the time up to which you have already processed data from the rawdata table, and then select only the data newer than that state value.
Please refer to this and this link for more details on how to incrementally copy data in Hive.
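As a rough illustration of the incremental idea, here is a minimal Python sketch that builds the HiveQL statement from a stored watermark. It assumes the rawdata table has a timestamp column (called load_ts here, a hypothetical name) and that the last processed time is kept somewhere outside the statement; the database and table names are placeholders too. In NiFi the resulting text would be the flowfile content handed to PutHiveQL.

```python
from datetime import datetime


def build_incremental_insert(db, raw_table, final_table, ts_column, last_processed):
    """Return a HiveQL statement that copies only rows newer than last_processed.

    ts_column is an assumed timestamp column on the raw table; it is not
    something the original post names.
    """
    watermark = last_processed.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"INSERT INTO TABLE {db}.{final_table} "
        f"SELECT * FROM {db}.{raw_table} "
        f"WHERE {ts_column} > '{watermark}'"
    )


# Hypothetical names and watermark, for illustration only.
stmt = build_incremental_insert(
    "mydb", "rawdata", "final_table", "load_ts", datetime(2018, 7, 31, 0, 0, 0)
)
print(stmt)
```

After a successful run, the stored watermark would be advanced to the newest load_ts value that was copied, so the next run picks up only later rows.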
Created 08-01-2018 02:24 PM
Created 08-02-2018 05:49 PM
Created 08-02-2018 10:14 PM
If you want to execute the statements sequentially, you can use the success relationship of the PutHiveQL processor to trigger the next job (i.e. start table B).
Flow:
1. GenerateFlowFile // start with tableA
2. PutHiveQL
3. ReplaceText // to prepare the tableB statement
4. PutHiveQL
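The ordering logic of this flow can be sketched in plain Python: each statement runs only if the previous one succeeded, which is what routing the success relationship into the next step gives you. This is not NiFi code; the execute callable is a hypothetical stand-in for PutHiveQL, and the table names are placeholders.

```python
def run_in_order(statements, execute):
    """Run statements one after another; stop at the first failure.

    execute is a hypothetical callable that returns True on success,
    standing in for a PutHiveQL processor and its success relationship.
    """
    completed = []
    for stmt in statements:
        if not execute(stmt):
            break  # the failure path: later statements are never started
        completed.append(stmt)
    return completed


# Placeholder statements for table A and table B.
statements = [
    "INSERT INTO TABLE mydb.table_a SELECT * FROM mydb.rawdata_a",
    "INSERT INTO TABLE mydb.table_b SELECT * FROM mydb.rawdata_b",
]

# A fake executor that always succeeds, used only to show the ordering.
done = run_in_order(statements, lambda stmt: True)
print(len(done))
```

If the first statement fails, the second is never attempted, which mirrors leaving the failure relationship of the first PutHiveQL unconnected to the rest of the flow.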
