
nifi ftp to hdfs to hive :: problems in hive

New Contributor

Hello. I'm creating a flow that takes files (*.csv.bz2) from an FTP server, puts them in HDFS, and then loads them into Hive: first into a temporary table, and after that into a partitioned table (ORC format).

Summary of the flow:

1) ListFTP

2) RouteOnAttribute

3) FetchFTP

4) UpdateAttribute

5) PutHDFS ----- At this point my flow works well and the files are in HDFS -----

6) ListHDFS (on the path where step 5 stored the *.csv.bz2 files)

7) RouteOnAttribute

8) ReplaceText

9) PutHiveQL jobs

Image:

12262-captura-de-pantalla-2017-02-07-a-las-214545.jpg

My problem appears when I try to load the files into Hive. First I load the file into a temporary table: load data inpath "${path}/${filename}" overwrite into table staging.callis_ccm_test

and then I have to load the data from that table into the final table: insert into table staging.callis_ccm_test_hist partition (day=${filename:substring(12,16)}${filename:substring(17,19)}${filename:substring(20,22)} , hour=${filename:substring(23,25)}) select * from staging.callis_ccm_test

So the problem is that ReplaceText does not accept two statements in the Replacement Value (the property where I put the SQL statement).

ReplaceText configuration (images):

12263-captura-de-pantalla-2017-02-09-a-las-095901.jpg

12264-captura-de-pantalla-2017-02-09-a-las-100139.jpg

the error:

12265-eeecaptura-de-pantalla-2017-02-09-a-las-103456.jpg

Note:

The load statement works fine, but the insert statement does not, and the flowfile is routed to the "retry" relationship. I checked the path in HDFS and it is OK; the file arrives correctly.

I thought the solution would be to create another ReplaceText and PutHiveQL for the insert statement, but the insert statement must be executed for the same flowfile that ran the load statement, and before the next flowfile executes its own load statement (which has the overwrite option).

Error image when ReplaceText has two statements: This procedure is easy in bash or Java, but in NiFi I am finding it complicated. Can someone help me?


Re: nifi ftp to hdfs to hive :: problems in hive

New Contributor

Sorry (update): the error with the two statements in ReplaceText is:

12267-eeeecaptura-de-pantalla-2017-02-09-a-las-104947.jpg

Re: nifi ftp to hdfs to hive :: problems in hive

Hi @MARTIN GATTO,

The issue is that the PutHiveQL processor does not support multi-statement operations in the version you are using (it will be possible in the next release of Apache NiFi). In your case, I can see two potential options:

-> ReplaceText (to create the load statement) -> PutHiveQL -> ReplaceText (to create the insert statement) -> PutHiveQL

or

-> ReplaceText (as you did) -> SplitText (line count = 1, to have one flow file per line) -> PutHiveQL

Be careful with the last one: it assumes single-threaded processors and sequential processing, so that the order of the flow files is preserved.
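
To make the ordering requirement concrete, here is a rough Python sketch (not NiFi code) of what the second option amounts to: build both statements for one file, one per line, then split them into single statements that must run in that order. The function names and the sample file name `call_export_2017-02-07_21.csv.bz2` are hypothetical; the real file name layout is an assumption chosen only so that the offsets from the question line up with a date and an hour.

```python
def build_statements(path: str, filename: str) -> list[str]:
    """Return the LOAD and INSERT statements for one file, in execution order."""
    # Same offsets as the Expression Language calls in the question:
    # substring(a, b) takes characters a .. b-1, like the Python slice [a:b].
    day = filename[12:16] + filename[17:19] + filename[20:22]
    hour = filename[23:25]
    load = (
        f'load data inpath "{path}/{filename}" '
        "overwrite into table staging.callis_ccm_test"
    )
    insert = (
        "insert into table staging.callis_ccm_test_hist "
        f"partition (day={day}, hour={hour}) "
        "select * from staging.callis_ccm_test"
    )
    return [load, insert]


def split_one_per_line(text: str) -> list[str]:
    """Mimic a line-count-1 split: one statement per non-empty line, order kept."""
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical path and file name, for illustration only.
    statements = build_statements("/data/in", "call_export_2017-02-07_21.csv.bz2")
    for statement in split_one_per_line("\n".join(statements)):
        print(statement)
```

The point of the sketch is that the INSERT for a file only makes sense if it runs right after that file's LOAD and before the next file's LOAD overwrites the staging table, which is why concurrent tasks on the downstream processor would break the approach.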

Hope this helps.
