Support Questions

Find answers, ask questions, and share your expertise

Unable to insert the data into Hive using NiFi application

avatar
New Contributor

Hi Team,
@Shu_ashu , @MattWho , @Shelton 
I am retrieving Zomoto API for my project. I am able to filter the JSON data using, splitJson Process. Everything is working fine. 

1.PNG If you see the above image, till AttributetoJSON process it is working fine. The output of ConvertJSONToSQL is giving me the insert statements with place holders (?). When I am passing the same statement(output of ConvertJSONTOSQL) to PutHiveQL, it is in not working throwing the below error.

Output of AttributeToJson Process:
{
"avg_rating" : "4.4",
"res_id" : "17085030",
"restName" : "Fogo de Chao Brazilian Steakhouse"
}

Output of ConvertJSONToSQL:
INSERT INTO restaurent (avg_rating, res_id, restname) VALUES (?, ?, ?)

Input to the PUTHiveQL:
INSERT INTO restaurent (avg_rating, res_id, restname) VALUES (?, ?, ?)

Error:

 

 

 

2021-03-17 04:28:52,710 INFO [Timer-Driven Process Thread-8] hive.ql.parse.ParseDriver Parse Completed                       
2021-03-17 04:28:52,715 ERROR [Timer-Driven Process Thread-8] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=1f17066e-0
178-1000-e53b-6257710b82b2] Failed to update Hive for StandardFlowFileRecord[uuid=60c6a702-df1e-417c-9aca-652e49b1b886,claim=
StandardContentClaim [resourceClaim=StandardResourceClaim[id=1615954135719-5, container=default, section=5], offset=738314, l
ength=78],offset=0,name=2c54bb5e-2517-42e0-aa40-cecac3f738db,size=78] due to org.apache.hive.service.cli.HiveSQLException: Er
ror while compiling statement: FAILED: ParseException line 1:70 cannot recognize input near '?' ',' '?' in value row construc
tor; routing to failure: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseExceptio
n line 1:70 cannot recognize input near '?' ',' '?' in value row constructor                                                 
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:70 cannot recogn
ize input near '?' ',' '?' in value row constructor    

 

 

 


Flow File output of ConvertJSONToSQL:
2.PNG

ConvertJSONToSQL Properties:

3.PNG

Any help is appreciated, let me know if any help is needed.

5 REPLIES 5

avatar
Master Guru

https://docs.cloudera.com/cdf-datahub/7.2.7/nifi-hive-ingest/topics/cdf-datahub-nifi-hive-ingest.htm...

 

PutHive3Streaming is faster and better.

 

https://docs.cloudera.com/cdf-datahub/7.2.7/nifi-hive-ingest/topics/cdf-datahub-hive-ingest-data-tar...

 

What version of Hive?   Is this CDH?  HDP?

 

You can also do PutORC or convert to ORC and push to HDFS

Or push to HDFS as Parquet

 

https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html

 

Use Record processors, they are easier and MUCH faster.   You won't need a split then.

https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html

 

I recommend using CFM NiFi version 1.11.4 or newer.    

avatar
New Contributor

@TimothySpann 

Thanks for reply. I have alternate way for this issue, instead of using convertjsontosql I can use replacetext process. The reason is convertjsontosql, is giving insert statement automatically. 

Is there any way still to use convertjsontosql and puthiveql.

I will also implement the recommendations provided by you.

HDP: 2.6.5
Hive: 1.2.1000

Let me know if you need any other information from my side.

Thanks

avatar
Master Guru

Ok that's pretty old.

 

PutHiveQL is not the best option, but for now.

 

see:   https://issues.apache.org/jira/browse/NIFI-4684

 

PutHiveStreaming or PutHDFS or PutORC is better and create an external table

 

or PutDatabaseRecord for JDBC

 

I would highly recommend updating to HDF 3.5.2 and HDP 3.1 or CDP as these versions are going to be out of support soon.

avatar
New Contributor

@TimothySpann 

Thanks for your reply. So, in my case if I upgrade my HDP to 3.1.0 then convertJSONToSQL will work ?? 
https://issues.apache.org/jira/browse/NIFI-4684
In the above link, it was mentioned that instead of sql we need to add hiveql, the the arguments will be hiveql.arg.1.value. I did that change but still no luck. Need your help on this.

After the AttributeTOJson, which process will you recommend (PutHDFS, if yes do you have any article for that)

avatar
Master Guru

NiFi versions are tied to Hive versions so you need to compatible one.   Check with your Cloudera team to ge tthe correct version.

 

Using PutHive3Streaming will be faster.

 

So is just PutOrc, PutParquet or PutHDFS.