Created on 03-17-2021 09:09 PM - edited 03-17-2021 09:12 PM
Hi Team,
@Shu_ashu , @MattWho , @Shelton
I am retrieving Zomoto API for my project. I am able to filter the JSON data using, splitJson Process. Everything is working fine.
If you see the above image, till AttributetoJSON process it is working fine. The output of ConvertJSONToSQL is giving me the insert statements with place holders (?). When I am passing the same statement(output of ConvertJSONTOSQL) to PutHiveQL, it is in not working throwing the below error.
Output of AttributeToJson Process:
{
"avg_rating" : "4.4",
"res_id" : "17085030",
"restName" : "Fogo de Chao Brazilian Steakhouse"
}
Output of ConvertJSONToSQL:
INSERT INTO restaurent (avg_rating, res_id, restname) VALUES (?, ?, ?)
Input to the PUTHiveQL:
INSERT INTO restaurent (avg_rating, res_id, restname) VALUES (?, ?, ?)
Error:
2021-03-17 04:28:52,710 INFO [Timer-Driven Process Thread-8] hive.ql.parse.ParseDriver Parse Completed
2021-03-17 04:28:52,715 ERROR [Timer-Driven Process Thread-8] o.apache.nifi.processors.hive.PutHiveQL PutHiveQL[id=1f17066e-0
178-1000-e53b-6257710b82b2] Failed to update Hive for StandardFlowFileRecord[uuid=60c6a702-df1e-417c-9aca-652e49b1b886,claim=
StandardContentClaim [resourceClaim=StandardResourceClaim[id=1615954135719-5, container=default, section=5], offset=738314, l
ength=78],offset=0,name=2c54bb5e-2517-42e0-aa40-cecac3f738db,size=78] due to org.apache.hive.service.cli.HiveSQLException: Er
ror while compiling statement: FAILED: ParseException line 1:70 cannot recognize input near '?' ',' '?' in value row construc
tor; routing to failure: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseExceptio
n line 1:70 cannot recognize input near '?' ',' '?' in value row constructor
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:70 cannot recogn
ize input near '?' ',' '?' in value row constructor
Flow File output of ConvertJSONToSQL:
ConvertJSONToSQL Properties:
Any help is appreciated, let me know if any help is needed.
Created 03-18-2021 08:18 AM
PutHive3Streaming is faster and better.
What version of Hive? Is this CDH? HDP?
You can also do PutORC or convert to ORC and push to HDFS
Or push to HDFS as Parquet
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html
Use Record processors, they are easier and MUCH faster. You won't need a split then.
https://www.datainmotion.dev/2020/12/simple-change-data-capture-cdc-with-sql.html
I recommend using CFM NiFi version 1.11.4 or newer.
Created 03-18-2021 12:09 PM
@TimothySpann
Thanks for reply. I have alternate way for this issue, instead of using convertjsontosql I can use replacetext process. The reason is convertjsontosql, is giving insert statement automatically.
Is there any way still to use convertjsontosql and puthiveql.
I will also implement the recommendations provided by you.
HDP: 2.6.5
Hive: 1.2.1000
Let me know if you need any other information from my side.
Thanks
Created 03-18-2021 12:23 PM
Ok that's pretty old.
PutHiveQL is not the best option, but for now.
see: https://issues.apache.org/jira/browse/NIFI-4684
PutHiveStreaming or PutHDFS or PutORC is better and create an external table
or PutDatabaseRecord for JDBC
I would highly recommend updating to HDF 3.5.2 and HDP 3.1 or CDP as these versions are going to be out of support soon.
Created 03-18-2021 02:03 PM
@TimothySpann
Thanks for your reply. So, in my case if I upgrade my HDP to 3.1.0 then convertJSONToSQL will work ??
https://issues.apache.org/jira/browse/NIFI-4684
In the above link, it was mentioned that instead of sql we need to add hiveql, the the arguments will be hiveql.arg.1.value. I did that change but still no luck. Need your help on this.
After the AttributeTOJson, which process will you recommend (PutHDFS, if yes do you have any article for that)
Created 03-18-2021 02:06 PM
NiFi versions are tied to Hive versions so you need to compatible one. Check with your Cloudera team to ge tthe correct version.
Using PutHive3Streaming will be faster.
So is just PutOrc, PutParquet or PutHDFS.