Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

ConvertJSONtoSQL in Apache NiFi for Sending to PutHiveQL

avatar
Master Guru

5897-diagram.png

5896-convertjsontosql.png

Is there anything special to get this to work?

Hive Table

create table
twitter(
  id int,
  handle string,
  hashtags string,
  msg string,
  time string,
  user_name string,
  tweet_id string,
  unixtime string,
  uuid string
) stored as orc
tblproperties ("orc.compress"="ZLIB");

Data is paired down tweet:

{ "user_name" : "Tweet Person", "time" : "Wed Jul 20 15:09:42 +0000 2016", "unixtime" : "1469027382664", "handle" : "SomeTweeter", "tweet_id" : "755781737674932224", "hashtags" : "", "msg" : "RT some stuff" }

1 ACCEPTED SOLUTION

avatar
Master Guru
@dpinkston

Not optimal, but this is a nice workaround:

Use ReplaceText processor

insert into twitter values (${tweet_id}, '${handle:urlEncode()}','${hashtag:urlEncode()}', '${msg:urlEncode()}','${time}', '${user_name:urlEncode()}','${tweet_id}', '${unixtime}','${uuid}')

So that's attributes in there.

I do url encode because of quotes and such. Would like a prepared statement or custom processor or call a groovy script. But this works.

View solution in original post

17 REPLIES 17

avatar
Expert Contributor

I used this method, but it is very slow, how about yours?

avatar
Master Guru

it wasn't slow. I will try in NiFI 1.0

avatar
Expert Contributor

I spent one day to insert 7000 rows data into hive, but I have more than 800 million rows.

avatar
Master Guru

if you have that many rows you need to go parallel and run on multiple nodes. You should probably trigger a Sqoop job or Spark SQL job from NiFi. have a few nodes running at once.

avatar
Master Guru

store to HDFS as ORC and then create HIVE table ontop of it.

I did 600,000 rows on a 4 GB machine and did that in a few minutes

avatar
Expert Contributor

thanks for your reply. do you have a example for details?

avatar
Master Guru

I confirmed this to be a bug in ConvertJSONToSQL, I have written up NIFI-4071, please see the Jira for details.