Support Questions

Find answers, ask questions, and share your expertise

ConvertJSONtoSQL in Apache NiFi for Sending to PutHiveQL

avatar
Master Guru

5897-diagram.png

5896-convertjsontosql.png

Is there anything special to get this to work?

Hive Table

create table
twitter(
  id int,
  handle string,
  hashtags string,
  msg string,
  time string,
  user_name string,
  tweet_id string,
  unixtime string,
  uuid string
) stored as orc
tblproperties ("orc.compress"="ZLIB");

Data is paired down tweet:

{ "user_name" : "Tweet Person", "time" : "Wed Jul 20 15:09:42 +0000 2016", "unixtime" : "1469027382664", "handle" : "SomeTweeter", "tweet_id" : "755781737674932224", "hashtags" : "", "msg" : "RT some stuff" }

1 ACCEPTED SOLUTION

avatar
Master Guru
@dpinkston

Not optimal, but this is a nice workaround:

Use ReplaceText processor

insert into twitter values (${tweet_id}, '${handle:urlEncode()}','${hashtag:urlEncode()}', '${msg:urlEncode()}','${time}', '${user_name:urlEncode()}','${tweet_id}', '${unixtime}','${uuid}')

So that's attributes in there.

I do url encode because of quotes and such. Would like a prepared statement or custom processor or call a groovy script. But this works.

View solution in original post

17 REPLIES 17

avatar
Expert Contributor

I used this method, but it is very slow, how about yours?

avatar
Master Guru

it wasn't slow. I will try in NiFI 1.0

avatar
Expert Contributor

I spent one day to insert 7000 rows data into hive, but I have more than 800 million rows.

avatar
Master Guru

if you have that many rows you need to go parallel and run on multiple nodes. You should probably trigger a Sqoop job or Spark SQL job from NiFi. have a few nodes running at once.

avatar
Master Guru

store to HDFS as ORC and then create HIVE table ontop of it.

I did 600,000 rows on a 4 GB machine and did that in a few minutes

avatar
Expert Contributor

thanks for your reply. do you have a example for details?

avatar
Master Guru

I confirmed this to be a bug in ConvertJSONToSQL, I have written up NIFI-4071, please see the Jira for details.