ConvertJSONToSQL in Apache NiFi for Sending to PutHiveQL
Labels: Apache Hive, Apache NiFi
Created on 07-20-2016 03:11 PM - edited 08-19-2019 01:30 AM
Is there anything special needed to get this to work?
Hive Table
create table twitter (
  id int, handle string, hashtags string, msg string, time string,
  user_name string, tweet_id string, unixtime string, uuid string
) stored as orc tblproperties ("orc.compress"="ZLIB");
The data is a pared-down tweet:
{ "user_name" : "Tweet Person", "time" : "Wed Jul 20 15:09:42 +0000 2016", "unixtime" : "1469027382664", "handle" : "SomeTweeter", "tweet_id" : "755781737674932224", "hashtags" : "", "msg" : "RT some stuff" }
Created 07-30-2016 08:12 PM
Not optimal, but this is a nice workaround:
Use a ReplaceText processor with a replacement value of:
insert into twitter values (${tweet_id}, '${handle:urlEncode()}','${hashtag:urlEncode()}', '${msg:urlEncode()}','${time}', '${user_name:urlEncode()}','${tweet_id}', '${unixtime}','${uuid}')
Those are flowfile attributes being substituted in. I URL-encode the string fields because of quotes and other special characters. A prepared statement, a custom processor, or a Groovy script call would be nicer, but this works.
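For anyone reproducing this, the attributes referenced in that ReplaceText expression have to be populated earlier in the flow. A minimal sketch (the attribute names are assumptions inferred from the expression above, not a verified configuration) is an EvaluateJsonPath processor with Destination set to flowfile-attribute and dynamic properties along the lines of:

handle = $.handle
hashtag = $.hashtags
msg = $.msg
time = $.time
user_name = $.user_name
tweet_id = $.tweet_id
unixtime = $.unixtime

with uuid generated separately, for example by UpdateAttribute using ${UUID()}. ReplaceText (Replacement Strategy: Always Replace) then rewrites the flowfile content into the INSERT statement above, and PutHiveQL executes it as-is.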
Created 08-11-2016 12:22 PM
I used this method, but it is very slow. How is the performance for you?
Created 08-11-2016 12:32 PM
It wasn't slow for me. I will try it in NiFi 1.0.
Created 08-11-2016 01:37 PM
I spent a whole day inserting 7,000 rows into Hive, but I have more than 800 million rows.
Created 08-11-2016 01:42 PM
If you have that many rows, you need to go parallel and run on multiple nodes. You should probably trigger a Sqoop job or a Spark SQL job from NiFi and have a few nodes running at once.
Created 02-28-2017 05:13 AM
Store the data to HDFS as ORC and then create a Hive table on top of it. I did 600,000 rows on a 4 GB machine in a few minutes.
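A rough sketch of that approach, reusing the schema from the original question (the HDFS path and the upstream conversion step are assumptions; one common route is JSON -> Avro -> ORC, e.g. ConvertAvroToORC followed by PutHDFS): land the ORC files in a directory, then point an external table at it:

CREATE EXTERNAL TABLE IF NOT EXISTS twitter (
  id INT, handle STRING, hashtags STRING, msg STRING, time STRING,
  user_name STRING, tweet_id STRING, unixtime STRING, uuid STRING
)
STORED AS ORC
LOCATION '/tmp/twitter';

This avoids row-at-a-time inserts through PutHiveQL entirely; Hive just reads the ORC files that NiFi wrote.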
Created 08-11-2016 02:11 PM
Thanks for your reply. Do you have an example with details?
Created 08-11-2016 02:59 PM
Sqoop is just regular Sqoop; you call it with the ExecuteProcess processor (a rough sketch follows the links below).
https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_create_hive_table_literal
NiFi + Spark (this can be done via site-to-site, a command trigger, or Kafka):
https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html
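As a rough illustration of the ExecuteProcess idea (the JDBC URL, source table, and mapper count below are placeholders, not details from this thread), the processor could be configured with Command set to sqoop and Command Arguments along the lines of:

import --connect jdbc:mysql://dbhost:3306/sourcedb --table tweets --hive-import --hive-table twitter --num-mappers 8

Sqoop then does the parallel load into Hive on the cluster; NiFi only acts as the trigger.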
Created 06-14-2017 02:23 PM
I confirmed this to be a bug in ConvertJSONToSQL and have written it up as NIFI-4071; please see the Jira for details.
