Created on 07-20-2016 03:11 PM - edited 08-19-2019 01:30 AM
Is there anything special needed to get this to work?
Hive Table
create table twitter (
  id int,
  handle string,
  hashtags string,
  msg string,
  time string,
  user_name string,
  tweet_id string,
  unixtime string,
  uuid string
)
stored as orc
tblproperties ("orc.compress"="ZLIB");
The data is a pared-down tweet:
{ "user_name" : "Tweet Person", "time" : "Wed Jul 20 15:09:42 +0000 2016", "unixtime" : "1469027382664", "handle" : "SomeTweeter", "tweet_id" : "755781737674932224", "hashtags" : "", "msg" : "RT some stuff" }
Created 07-30-2016 08:12 PM
Not optimal, but this is a nice workaround:
Use a ReplaceText processor to build the INSERT statement:
insert into twitter values (${tweet_id}, '${handle:urlEncode()}', '${hashtags:urlEncode()}', '${msg:urlEncode()}', '${time}', '${user_name:urlEncode()}', '${tweet_id}', '${unixtime}', '${uuid}')
Those ${...} references are FlowFile attributes.
I URL-encode the values because of embedded quotes and similar characters. I would prefer a prepared statement, a custom processor, or a call to a Groovy script, but this works.
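For the sample tweet above, the statement ReplaceText emits would look roughly like the following. The values are illustrative: NiFi's urlEncode() escapes quotes and encodes spaces (typically as '+'), and the uuid shown here is a made-up placeholder.

-- illustrative output for the sample tweet; the uuid is a placeholder
insert into twitter values (755781737674932224, 'SomeTweeter', '', 'RT+some+stuff', 'Wed Jul 20 15:09:42 +0000 2016', 'Tweet+Person', '755781737674932224', '1469027382664', '0f8fad5b-d9cb-469f-a165-70867728950e')

Two caveats: the encoded values are stored encoded in Hive, so they have to be decoded on read, and a tweet ID of this size overflows the table's int column, so bigint would be a safer type for id.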
Created 08-11-2016 12:22 PM
I used this method, but it is very slow. How is it performing for you?
Created 08-11-2016 12:32 PM
It wasn't slow for me. I will try it in NiFi 1.0.
Created 08-11-2016 01:37 PM
It took me a full day to insert 7,000 rows into Hive, and I have more than 800 million rows.
Created 08-11-2016 01:42 PM
If you have that many rows, you need to go parallel and run on multiple nodes. You should probably trigger a Sqoop job or a Spark SQL job from NiFi and have a few nodes running at once.
Created 02-28-2017 05:13 AM
Store the data to HDFS as ORC and then create a Hive table on top of it (sketched below).
I did 600,000 rows on a 4 GB machine in a few minutes.
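A minimal sketch of that approach, assuming the ORC files land under a hypothetical /data/twitter HDFS directory and use the same columns as the table above:

-- external table over ORC files already sitting in HDFS;
-- the LOCATION path here is hypothetical
create external table twitter_orc (
  id int,
  handle string,
  hashtags string,
  msg string,
  time string,
  user_name string,
  tweet_id string,
  unixtime string,
  uuid string
)
stored as orc
location '/data/twitter';

Hive then queries the files in place, so there are no row-by-row inserts at all.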
Created 08-11-2016 02:11 PM
Thanks for your reply. Do you have a detailed example?
Created 08-11-2016 02:59 PM
Sqoop is just regular Sqoop; you call it with NiFi's ExecuteProcess processor.
https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_create_hive_table_literal
NiFi + Spark (the hand-off can be Site-to-Site, a command trigger, or Kafka):
https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html
Created 06-14-2017 02:23 PM
I confirmed this to be a bug in ConvertJSONToSQL and have written it up as NIFI-4071; please see the Jira for details.