Not able to fetch Twitter JSON data coming from Nifi to Hive

Hi Guys,

I am trying to load Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (see https://github.com/rcongiu/Hive-JSON-Serde). I am using the ReplaceText processor to build the JSON, one record per line, which this SerDe requires. My replacement value in the processor is:

{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}

Sample of data in HDFS looks like this:

{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US & Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}

Then I ran SELECT * against the table, which is an external table in Hive, and got this error:

{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat
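The SerDe error reports a character offset within a single row, so one way to narrow it down is to run each line through a strict JSON parser outside Hive. Below is a minimal sketch in Python, assuming the lines have been read from a local copy of the HDFS file. The function name `find_invalid_rows` and the two sample rows are made up for illustration and are not taken from the actual data.

```python
import json

def find_invalid_rows(lines):
    """Return (line_number, error_message, column) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            # err.colno is the 1-based column where the parser gave up
            problems.append((number, err.msg, err.colno))
    return problems

# Hypothetical rows: the second one has unescaped double quotes inside "msg"
rows = [
    '{"tweet_id":1,"msg":"plain text"}',
    '{"tweet_id":2,"msg":"he said "hi" to me"}',
]

for number, message, column in find_invalid_rows(rows):
    print(f"line {number}: {message} at column {column}")
```

Looking at the character just before the reported column in a failing row usually shows what terminated the string early.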

I thought of stripping all special characters using replaceAll from the NiFi Expression Language, so I changed the replacement for the message to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('\n','')}. The data in HDFS looked much cleaner, but the error was still there.
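To see exactly which characters that class removes, it can be tried outside NiFi. The sketch below uses Python's `re` module as an approximation of the Java regex that NiFi evaluates; for this particular character class the two should behave the same, but that is an assumption. The helper name `strip_special` and the sample text are hypothetical.

```python
import re

# Same character class as in the replaceAll call. The \' in the NiFi
# expression is only an escaped quote, so a plain ' is used here.
SPECIAL_CHARS = re.compile(r"""[$&+,:;=?@#|'<>.^*()%!-]""")

def strip_special(text):
    """Remove every character matched by the class and leave the rest untouched."""
    return SPECIAL_CHARS.sub("", text)

sample = 'RT @user: he said "ciao", ok?'
cleaned = strip_special(sample)
print(cleaned)
```

Any character that is not listed inside the brackets passes through unchanged, so it is worth checking the cleaned output for characters that still have a special meaning in JSON.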

Then I hard-coded the msg string with a bunch of special characters and text, and it worked; please see sample-data-worked.png.

Any help would be greatly appreciated.

1 ACCEPTED SOLUTION

Never mind, I got this working. The problem was the double quote, so I changed the replacement string to

${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')}

and it worked.
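Stripping the double quote works but changes the tweet text. An alternative is to escape the value instead of deleting characters, so the original message survives a round trip. The sketch below shows the idea in Python with the standard `json` module; it is an illustration only, not NiFi configuration. The helper name `build_row` is hypothetical, and the sample message is invented. NiFi's Expression Language also documents an `escapeJson()` function that may serve the same purpose inside ReplaceText, though that was not tested in this thread.

```python
import json

def build_row(tweet):
    """Serialize one tweet dict as a single-line JSON record."""
    # json.dumps escapes embedded double quotes, backslashes and newlines,
    # so the record stays on one line and remains valid JSON.
    return json.dumps(tweet, ensure_ascii=False)

# Field names follow the ReplaceText template; the msg value is invented
tweet = {
    "tweet_id": 796782698631233536,
    "lang": "it",
    "msg": 'He said "ciao"\nand left',
}

row = build_row(tweet)
print(row)
```

Parsing the row back with `json.loads(row)` returns the original message, quotes and line break included.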
