Not able to fetch Twitter JSON data coming from Nifi to Hive

Hi Guys,

I am trying to load Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (see https://github.com/rcongiu/Hive-JSON-Serde). I use a ReplaceText processor to build the JSON with one record per line, which this SerDe requires. The replacement value in the processor is:

{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}

Sample of data in HDFS looks like this:

{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US & Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}

Then I ran SELECT * against the table, which is an external table in Hive, and got this error:

{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat
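The message says one row is not a valid JSON object and gives a character offset on that row. One way to find such rows before Hive reads them is to run every line through a JSON parser. Below is a minimal Python sketch of that check. The function name `find_invalid_json_lines` and the two sample records are hypothetical, and it assumes the lines have already been copied out of HDFS into a local list or file.

```python
import json

def find_invalid_json_lines(lines):
    """Return (line_number, column, message) for each line that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as err:
            # err.colno is the 1-based column where parsing failed
            problems.append((number, err.colno, err.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, 1, "not a JSON object"))
    return problems

# Hypothetical sample: the second record has an unescaped double quote inside "msg".
sample = [
    '{"tweet_id":1,"msg":"plain text"}',
    '{"tweet_id":2,"msg":"he said "hi" to me"}',
]

for number, column, message in find_invalid_json_lines(sample):
    print(f"line {number}, column {column}: {message}")
```

Looking at the reported column in the offending row usually shows which character ended the string too early.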

I thought of stripping all special characters with replaceAll from the NiFi Expression Language, so I changed the replacement expression for the message to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('\n','')}. The data in HDFS looked much cleaner, but the error was still there.
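To see what that character class does and does not remove, here is a small Python sketch that applies the same class to a made-up tweet. It is only an approximation of the NiFi expression: Python's `re` module stands in for the Java regex engine NiFi uses, and the tweet text and the helper name `strip_special` are hypothetical.

```python
import re

# Same character class as in the replaceAll call above, written as a Python regex.
# The double quote (") is not one of the characters it matches.
SPECIAL_CHARS = re.compile(r"[$&+,:;=?@#|'<>.^*()%!-]")

def strip_special(msg):
    # Mimics replaceAll(<class>, '') followed by removing newlines.
    return SPECIAL_CHARS.sub("", msg).replace("\n", "")

# Hypothetical tweet text containing quotes, punctuation and a newline.
tweet = 'RT @user: he said "ciao"!\n'
cleaned = strip_special(tweet)

print(cleaned)
print('"' in cleaned)  # reports whether any double quote survived the stripping
```

Any double quote that survives ends up inside the "msg" value of the template and terminates the JSON string early.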

Then I hard-coded the msg string with a bunch of special characters and text, and it worked; please see sample-data-worked.png.

Any help would be greatly appreciated.

1 ACCEPTED SOLUTION

Never mind, I got this working. The problem was double quotes inside the message text, which ended the JSON string early, so I changed the replacement expression to

${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')}

and it worked.
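For anyone comparing options, here is a rough Python sketch that contrasts stripping the quotes (as in the fix above) with escaping them so the original text is kept. The function names and the sample tweet are hypothetical, and the template is reduced to a single "msg" field. The NiFi Expression Language guide also documents an escapeJson function, which may be the closer equivalent of the second variant if your NiFi version provides it; I have not tested it in this flow.

```python
import json

def build_record_stripping(msg):
    # Mirrors the fix above: drop double quotes and newlines, then splice into a template.
    cleaned = msg.replace('"', "").replace("\n", "")
    return '{"msg":"' + cleaned + '"}'

def build_record_escaping(msg):
    # Alternative: let a JSON encoder escape quotes, backslashes and control characters.
    return json.dumps({"msg": msg}, ensure_ascii=False)

# Hypothetical tweet text with quotes and an embedded newline.
tweet = 'he said "ciao"\nbye'

for line in (build_record_stripping(tweet), build_record_escaping(tweet)):
    parsed = json.loads(line)  # both variants produce one parseable line
    print(line, "->", repr(parsed["msg"]))
```

Stripping loses the quotes and newlines for good, and a stray backslash in a tweet could still break the spliced template, whereas the escaped variant round-trips the original text.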
