Created 11-10-2016 06:44 PM
Hi Guys,
I am trying to load Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (see https://github.com/rcongiu/Hive-JSON-Serde). I am using a ReplaceText processor to build the JSON, one record per line, which this SerDe requires. The replacement value in the processor is:
{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}
Sample of data in HDFS looks like this:
{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US & Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}
Then I ran SELECT * against the table, which is an external table in Hive, and got this error:
{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a \u0027,\u0027 or \u0027}\u0027 at 273 [character 274 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat
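One way to narrow down an error like this is to check each line of the exported file outside Hive. The following is only a minimal sketch in Python, not part of the original flow: the function name find_invalid_json_lines and the two sample records are made up for illustration, and Python's json module reports positions differently from the SerDe's parser, so the column numbers will not match the Hive trace exactly.

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, description) for every non-empty line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"{exc.msg} at column {exc.colno}"))
    return problems


# Hypothetical sample records: the second one has unescaped double quotes in msg.
sample = [
    '{"tweet_id":1,"msg":"plain text"}',
    '{"tweet_id":2,"msg":"he said "hi" to me"}',
]

problems = find_invalid_json_lines(sample)
for number, description in problems:
    print(f"line {number}: {description}")
```

To check a real file, the same function can be given an open file object, since iterating over it yields one line at a time.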
I thought of stripping all special characters using replaceAll from the NiFi Expression Language, so I changed the replacement for the message to ${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('\n','')}. The data in HDFS looked much cleaner, but the error was still there.
Then I hard-coded the msg string with a bunch of special characters and text, and it worked; please see sample-data-worked.png.
Any help would be greatly appreciated.
Created 11-10-2016 08:06 PM
Never mind, I got this working. The problem was double quotes inside the message, which ended the JSON string early, so I changed the replacement to
${twitter.msg:replaceAll('[$&+,:;=?@#|\'<>.^*()%!-]',''):replace('"',''):replace('\n','')}
and it worked.
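For anyone hitting the same thing, here is a small Python sketch of why the double quote breaks the row, and of escaping it rather than deleting it. This is an illustration only, not what the NiFi flow above does: build_record and the sample message are hypothetical, and the NiFi fix above strips the quote instead. The NiFi Expression Language guide also lists an escapeJson function, which may be worth checking for your NiFi version as a way to keep the quotes.

```python
import json


def build_record(tweet_id, msg):
    """Serialize one tweet as a single-line JSON record (hypothetical helper)."""
    # Collapse line breaks so the record stays on one line, as the SerDe expects.
    one_line_msg = msg.replace("\r", " ").replace("\n", " ")
    # json.dumps escapes embedded double quotes and backslashes in the value.
    return json.dumps({"tweet_id": tweet_id, "msg": one_line_msg}, ensure_ascii=False)


raw = 'He said "ciao"\nand left'

# Naive string templating, similar in spirit to the ReplaceText approach:
# the embedded quotes terminate the msg string early, so the row is invalid JSON.
naive = '{"tweet_id":1,"msg":"' + raw.replace("\n", "") + '"}'

# Escaping keeps the quotes and still yields a valid one-line record.
safe = build_record(1, raw)
print(safe)
```

Stripping the quote, as done above, is simpler but loses characters from the tweet text; escaping keeps the message intact.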