<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Not able to fetch Twitter JSON data coming from Nifi to Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Not-able-to-fetch-Twitter-JSON-data-coming-from-Nifi-to-Hive/m-p/172251#M45902</link>
    <description>&lt;P&gt;Never mind, I got this working. The problem was the double quotes in the tweet text, so I changed the replacement expression to&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;${twitter.msg:replaceAll('[$&amp;amp;+,:;=?@#|\'&amp;lt;&amp;gt;.^*()%!-]',''):replace('"',''):replace('\n','')}&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;and it worked.&lt;/P&gt;</description>
    <pubDate>Fri, 11 Nov 2016 04:06:09 GMT</pubDate>
    <dc:creator>mrizvi</dc:creator>
    <dc:date>2016-11-11T04:06:09Z</dc:date>
    <item>
      <title>Not able to fetch Twitter JSON data coming from Nifi to Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Not-able-to-fetch-Twitter-JSON-data-coming-from-Nifi-to-Hive/m-p/172250#M45901</link>
      <description>&lt;P&gt;Hi Guys,&lt;/P&gt;&lt;P&gt;I am trying to get Twitter data from NiFi into Hive in JSON format using the Hive JSON SerDe (&lt;A href="https://github.com/rcongiu/Hive-JSON-Serde"&gt;https://github.com/rcongiu/Hive-JSON-Serde&lt;/A&gt;). I am using the ReplaceText processor to write each tweet as a JSON record, one record per line, which is what this SerDe requires. The replacement value in the processor is:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;{"tweet_id":${twitter.tweet_id},"created_unixtime":${twitter.unixtime},"created_time":"${twitter.time}","lang":"${twitter.language}","displayname":"${twitter.handle}","time_zone":"${twitter.time_zone}","msg":"${twitter.msg:replace(',''):replace('\n','')}"}&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;A sample of the data in HDFS looks like this:&lt;/P&gt;&lt;PRE&gt;{"tweet_id":796782698631233536,"created_unixtime":1478802773517,"created_time":"Thu Nov 10 18:32:53 +0000 2016","lang":"it","displayname":"robinit66","time_zone":"Rome","msg":"RT @ubaldo_angelo: spiace davvero per tutti quelli che avevano intravisto nelle carotine e nei cavoletti di MIchelle le basi di una nuova c???"}
{"tweet_id":796782702829666304,"created_unixtime":1478802774518,"created_time":"Thu Nov 10 18:32:54 +0000 2016","lang":"it","displayname":"vespro4","time_zone":"Pacific Time (US &amp;amp; Canada)","msg":"Comprendo l'evasore fiscale https://t.co/Jn0s9KZEUz"}&lt;/PRE&gt;&lt;P&gt;When I then ran SELECT * on the external Hive table, I got this error:&lt;/P&gt;&lt;PRE&gt;org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Expected a ',' or '}' at 273 [character 274 line 1]
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
	at org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)
	at org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)
	at&lt;/PRE&gt;&lt;P&gt;I thought of removing all special characters with the NiFi Expression Language replaceAll function, so I changed the expression for the message to &lt;STRONG&gt;${twitter.msg:replaceAll('[$&amp;amp;+,:;=?@#|\'&amp;lt;&amp;gt;.^*()%!-]',''):replace('\n','')}&lt;/STRONG&gt;. The data in HDFS looked much cleaner, but the error was still there.&lt;/P&gt;&lt;P&gt;Then I hard-coded the msg value with a mix of special characters and text, and that worked; see &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/9315-sample-data-worked.png"&gt;sample-data-worked.png&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2016 02:44:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Not-able-to-fetch-Twitter-JSON-data-coming-from-Nifi-to-Hive/m-p/172250#M45901</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-11-11T02:44:43Z</dc:date>
    </item>
    <item>
      <title>Re: Not able to fetch Twitter JSON data coming from Nifi to Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Not-able-to-fetch-Twitter-JSON-data-coming-from-Nifi-to-Hive/m-p/172251#M45902</link>
      <description>&lt;P&gt;Never mind, I got this working. The problem was the double quotes in the tweet text, so I changed the replacement expression to&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;${twitter.msg:replaceAll('[$&amp;amp;+,:;=?@#|\'&amp;lt;&amp;gt;.^*()%!-]',''):replace('"',''):replace('\n','')}&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;and it worked.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Nov 2016 04:06:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Not-able-to-fetch-Twitter-JSON-data-coming-from-Nifi-to-Hive/m-p/172251#M45902</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-11-11T04:06:09Z</dc:date>
    </item>
  </channel>
</rss>

