<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Querying JSON data using Hive in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97414#M60630</link>
    <description>&lt;P&gt;JSON Serde works on 1 line at a time, with each line being parsed independently.

Is your JSON encoded to fit into 1 line for each record in your stream?&lt;/P&gt;</description>
    <pubDate>Fri, 20 Nov 2015 04:20:38 GMT</pubDate>
    <dc:creator>gopalv</dc:creator>
    <dc:date>2015-11-20T04:20:38Z</dc:date>
    <item>
      <title>Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97413#M60629</link>
      <description>&lt;P&gt;I am using &lt;A href="https://github.com/rcongiu/Hive-JSON-Serde" target="_blank"&gt;https://github.com/rcongiu/Hive-JSON-Serde&lt;/A&gt; to query the JSON data via hive.&lt;/P&gt;&lt;P&gt;As part of testing, I am using an external table to query the JSON plain text file in HDFS.&lt;/P&gt;&lt;P&gt;i am able to query the data from hive using select, However when i do select * from JSON_EXTERNAL_TABLE  limit 1, the output is an Invalid JSON though the message in HDFS is a valid JSON. Is this an expected one ?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 04:19:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97413#M60629</guid>
      <dc:creator>Jagatheeshr</dc:creator>
      <dc:date>2015-11-20T04:19:21Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97414#M60630</link>
      <description>&lt;P&gt;JSON Serde works on 1 line at a time, with each line being parsed independently.

Is your JSON encoded to fit into 1 line for each record in your stream?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 04:20:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97414#M60630</guid>
      <dc:creator>gopalv</dc:creator>
      <dc:date>2015-11-20T04:20:38Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97415#M60631</link>
      <description>Maybe use a different SERDE?

&lt;OL&gt;&lt;LI&gt;ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' or&lt;/LI&gt;&lt;LI&gt;ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' from 
&lt;A href="https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master" target="_blank"&gt;https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master&lt;/A&gt; &lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Fri, 20 Nov 2015 04:27:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97415#M60631</guid>
      <dc:creator>amcbarnett</dc:creator>
      <dc:date>2015-11-20T04:27:56Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97416#M60632</link>
      <description>&lt;P&gt;load the JSON as string with column name "json"&lt;/P&gt;&lt;PRE&gt;SELECT get_json_object(json, '$.id') AS ID,get_json_object(json, '$.person.last_name') AS LastName,get_json_object(json, '$.person.first_name') AS FirstName,get_json_object(json, '$.person.email') AS email,get_json_object(json, '$.person.location.address') AS Address,get_json_object(json, '$.person.location.city') AS City,get_json_object(json, '$.person.location.state') AS State,get_json_object(json, '$.person.location.zipcode') AS Zip,get_json_object(json, '$.person.text') AS Text,get_json_object(json, '$.person.url') AS URLFROM HBASE_JSON_TABLE;ORSELECT id, lastName, firstName, email, city, state, text, url FROM hbase_json_table A LATERAL VIEW json_tuple(A.json, 'id', 'person') B AS id, personLATERAL VIEW json_tuple(person, 'last_name', 'first_name', 'email', 'text', 'url', 'location') C as lastName, firstName, email, text, url, locLATERAL VIEW json_tuple(loc, 'city', 'state') D AS city, state;&lt;/PRE&gt;&lt;P&gt;OR&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE json_serde_table (
  id string,
  person struct&amp;lt;email:string, first_name:string, last_name:string, location:struct&amp;lt;address:string, city:string, state:string, zipcode:string&amp;gt;, text:string, url:string&amp;gt;)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/json/';
SELECT id, person.first_name, person.last_name, person.email, 
person.location.address, person.location.city, person.location.state, 
person.location.zipcode, person.text, person.url
FROM json_serde_table LIMIT 5;&lt;/PRE&gt;</description>
      <pubDate>Fri, 20 Nov 2015 04:55:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97416#M60632</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-11-20T04:55:57Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97417#M60633</link>
      <description>&lt;P&gt;&lt;A href="http://community.hortonworks.com/users/393/aervits.html"&gt;@Artem Ervits&lt;/A&gt;. Have used the last create external statement .  If i do select * from json_serde_table ; i get a JSON output.but not in a valid format.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 19:58:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97417#M60633</guid>
      <dc:creator>Jagatheeshr</dc:creator>
      <dc:date>2015-11-20T19:58:53Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97418#M60634</link>
      <description>&lt;P&gt;@&lt;A href="http://community.hortonworks.com/users/369/amcbarnett.html"&gt;Ancil McBarnett&lt;/A&gt;, i am using the 2nd one here. Which is quite popular and well tested as it appers&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 19:59:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97418#M60634</guid>
      <dc:creator>Jagatheeshr</dc:creator>
      <dc:date>2015-11-20T19:59:30Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97419#M60635</link>
      <description>&lt;P&gt;&lt;A href="http://community.hortonworks.com/users/301/gopal.html"&gt;@gopal&lt;/A&gt;, Yes. The input JSON is all in single line.&lt;/P&gt;&lt;P&gt;Enabled debugging, to see if the select * on hive console is using a different Serde than the one defined in the Table. however it looks thats not the case,&lt;/P&gt;&lt;P&gt;Enabled the debug option in hive cli
&lt;/P&gt;&lt;P&gt;serialization.lib=org.openx.data.jsonserde.JsonSerDe, &lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 20:06:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97419#M60635</guid>
      <dc:creator>Jagatheeshr</dc:creator>
      <dc:date>2015-11-20T20:06:07Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97420#M60636</link>
      <description>&lt;P&gt;can you paste the output&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2015 21:39:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97420#M60636</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2015-11-20T21:39:15Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97421#M60637</link>
      <description>&lt;P&gt;What if my Json record is spread across multiple lines?&lt;/P&gt;</description>
      <pubDate>Mon, 22 Aug 2016 12:16:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97421#M60637</guid>
      <dc:creator>sreecharan_akir</dc:creator>
      <dc:date>2016-08-22T12:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97422#M60638</link>
      <description>&lt;P&gt;hi Artem &lt;/P&gt;&lt;P&gt;I used your method but I am getting error can you help please?&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT&amp;lt; 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING&amp;gt;
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive&amp;gt;
    &amp;gt;
    &amp;gt; select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive&amp;gt;



&lt;/PRE&gt;</description>
      <pubDate>Tue, 25 Oct 2016 00:05:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97422#M60638</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-25T00:05:20Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97423#M60639</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/10115/sahmad43.html" nodeid="10115"&gt;@Sami Ahmad&lt;/A&gt;&lt;P&gt;what is your sample data?&lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2016 00:09:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97423#M60639</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-10-25T00:09:27Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97424#M60640</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/8843-events1476284674520.zip"&gt;events1476284674520.zip&lt;/A&gt;&lt;/P&gt;&lt;P&gt;iam attaching the tweeter file that was created using flume . can you please see if its of valid structure as I am unable to read/view this file .&lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2016 00:16:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97424#M60640</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-25T00:16:21Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97425#M60641</link>
      <description>&lt;P&gt;and this is how I generate these twitter files (based on internet demos) &lt;/P&gt;&lt;PRE&gt; flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1  -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources =source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file


&lt;/PRE&gt;</description>
      <pubDate>Tue, 25 Oct 2016 00:19:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97425#M60641</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-25T00:19:00Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97426#M60642</link>
      <description>&lt;P&gt;Artem any advise? I am anxiously waiting for yours or anyone feedback&lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2016 20:45:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97426#M60642</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-25T20:45:55Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97427#M60643</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10115/sahmad43.html" nodeid="10115"&gt;@Sami Ahmad&lt;/A&gt; if you can't view or read the file means this is not a valid text file, hence the problems you're facing. From your output, looks like you have some binary structures in your file and therefore you're having difficulty setting Hive schema on top of it. Please review your HDFS sink properties. Also, I highly recommend investing in Apache Nifi, all these problems would go away quickly and since this is a closed thread, please open a new question instead of here. This makes it difficult to understand the context of this thread. Again, once you can view your resultant twitter file output, you should be able to apply Hive schema on it.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Oct 2016 23:01:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97427#M60643</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-10-25T23:01:12Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97428#M60644</link>
      <description>&lt;P&gt;I already have another thread open but not getting much responses there . can you please follow up on that thread ?  &lt;/P&gt;&lt;P&gt;also I attached my output file and how I generate it in this thread and the other , can you help me identify why the file is bad ?&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/61181/reading-json-files.html#comment-63383"&gt;https://community.hortonworks.com/questions/61181/reading-json-files.html#comment-63383&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2016 00:49:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97428#M60644</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T00:49:29Z</dc:date>
    </item>
    <item>
      <title>Re: Querying JSON data using Hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97429#M60645</link>
      <description>&lt;P&gt;also created a new one just for finding out how to create the twitter json file from flume &lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/63419/bad-hdfs-sink-property.html "&gt;https://community.hortonworks.com/questions/63419/bad-hdfs-sink-property.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2016 01:19:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Querying-JSON-data-using-Hive/m-p/97429#M60645</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T01:19:59Z</dc:date>
    </item>
  </channel>
</rss>

