Support Questions

junnnninho · ‎04-21-2016

Hello there,

I am creating a table for storing json twitter data. I see different ways of using org.apache.hcatalog.data.JsonSerDe for it, but what would be the simpler process to use org.apache.hcatalog.data.JsonSerDe for this purpose.

Where should I get org.apache.hcatalog.data.JsonSerDe, and how to integrate it into my HIVE instance?

Thanks!

aervits · ‎04-21-2016

considering you have the following data

{  "user": {    "userlocation": "California, Santa Clara",    "id": 222222,    "name": "Hortonworks",    "screenname": "hortonworks",    "geoenabled": true  },  "tweetmessage": "Learn more about #Spark in #HDP 2.3 with @Hortonworks founder @acmurthy in this video overview http://bit.ly/1gOyr9w  #hadoop",  "createddate": "2015-07-24T16:30:33",  "geolocation": null}

your schema would look like so with JsonSerDe

CREATE EXTERNAL TABLE tweets (  createddate string,  geolocation string,  tweetmessage string,  `user` struct<geoenabled:boolean, id:int, name:string, screenname:string, userlocation:string>)
	ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/root/';

SELECT DISTINCT tweetmessage, user.name, createddate 
	FROM tweets WHERE user.name = 'Hortonworks'
		ORDER BY createddate;

View solution in original post

aervits · ‎04-21-2016

considering you have the following data

{  "user": {    "userlocation": "California, Santa Clara",    "id": 222222,    "name": "Hortonworks",    "screenname": "hortonworks",    "geoenabled": true  },  "tweetmessage": "Learn more about #Spark in #HDP 2.3 with @Hortonworks founder @acmurthy in this video overview http://bit.ly/1gOyr9w  #hadoop",  "createddate": "2015-07-24T16:30:33",  "geolocation": null}

your schema would look like so with JsonSerDe

CREATE EXTERNAL TABLE tweets (  createddate string,  geolocation string,  tweetmessage string,  `user` struct<geoenabled:boolean, id:int, name:string, screenname:string, userlocation:string>)
	ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/root/';

SELECT DISTINCT tweetmessage, user.name, createddate 
	FROM tweets WHERE user.name = 'Hortonworks'
		ORDER BY createddate;

junnnninho · ‎04-22-2016

Thanks Arten! quick question what does the folder '/user/root/' refers to? In my example I am specifying the /tmp/ folder where I store my twitter sample file.

CREATE TABLE TwitterExample_0( createddate string, geolocation string, tweetmessage string, `user` struct<geoenabled:boolean, id:int, name:string, screenname:string, userlocation:string>) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/tmp/';

I am getting this error when I try a simple SELECT * FROM TwitterExample_0 LIMIT 10;

trace":"org.apache.ambari.view.hive.client.HiveErrorStatusException: H170 Unable to fetch results. java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=admin, access=READ_EXECUTE, inode=\"/tmp/ambari-qa\":ambari-qa:hdfs:drwx------\n\tat....

----------------------------------------------------

[hdfs@ip-172-31-34-25 ~]$ hadoop fs -chmod 777 /tmp/

[hdfs@ip-172-31-34-25 ~]$ hadoop fs -ls /tmp/

drwx------ - ambari-qa hdfs 0 2016-04-16 17:47 /tmp/ambari-qa

drwxr-xr-x - hdfs hdfs 0 2016-04-16 17:43 /tmp/entity-file-history

drwx-wx-wx - hive hdfs 0 2016-04-16 20:10 /tmp/hive

-rwxr-xr-x 3 hdfs hdfs 1902 2016-04-16 17:45 /tmp/id1fac3f21_date451616

-rwxr-xr-x 3 ambari-qa hdfs 1902 2016-04-16 17:50 /tmp/idtest.ambari-qa.1460843437.95.in

-rwxr-xr-x 3 ambari-qa hdfs 957 2016-04-16 17:50 /tmp/idtest.ambari-qa.1460843437.95.pig

-rwxrwxrwx 3 hdfs hdfs 2755 2016-04-21 20:35 /tmp/sample_twitter_data.txt

drwxr-xr-x - ambari-qa hdfs 0 2016-04-16 17:48 /tmp/tezsmokeinput

drwxr-xr-x - ambari-qa hdfs 0 2016-04-16 17:48 /tmp/tezsmokeoutput

Any help appreciated.

Thanks-

junnnninho · ‎04-22-2016

Hey Artem. I was able to get it working by doing:

CREATE TABLE TwitterTest( createddate string, geolocation string, tweetmessage string, `user` struct<geoenabled:boolean, id:int, name:string, screenname:string, userlocation:string>)

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/tmp/sample_twitter_data.txt' OVERWRITE INTO TABLE TwitterTest;

However, when I try to do the query you did I get the following error:

Error while compiling statement: FAILED: ParseException line 1:39 cannot recognize input near 'user' '.' 'name' in selection target [ERROR_STATUS]

When I look at the table structure by doing a simple SELECT * FROM TwitterTest LIMIT 10; I see that all the fields inside the user struct are inside the same column (twittertest.user) ... As for the other fields (createddate, geolocation, twittermessage) they have their own columns.... Is that normal?

Thanks-

aervits · ‎04-22-2016

1. /user/root is my own user directory, you're doing it in /tmp and most likely you will have permissions issues, which you are. Make sure user running query has rwx on directory in schema

2. if you're copying my schema one to one, then you need one more thing as "user" is a keyword in Hive 1.2. My full examples are here https://github.com/dbist/workshops/tree/master/hive/JSON

run this before create schema and again before select on the table

SET hive.support.sql11.reserved.keywords=false;

3. That depends on your JSON, some columns are part of inner structures and some are on their own. It is normal.

junnnninho · ‎04-22-2016

Thanks Artem. Point 2 solved it! My JSON is identical to yours.

aervits · ‎04-22-2016

It's a workaround but in general I would avoid reserved words like 'user' in schema because you always have to set that property otherwise. Rename your data to 'usr'. If this solves your question, please accept the best answer to close this thread.

andrew_sears · ‎04-22-2016

You can use lateral view & explode or inline keyword to get at the data in the struct column.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

https://www.qubole.com/resources/cheatsheet/hive-function-cheat-sheet/

cheers,

Andrew

junnnninho · ‎04-22-2016

Thanks for the info, Andrew!

Cloudera Community

Support Questions

Creating a HIVE table with org.apache.hcatalog.data.JsonSerDe

Creating a HIVE table with structs inside structs ...

How to Create an Iceberg Table with PySpark in Clo...

Creating HIVE partitioned tables using sqoop

Creating HBase HFiles From a Hive Table

Hive Temporary Tables.

Creating Hive Table on S3A bucket + Ranger

Hive create external table fails to load data

How to Extract All Hive Tables DDL

How to create and connect to a dedicated Hive HS2 ...

Create Hive table using pyspark: Mkdirs failed to...