Member since
12-18-2017
2
Posts
1
Kudos Received
0
Solutions
04-17-2018
02:51 AM
1 Kudo
I have managed to solve my problem, it was a silly little mistake that I was making. I created JSON table using: ADD JAR hdfs://hwmaster01.com/user/root/hive-serdes-1.0-SNAPSHOT.jar;
CREATE TABLE tweets_pqt (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' After putting that above mentioned JAR file in Cloudera Manager's "Hive Auxiliary JARs Directory" (on un-managed machine you can find that under "hive.aux.jars.path" property in hive--site.xml) Then I created the Parquet table with same structure as above with a little change: ADD JAR hdfs://hwmaster01.com/user/root/hive-serdes-1.0-SNAPSHOT.jar;
CREATE TABLE tweets_pqt (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
--ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS PARQUET; I inserted into parquet table succesfully the moment I commented that line.
... View more