
Hive: CSV (with arrays) to JSON

I have a table in Hive which I've downloaded into pandas. I've edited one complete column of this table and now I want to load it back into Hive. The problem is that the table has some array columns, so I can't use OpenCSV, which converts all columns to string.

Here's an example of a row:

956527303395246080,1,Thu Jan 25 14:00:55 +0000 2018,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>",False,,"{""urls"":[],""user_mentions"":[{""screen_name"":""librofm"",""name"":""Libro.fm""}],""hashtags"":[{""text"":""FireAndFury""}]}",en,0,"In an attack on my mental health, I’m listening to #FireAndFury via @librofm","{""screen_name"":""maryruthless"",""name"":""✨Vincent ✨"",""friends_count"":680,""followers_count"":226,""statuses_count"":3981,""verified"":false,""utc_offset"":-18000,""time_zone"":""Eastern Time (US & Canada)""}",2018012515

And the Hive table:

CREATE EXTERNAL TABLE test (id bigint,sentiment INT,created_at string,source STRING,favorited BOOLEAN,retweeted_status STRUCT<text:STRING, user:STRUCT<screen_name:STRING,name:STRING>, retweet_count:INT>, entities STRUCT< urls:ARRAY<STRUCT<expanded_url:STRING>>, user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>, hashtags:ARRAY<STRUCT<text:STRING>>>,lang string,retweet_count int,text string,user STRUCT< screen_name:STRING, name:STRING, friends_count:INT, followers_count:INT, statuses_count:INT, verified:BOOLEAN, utc_offset:INT, time_zone:STRING>
   ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE

I've thought about converting it to JSON or XML. Is this a good idea? Can anyone please help?
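
Since the table already uses a JSON SerDe, JSON looks like the more natural target than XML. Below is a minimal sketch of what I have in mind, using only Python's standard csv and json modules. It assumes the CSV columns appear in the same order as in the CREATE TABLE statement, that the trailing 12th field (2018012515 in the example row) is not a regular column (the name "part" is made up for it), and that empty cells should become JSON null. I have not verified how the SerDe treats null values.

```python
import csv
import io
import json

# Column names follow the CREATE TABLE statement. "part" is a hypothetical
# name for the trailing 12th CSV field, which the DDL does not list.
COLUMNS = ["id", "sentiment", "created_at", "source", "favorited",
           "retweeted_status", "entities", "lang", "retweet_count",
           "text", "user", "part"]
INT_COLS = {"id", "sentiment", "retweet_count"}
BOOL_COLS = {"favorited"}
JSON_COLS = {"retweeted_status", "entities", "user"}


def convert_value(name, raw):
    """Turn one CSV cell into a JSON-compatible Python value."""
    if raw == "":
        return None  # assumption: empty cell becomes JSON null
    if name in INT_COLS:
        return int(raw)
    if name in BOOL_COLS:
        return raw.strip().lower() == "true"
    if name in JSON_COLS:
        return json.loads(raw)  # nested structs and arrays stay nested
    return raw


def csv_to_json_lines(csv_text):
    """Yield one JSON document per CSV row, each on a single line."""
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        record = {name: convert_value(name, raw)
                  for name, raw in zip(COLUMNS, row)}
        record.pop("part", None)  # assumption: not a regular table column
        yield json.dumps(record, ensure_ascii=False)
```

Each yielded string could then be written as one line of a text file and placed in the external table's location, so that every line holds exactly one JSON object.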

Many thanks in advance.
