
Hive: CSV (with arrays) to JSON



I have a table in Hive which I've downloaded into pandas. I've edited an entire column of this table and now I want to load it back into Hive. The problem is that the table has some array columns, so I can't use OpenCSV, which converts all columns to strings.

Here's an example of a row:

956527303395246080,1,Thu Jan 25 14:00:55 +0000 2018,"<a href="""" rel=""nofollow"">Twitter for iPhone</a>",False,,"{""urls"":[],""user_mentions"":[{""screen_name"":""librofm"",""name"":""""}],""hashtags"":[{""text"":""FireAndFury""}]}",en,0,"In an attack on my mental health, I’m listening to #FireAndFury via @librofm","{""screen_name"":""maryruthless"",""name"":""✨Vincent ✨"",""friends_count"":680,""followers_count"":226,""statuses_count"":3981,""verified"":false,""utc_offset"":-18000,""time_zone"":""Eastern Time (US & Canada)""}",2018012515

And the Hive table:

CREATE EXTERNAL TABLE test (id bigint,sentiment INT,created_at string,source STRING,favorited BOOLEAN,retweeted_status STRUCT<text:STRING, user:STRUCT<screen_name:STRING,name:STRING>, retweet_count:INT>, entities STRUCT< urls:ARRAY<STRUCT<expanded_url:STRING>>, user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>, hashtags:ARRAY<STRUCT<text:STRING>>>,lang string,retweet_count int,text string,user STRUCT< screen_name:STRING, name:STRING, friends_count:INT, followers_count:INT, statuses_count:INT, verified:BOOLEAN, utc_offset:INT, time_zone:STRING>

I've thought about converting it to JSON or XML. Is this a good idea? Can anyone please help?

Many thanks in advance.


Re: Hive: CSV (with arrays) to JSON


Either JSON or XML should work; I couldn't find any difference between them for this. You should be fine with whichever works for you.

Re: Hive: CSV (with arrays) to JSON




Many thanks for your answer.

The problem is that I cannot convert the CSV file with structs to proper JSON using df.to_json.


Can you please help?
Many thanks in advance.

Kind regards
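
For the df.to_json problem described above, one possible approach is to parse the JSON text in the struct columns before serializing, so those columns become nested objects rather than quoted strings. The following is only a minimal sketch: the column names are taken from the CREATE TABLE statement in the question, while the helper names (row_to_json_line, write_json_lines) are hypothetical. It assumes empty cells arrive as None, "" or NaN.

```python
import json
import math

# Columns that hold JSON text in the CSV and are STRUCT/ARRAY columns in Hive.
# Names come from the CREATE TABLE statement in the question.
NESTED_COLUMNS = ("retweeted_status", "entities", "user")


def _is_missing(value):
    """True for None, empty strings and float NaN (pandas' marker for empty cells)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value == ""
    return isinstance(value, float) and math.isnan(value)


def _to_builtin(value):
    """Fallback for json.dumps: unwrap numpy-style scalars through .item()."""
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def row_to_json_line(row, nested_columns=NESTED_COLUMNS):
    """Turn one record (dict of column name -> value) into a one-line JSON string."""
    record = {}
    for column, value in row.items():
        if _is_missing(value):
            record[column] = None  # written as JSON null
        elif column in nested_columns and isinstance(value, str):
            record[column] = json.loads(value)  # JSON text -> nested dict/list
        else:
            record[column] = value
    return json.dumps(record, ensure_ascii=False, default=_to_builtin)


def write_json_lines(rows, path):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(row_to_json_line(row) + "\n")
```

With a DataFrame, this could be called as write_json_lines(df.to_dict(orient="records"), "tweets.json"), where the file name is a placeholder. The one-object-per-line layout assumes the Hive table would be declared with a JSON SerDe (for example org.apache.hive.hcatalog.data.JsonSerDe) instead of OpenCSV.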
