10-27-2019 12:42 PM
1. The JSON file is stored in an S3 bucket. If I insert data into the internal table and somehow accidentally drop that table, is the JSON data also dropped?
2. The problem is that even if I try to insert by hour (the smallest chunk size), it runs into a memory issue: I could only fetch the top 900 rows within a one-hour window, while there are on average 1B+ rows. I am running on Tez, and 'org.openx.data.jsonserde.JsonSerDe' is the SerDe I used to create the external table.
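For context, a minimal sketch of the kind of external-table setup described here, under assumptions: the table name json_events, the id/payload columns, the s3a://my-bucket/events/ location, and the dt/hr partition columns are all illustrative, not the actual schema.

-- Hypothetical external table over the JSON files in S3. Dropping an
-- EXTERNAL table only removes the table metadata; the files in S3 remain.
CREATE EXTERNAL TABLE json_events (
  id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING, hr STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3a://my-bucket/events/';

-- Register the date/hour partitions already present under that location.
MSCK REPAIR TABLE json_events;

By contrast, dropping a managed (internal) table does delete the data stored in its own warehouse directory, though not the original JSON files that only the external table points at.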
10-27-2019 09:41 AM
I have a very large JSON file, and I have created an external Hive table, partitioned by date and hour, to read it. However, I run into an out-of-memory error when pulling out data for just one day.
For now the data is stored in text format. I learned that I should convert it to ORC:
-- Attempted full-table conversion into the ORC table
INSERT OVERWRITE TABLE table_name_orc
SELECT * FROM table_name;
But in the snippet above, SELECT * FROM table_name does not work, because I cannot extract everything from the external table at once due to the memory issue.
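One way to avoid the full-table scan, sketched below under assumptions (the dt/hr partition column names, the id/payload columns, and the specific date literal are hypothetical), is to write one source partition at a time into a matching partition of the ORC table:

-- Hypothetical ORC target mirroring the source schema and partitioning.
CREATE TABLE table_name_orc (
  id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING, hr STRING)
STORED AS ORC;

-- Convert one hour at a time so each job scans only a single source partition.
INSERT OVERWRITE TABLE table_name_orc PARTITION (dt='2019-10-01', hr='00')
SELECT id, payload
FROM table_name
WHERE dt='2019-10-01' AND hr='00';

A dynamic-partition insert could batch this across several partitions in one statement, at the cost of a larger per-job scan.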
Any help would be appreciated!
Labels:
- Apache Hive