Support Questions
Find answers, ask questions, and share your expertise

Hi, Is there a way to load xlsx file into hive table?

Solved Go to solution

Hi, Is there a way to load xlsx file into hive table?

Explorer
 
1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hi, Is there a way to load xlsx file into hive table?

Mentor
6 REPLIES 6

Re: Hi, Is there a way to load xlsx file into hive table?

its better to export it as csv or any delimited format and load it into hive table.

Re: Hi, Is there a way to load xlsx file into hive table?

@AKILA VEL

I don't think we have any direct method to do that, however there are few workaround to achieve this

One way is to write a custom java mapreduce job to convert xls to csv or create your own custom serd to access xls.

Re: Hi, Is there a way to load xlsx file into hive table?

Not directly I am afraid. You can write a MapReduce job that transforms them into normal delimited data. Similar to the way it was done with Tika here. ( Assuming you have lots of small files )

https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html

You would however need to use a Java library like POI instead of Tika

https://poi.apache.org/

To read it directly in Hive you need to write an HiveInputFormat. You can use this inputformat class as an example:

https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/

If you return a row for each record that is delimited and pretend to the Hive Serde that its a text inputformat you might be able to get it working.

Re: Hi, Is there a way to load xlsx file into hive table?

Rising Star

There are multiple options:

1. You can use apache tika (using a programming language like Java) to read the xlxs and load into hive.

2. If its a single xls sheet, then you can use pig's CSVExcelStorage() and insert into hive table using HCatStorer()

3. Convert to a delimited CSV and load it.

Re: Hi, Is there a way to load xlsx file into hive table?

Explorer

Mine is xlsx files with single sheets. Can you please explain how to use pig's CSVExcelStorage() and insert into hive table using HCatStorer().

Re: Hi, Is there a way to load xlsx file into hive table?

Mentor