Support Questions


Hi, is there a way to load an xlsx file into a Hive table?

Contributor

Super Guru

It's better to export it as CSV (or another delimited format) and load that into the Hive table.
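For that route, a minimal sketch of the Hive side, assuming a hypothetical table (sales) with hypothetical columns, and that the sheet was saved as comma-separated sales.csv and copied to an HDFS staging path:

    -- Hypothetical table and columns; adjust names and types to match the spreadsheet.
    CREATE TABLE sales (
      id     INT,
      name   STRING,
      amount DOUBLE
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Move the exported CSV from its HDFS staging path into the table.
    LOAD DATA INPATH '/user/hive/staging/sales.csv' INTO TABLE sales;

If the exported CSV contains quoted fields with embedded commas, org.apache.hadoop.hive.serde2.OpenCSVSerde can be used in place of the delimited row format.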

Super Guru

@AKILA VEL

I don't think there is a direct method to do that; however, there are a few workarounds to achieve it.

One way is to write a custom Java MapReduce job to convert the xlsx to CSV, or to create your own custom SerDe to access the xlsx directly.

Master Guru

Not directly, I'm afraid. You can write a MapReduce job that transforms the files into normal delimited data, similar to the way it was done with Tika here (assuming you have lots of small files):

https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html

You would, however, need to use a Java library like POI instead of Tika:

https://poi.apache.org/
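For what it's worth, a minimal sketch of the POI side of such a conversion, assuming a single-sheet workbook and hypothetical input/output paths; in an actual MapReduce job the workbook bytes would come from the input split, as in the Tika example above.

    import java.io.File;
    import java.io.FileWriter;
    import java.io.PrintWriter;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.DataFormatter;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    public class XlsxToCsv {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths; replace with your own input and output locations.
            Workbook workbook = WorkbookFactory.create(new File("input.xlsx"));
            DataFormatter formatter = new DataFormatter(); // renders cells as displayed text
            try (PrintWriter out = new PrintWriter(new FileWriter("output.csv"))) {
                Sheet sheet = workbook.getSheetAt(0); // first (and only) sheet
                for (Row row : sheet) {
                    StringBuilder line = new StringBuilder();
                    for (Cell cell : row) {
                        if (line.length() > 0) line.append(',');
                        line.append(formatter.formatCellValue(cell));
                    }
                    // Note: iterating cells this way skips blank cells and does no CSV quoting.
                    out.println(line);
                }
            }
            workbook.close();
        }
    }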

To read it directly in Hive you would need to write a Hive InputFormat. You can use this InputFormat class as an example:

https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/

If you return one delimited text row per spreadsheet record and present it to the Hive SerDe as if it were a text InputFormat, you might be able to get it working.
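If you go that route, the table definition could then point Hive at the custom class roughly like this. com.example.ExcelInputFormat is a placeholder name for a class modeled on the InputFormat linked above (for Hive it must implement the old org.apache.hadoop.mapred API), and the columns are hypothetical:

    -- Hypothetical: the custom InputFormat emits one comma-delimited text line per
    -- spreadsheet row, so the default text SerDe can parse it like an ordinary text table.
    CREATE EXTERNAL TABLE excel_data (
      id     INT,
      name   STRING,
      amount DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS INPUTFORMAT 'com.example.ExcelInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/data/excel';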

Expert Contributor

There are multiple options:

1. You can use Apache Tika (from a programming language like Java) to read the xlsx and load it into Hive.

2. If it's a single sheet, you can use Pig's CSVExcelStorage() and insert into the Hive table using HCatStorer() (see the Pig sketch after this list).

3. Convert to a delimited CSV and load it.
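On option 2: as far as I know, CSVExcelStorage reads Excel-compatible CSV text (handling Excel-style quoting), not the binary .xlsx itself, so the sheet would first be saved as CSV. A minimal sketch, assuming hypothetical paths, a hypothetical schema, and an existing Hive table default.sales:

    -- Hypothetical paths, schema, and target table. Run with: pig -useHCatalog script.pig
    REGISTER piggybank.jar;  -- adjust to the piggybank.jar path in your distribution

    -- CSVExcelStorage also accepts further arguments for multiline fields and header handling.
    data = LOAD '/user/hive/staging/report.csv'
           USING org.apache.pig.piggybank.storage.CSVExcelStorage(',')
           AS (id:int, name:chararray, amount:double);

    -- Field names and types should match the columns of the existing Hive table.
    STORE data INTO 'default.sales' USING org.apache.hive.hcatalog.pig.HCatStorer();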

Contributor

Mine are xlsx files with single sheets. Can you please explain how to use Pig's CSVExcelStorage() and insert into a Hive table using HCatStorer()?
