Created 05-06-2016 04:18 PM
Created 05-06-2016 04:21 PM
It's better to export it as CSV or any other delimited format and load it into a Hive table.
Created 05-06-2016 04:21 PM
I don't think there is any direct method to do that, but there are a few workarounds.
One way is to write a custom Java MapReduce job that converts xls to CSV; another is to create your own custom SerDe to access xls.
Created 05-06-2016 04:26 PM
Not directly, I am afraid. You can write a MapReduce job that transforms the files into normal delimited data, similar to the way it was done with Tika here (assuming you have lots of small files).
You would, however, need to use a Java library like POI instead of Tika.
To read it directly in Hive, you would need to write a custom InputFormat for Hive. You can use this InputFormat class as an example:
If you return a delimited row for each record and present it to the Hive SerDe as if it came from a text InputFormat, you might be able to get it working.
Created 05-06-2016 04:26 PM
There are multiple options:
1. You can use Apache Tika (from a programming language like Java) to read the xlsx and load it into Hive.
2. If it's a single xls sheet, you can use Pig's CSVExcelStorage() and insert into a Hive table using HCatStorer().
3. Convert it to a delimited CSV and load it.
Created 05-06-2016 05:58 PM
My files are xlsx with a single sheet each. Can you please explain how to use Pig's CSVExcelStorage() and insert into a Hive table using HCatStorer()?
Created 05-06-2016 06:06 PM
Here's an example of CSVExcelStorage; you can then execute SQL commands in Pig using