How to Convert XLSX file to CSV file using PIG?

Expert Contributor

Re: How to Convert XLSX file to CSV file using PIG?

Guru

Bottom line: if the xlsx contains a single tab (sheet), you can use the piggybank function CSVExcelStorage to load the spreadsheet, as shown below:

REGISTER <pathTo>/piggybank.jar;

rawdata = load 'myData.xlsx' using org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER') as (col1,col2,..);
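To finish the conversion, the loaded relation can be written back out as CSV. The following is a minimal sketch, assuming the load above succeeds on your file. The output directory `myData_csv` and the two `chararray` columns are placeholders and not part of the original reply. Note that CSVExcelStorage is documented as a loader and storer for Excel-style CSV text, so if the input is a binary .xlsx workbook it may need to be saved as CSV from Excel first.

```pig
REGISTER <pathTo>/piggybank.jar;

-- Load as in the reply above; column names and types are placeholders.
rawdata = LOAD 'myData.xlsx'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER')
    AS (col1:chararray, col2:chararray);

-- Write comma-separated part files into the (hypothetical) directory 'myData_csv'.
STORE rawdata INTO 'myData_csv' USING PigStorage(',');
```

PigStorage does not quote fields that contain the delimiter. If your data has embedded commas or quotes, storing with CSVExcelStorage instead of PigStorage is probably the safer choice.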

If your xlsx has multiple tabs (sheets), you can split each sheet into its own xlsx file and use piggybank as above on each resulting file.
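One way to avoid copying the script for every sheet is Pig's parameter substitution. This is only a sketch: the script name `convert.pig`, the parameter names `INPUT` and `OUTPUT`, and the file names in the comment are all hypothetical.

```pig
-- convert.pig: run once per single-sheet file, for example
--   pig -param INPUT=sheet1.xlsx -param OUTPUT=sheet1_csv convert.pig
REGISTER <pathTo>/piggybank.jar;

-- $INPUT is replaced with the value passed via -param before the script runs.
rawdata = LOAD '$INPUT'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'NOCHANGE', 'SKIP_INPUT_HEADER');

-- Each run writes its part files into its own output directory.
STORE rawdata INTO '$OUTPUT' USING PigStorage(',');
```

No schema is declared here so the same script can be reused for sheets with different columns. Add an AS clause if later steps need named fields.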

See also: https://community.hortonworks.com/questions/31968/hi-is-there-a-way-to-load-xlsx-file-into-hive-tabl...