Created 12-04-2015 12:58 PM
One of my clients is trying to create an external Hive table in HDP from CSV files (about 30 files, 2.5 terabytes in total).
But the files are formatted as: “Little-endian, UTF-16 Unicode text, with CRLF, CR line terminators”. Here are a couple of issues:
Is there an easy way to convert CSV/TXT files from Unicode (UTF-16 / UCS-2) to ASCII or UTF-8?
Is there a way for Hive to recognize this format?
He tried to use iconv to convert the files from UTF-16 to ASCII, but it fails when the source file is larger than 15 GB.
iconv -c -f utf-16 -t us-ascii
Any suggestions??
Created 06-29-2016 07:01 PM
I used NiFi's ConvertCharacterSet to change from UTF-16LE to UTF-8. It's a great and straightforward option if you're already using NiFi 🙂
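If NiFi isn't available, the same UTF-16 to UTF-8 conversion can be sketched as a streaming re-encode in plain Python. This is only a minimal sketch, not something tested at the 2.5 TB scale from the question. The function name `convert_utf16_to_utf8` and the chunk size are made up for illustration, and it assumes the files start with a byte-order mark (which is what the "Little-endian, UTF-16 Unicode text" description usually indicates) and that normalizing the mixed CRLF/CR line endings to LF is acceptable.

```python
import shutil


def convert_utf16_to_utf8(src_path, dst_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16 text file as UTF-8 without loading it into memory.

    Assumptions (not from the original thread):
    - the source starts with a BOM, so the "utf-16" codec can detect byte order;
    - CRLF and lone CR line endings should both become LF in the output.
    """
    # newline=None on the reader translates "\r\n" and "\r" to "\n".
    # newline="\n" on the writer stops Python from re-translating on output.
    with open(src_path, "r", encoding="utf-16", newline=None) as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        # Copy in fixed-size chunks of characters, so memory use stays flat
        # regardless of file size.
        shutil.copyfileobj(src, dst, chunk_chars)


if __name__ == "__main__":
    import sys
    convert_utf16_to_utf8(sys.argv[1], sys.argv[2])
```

Writing UTF-8 rather than US-ASCII keeps any non-ASCII characters instead of dropping them, and the output can then be placed in the location of the external Hive table as ordinary text.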
Created 07-27-2016 04:11 PM
Hi, where can I find the character set values that are accepted by the ConvertCharacterSet processor?
Also, which component can I use to load the CSV file, and which to write the results to the converted CSV file?
Created 07-28-2016 07:20 AM
So I found the appropriate components, but they don't convert the file properly. Any idea? The input file is a binary file.