New Contributor
Posts: 2
Registered: 05-02-2018

Envelope: csv with different encoding

Hi all,

In my configuration file I need to read a CSV file with "UCS-2 LE BOM" encoding.

In the input step I specified encoding = "UCS-2", but it is not recognized. Is "UCS-2" a valid encoding for Envelope? How can I handle these files? Thank you very much for your help.

Cloudera Employee
Posts: 26
Registered: 08-26-2015

Re: Envelope: csv with different encoding

Hi Matteo,

The CSV encoding option in Envelope is passed down to Spark's CSV reader, and from looking at the Spark CSV code I can see that it is used in a Java String constructor. From the Java docs, it looks like the valid encoding values are those in the 'java.lang' column of this table: https://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
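
Because the value ends up in a java.lang.String constructor, a candidate name can be checked outside Envelope with a few lines of plain Java. This is a minimal sketch under that assumption: the class EncodingNameCheck and the candidate list are illustrative, and nothing here calls Envelope or Spark.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingNameCheck {

    // True if the JVM knows this charset name (canonical name or alias).
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Names containing spaces, such as "UCS-2 LE BOM", are not legal charset names at all.
            return false;
        }
    }

    // Decodes bytes through the String(byte[], int, int, String) constructor;
    // an unknown or illegal name surfaces as UnsupportedEncodingException.
    static String decode(byte[] bytes, String name) throws UnsupportedEncodingException {
        return new String(bytes, 0, bytes.length, name);
    }

    public static void main(String[] args) {
        // Illustrative candidates: the names from this thread plus a few 16-bit ones.
        String[] candidates = {"UCS-2", "UCS-2 LE BOM", "UTF-16", "UTF-16LE", "UnicodeLittle", "UTF_32LE_BOM"};
        for (String name : candidates) {
            String result = isSupported(name)
                    ? "supported, canonical name " + Charset.forName(name).name()
                    : "not supported";
            System.out.println(name + " -> " + result);
        }
    }
}
```

A name that is not supported here is also rejected by the String constructor with an UnsupportedEncodingException, so this is a cheap way to rule out a name before putting it in the configuration file.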

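UCS-2 uses two bytes per character, so it needs a 16-bit encoding name. From your encoding name, my best guess would be "UnicodeLittle" (little-endian UTF-16 with a byte-order mark); "UTF-16" should also work, since it reads the byte-order mark to pick the byte order.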
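
A candidate name can also be checked by decoding a known sample. The sketch below assumes a hypothetical file containing just the text "a,b" saved as "UCS-2 LE BOM" (bytes FF FE 61 00 2C 00 62 00). The class Ucs2Sample and its helper are illustrative plain-JDK code, not part of Envelope or Spark, and they only test the charset name, not how Spark splits a file into lines.

```java
import java.io.UnsupportedEncodingException;

final class Ucs2Sample {

    // Hypothetical sample: "a,b" encoded as UCS-2 little-endian with a byte-order mark.
    static final byte[] SAMPLE = {
        (byte) 0xFF, (byte) 0xFE, // byte-order mark, little-endian
        0x61, 0x00,               // 'a'
        0x2C, 0x00,               // ','
        0x62, 0x00                // 'b'
    };

    // Returns the decoded text, or null if the JVM does not know the charset name.
    static String tryDecode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, 0, bytes.length, charsetName);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"UTF-16", "UnicodeLittle", "UTF-16LE", "UTF_32LE_BOM"};
        for (String name : candidates) {
            String text = tryDecode(SAMPLE, name);
            // Print code points so a leftover BOM (U+FEFF) or
            // replacement characters (U+FFFD) are visible.
            StringBuilder codePoints = new StringBuilder();
            if (text != null) {
                text.codePoints().forEach(cp -> codePoints.append(String.format("U+%04X ", cp)));
            }
            System.out.println(name + " -> " + (text == null ? "(not supported)" : codePoints.toString().trim()));
        }
    }
}
```

With this sample, "UTF-16" and "UnicodeLittle" should both return a,b, because each reads the FF FE byte-order mark and drops it. "UTF-16LE" keeps the mark as a leading U+FEFF character, which could end up in the first column name of a CSV header, and a 32-bit charset such as "UTF_32LE_BOM" reads four bytes per character and cannot reproduce the text.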

Jeremy
