New Contributor
Registered: 05-02-2018
Accepted Solution

Envelope: csv with different encoding

Hi all,

In my configuration file I need to read a CSV file with "UCS-2 LE BOM" encoding.

In the input step I specified encoding = "UCS-2", but it isn't recognized. Is "UCS-2" a valid encoding for Envelope? How can I handle these files? Thank you very much for your help.

Cloudera Employee
Registered: 08-26-2015

Re: Envelope: csv with different encoding

Hi Matteo,


The CSV encoding option in Envelope is passed down to Spark's CSV reader, and the Spark CSV code shows that the value is used in a Java String constructor. Based on the Java documentation, the valid encoding values should be the names in the 'java.lang' column of the supported encodings table here:
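Because the encoding value ends up as a Java charset name, one quick way to test a candidate is to ask the JVM directly. The minimal Java sketch below assumes nothing about Envelope's own code: `decodeLine` is a hypothetical stand-in for the Spark step that hands the name to the `String` constructor (not Spark's actual code), and `isUsableEncoding` uses `java.nio.charset.Charset.isSupported`. The candidate list is only an example. Whether "UCS-2" is accepted depends on the JDK's charset aliases, and the report above suggests it is not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class CsvEncodingCheck {

    // Hypothetical stand-in for the CSV reader step: the configured encoding
    // name is passed straight to the java.lang.String constructor, which throws
    // UnsupportedEncodingException for names the JVM does not know.
    static String decodeLine(byte[] lineBytes, String encodingName)
            throws UnsupportedEncodingException {
        return new String(lineBytes, 0, lineBytes.length, encodingName);
    }

    // True if this JVM accepts the name as a charset name or alias.
    static boolean isUsableEncoding(String encodingName) {
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (a subclass) is thrown for
            // syntactically illegal names, e.g. ones containing spaces.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"UCS-2", "UTF-16", "UTF-16LE", "UnicodeLittle", "UTF_32LE_BOM"};
        for (String name : candidates) {
            System.out.println(name + " -> "
                    + (isUsableEncoding(name) ? "supported" : "not supported"));
        }
    }
}
```

Running `main` on the JVM that Spark uses shows which of these names would get past the `String` constructor at all. A name being supported does not yet mean it decodes the file correctly.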


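From your encoding name, my best guess would be "UnicodeLittle" (little-endian UTF-16 with a byte-order mark). UCS-2 uses 2-byte code units, so a UTF-32 name such as "UTF_32LE_BOM" would not decode these files correctly.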
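To check a guess like this before putting it in the Envelope configuration, a small Java sketch can decode sample UCS-2 LE BOM bytes under several charset names and print the resulting code points. The byte arrays and the class name `Ucs2LeBomDemo` are made up for illustration. The split into a "first line" that carries the BOM and a "later line" that does not assumes the reader decodes each line separately. That is an assumption about Spark's reader, not something confirmed here.

```java
import java.nio.charset.Charset;

public class Ucs2LeBomDemo {

    // First line of a hypothetical UCS-2 LE BOM file: FF FE marker, then
    // 'a' and 'b' as 2-byte little-endian code units.
    static final byte[] FIRST_LINE = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00, 0x62, 0x00};

    // A later line of the same file: 'c' with no BOM (the BOM only starts the file).
    static final byte[] LATER_LINE = {0x63, 0x00};

    // Decodes bytes with the named charset; malformed input becomes U+FFFD.
    static String decode(byte[] bytes, String encodingName) {
        return new String(bytes, Charset.forName(encodingName));
    }

    // Lists code points so a leftover BOM (U+FEFF) or a wrong byte order is visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] candidates = {"UTF-16LE", "UTF-16", "UnicodeLittle", "UTF_32LE_BOM"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this JVM");
                continue;
            }
            System.out.println(name
                    + ": first line = " + codePoints(decode(FIRST_LINE, name))
                    + ", later line = " + codePoints(decode(LATER_LINE, name)));
        }
    }
}
```

On a standard JDK, "UTF-16LE" should keep the BOM as a U+FEFF character at the start of the first line. "UTF-16" should honour the BOM but fall back to big-endian on the line without one. "UnicodeLittle" should handle both lines. A UTF-32 name consumes 4 bytes per character, so it cannot yield 'a' and 'b' from these bytes.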