Envelope: csv with different encoding

New Contributor

Hi all,

In my Envelope configuration file I need to read a CSV file that is encoded as "UCS-2 LE BOM".

In the input step I specified encoding = "UCS-2", but it is not recognized. Is "UCS-2" a valid encoding for Envelope? How can I read these files? Thank you very much for your help.

Accepted Solutions

Re: Envelope: csv with different encoding

Rising Star

Hi Matteo,

The CSV encoding option in Envelope gets passed down to Spark's CSV reader, and from looking at the Spark CSV code I can see that the value is used in a Java String constructor. From the Java docs it looks like the valid encoding values will be the names in the 'java.lang' column here: https://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html

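From your encoding name my best guess would be "UTF-16", which uses the byte-order mark at the start of the file to pick the byte order. UCS-2 is a 16-bit encoding, so a UTF-32 name such as "UTF_32LE_BOM" would not match it. If "UTF-16" does not work, "UnicodeLittle" (UTF-16 little-endian with a byte-order mark) is the next one to try.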
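
If it helps, a candidate name can be tried against the JVM directly before it goes into the Envelope configuration. The following is only a rough sketch: the class name EncodingCheck, its helper methods, and the eight sample bytes are made up for illustration, and it exercises plain Java rather than Envelope or Spark, so it only shows whether Java itself accepts a name and what it does with a little-endian byte-order mark.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Envelope or Spark.
class EncodingCheck {

    // True if this JVM recognises the encoding name; false if it is unknown
    // or not even a legal charset name.
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Decode raw bytes through a String constructor, which is where the
    // encoding value is said to end up.
    static String decode(byte[] bytes, String encoding) {
        return new String(bytes, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // Made-up sample: the text "a,b" as 16-bit little-endian code units,
        // preceded by the byte-order mark FF FE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'a', 0, ',', 0, 'b', 0};

        String[] candidates = {"UCS-2", "UTF-16", "UTF-16LE", "UnicodeLittle", "UTF_32LE_BOM"};
        for (String name : candidates) {
            if (!isSupported(name)) {
                System.out.println(name + ": not supported by this JVM");
                continue;
            }
            String text = decode(sample, name);
            // Print the decoded length so a leftover BOM or garbage is visible.
            System.out.println(name + ": decoded " + text.length() + " char(s)");
        }
    }
}
```

A name that decodes the sample to exactly three characters has both understood the byte order and dropped the byte-order mark. A longer result means the mark was kept as a leading character, which would then show up at the start of the first column.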

Jeremy
