Envelope: csv with different encoding

Hi all,

in my configuration file I need to read a CSV file with "UCS-2 LE BOM" encoding.

In the input step I specified encoding = "UCS-2", but it's not recognized. Is "UCS-2" a valid encoding for Envelope? How can I handle these files? Thank you very much for your help.

Accepted Solution

Re: Envelope: csv with different encoding

Hi Matteo,

The CSV encoding option in Envelope is passed down to Spark's CSV reader, and looking at the Spark CSV code I can see that the value ends up in a Java String constructor. So, going by the Java docs, the valid encoding values should be the charset names in the 'java.lang' column of the table here: https://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
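
A quick way to test a candidate name before putting it in the Envelope config is to ask the JVM whether it recognizes it. This is a minimal sketch using the standard java.nio.charset API; the class name CharsetCheck and the list of candidate names are only for illustration, and the answer depends on the JVM that actually runs your Spark job:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Illustrative helper: reports which encoding names this JVM recognizes. */
public class CharsetCheck {

    /** True if the name is a charset name or alias known to this JVM. */
    public static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters no charset name may contain (e.g. spaces).
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical candidates to try as the CSV "encoding" value.
        String[] candidates = {"UCS-2", "UTF-16", "UTF-16LE", "UnicodeLittle", "UTF_32LE_BOM"};
        for (String name : candidates) {
            System.out.println(name + " -> " + (isUsable(name) ? "supported" : "not supported"));
        }
    }
}
```

If a name prints "not supported" here, the String constructor will reject it too (it throws UnsupportedEncodingException for unknown names).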

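From your encoding name my best guess would be "UTF-16". UCS-2 encodes text exactly like UTF-16 for characters in the Basic Multilingual Plane, and Java's "UTF-16" decoder uses the BOM to work out the byte order. If that doesn't work, the little-endian-with-BOM entry in that table, "UnicodeLittle", is the other one to try.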
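
UCS-2 LE stores each character as two little-endian bytes, which for Basic Multilingual Plane characters is byte-for-byte the same as UTF-16LE, and "BOM" means the file starts with the bytes FF FE. The sketch below (class and method names are illustrative only) decodes a small hand-made sample two ways to show why the exact name matters: "UTF-16" consumes the BOM, while "UTF-16LE" keeps it in the text as U+FEFF, where it could end up stuck to the start of the first column name:

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: decodes a tiny UCS-2 LE + BOM sample with two charsets. */
public class Ucs2Decode {

    // "a,b" encoded as UTF-16 little-endian (same bytes as UCS-2 LE here),
    // preceded by the little-endian byte-order mark FF FE.
    static final byte[] SAMPLE = {
        (byte) 0xFF, (byte) 0xFE, 0x61, 0x00, 0x2C, 0x00, 0x62, 0x00
    };

    // "UTF-16" reads the BOM to choose the byte order and does not emit it.
    static String decodeWithBomDetection(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // "UTF-16LE" assumes little-endian and keeps the BOM as the character U+FEFF.
    static String decodeLittleEndian(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(decodeWithBomDetection(SAMPLE));                   // a,b
        System.out.println(decodeLittleEndian(SAMPLE).charAt(0) == '\uFEFF'); // true
    }
}
```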

Jeremy
