05-08-2018 01:32 AM
In my configuration file I need to handle a CSV file with "UCS-2 LE BOM" encoding.
In the input step I specified encoding = "UCS-2", but it is not recognized. Is "UCS-2" a valid encoding for Envelope? How can I handle these files? Thank you very much for your help.
05-08-2018 07:10 AM
The CSV encoding option in Envelope gets passed down to Spark's CSV reader, and from looking at the Spark CSV code I can see that it is used in a Java String constructor. From the Java docs, it looks like the valid encoding values are those in the 'java.lang' column here: https://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
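From your encoding name, my best guess would be "UnicodeLittle" (listed in that table as UTF-16 little-endian with a byte-order mark), since UCS-2 is a 16-bit encoding and a UTF-32 name such as "UTF_32LE_BOM" would not match it. Plain "UTF-16" may also work, because Java's UTF-16 decoder uses the BOM to detect the byte order.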
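To check an encoding name before putting it in the configuration, a small standalone Java test can help. This is only a sketch: the class name `EncodingCheck`, the `decode` helper, and the sample bytes are made up for illustration, and it mimics the String constructor call rather than running Spark or Envelope.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class EncodingCheck {

    // Hypothetical helper: decodes bytes with a String constructor that takes
    // a charset name, similar to what the reply above describes.
    static String decode(byte[] raw, String encodingName)
            throws UnsupportedEncodingException {
        return new String(raw, 0, raw.length, encodingName);
    }

    public static void main(String[] args) throws Exception {
        // Sample bytes (assumed for illustration): BOM FF FE followed by
        // "a,b" in 16-bit little-endian code units.
        byte[] sample = {
            (byte) 0xFF, (byte) 0xFE,
            'a', 0, ',', 0, 'b', 0
        };

        // Report whether this JVM recognizes each candidate name.
        for (String name : new String[] {"UCS-2", "UTF-16", "UnicodeLittle"}) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }

        // "UTF-16" is required on every Java platform; when decoding, it
        // reads the BOM to pick the byte order and drops it from the result.
        System.out.println(decode(sample, "UTF-16"));
    }
}
```

If a name is reported as unsupported here, it is unlikely to be accepted further down in the CSV reader either.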
Currently incubating in Cloudera Labs:Envelope