Morphlines - converting a byte array field to a UTF-8 string

Explorer

I've got a record in Morphlines that includes a byte array field. I want to convert that to a UTF-8 string, i.e. the equivalent of new String(field, "UTF-8") in Java.

I can see that the readClob command exists, but it works on a whole record rather than a single field. Is there an alternative?

I appreciate that I should have stored my data differently, but it's not my format, and that's what ETL tools (Morphlines) are for! 😉

Thanks in advance.

1 ACCEPTED SOLUTION

Expert Contributor
You can try the java command, like so:

java {
  code: """
    String str = new String((byte[]) record.getFirstValue("myInputField"), "UTF-8");
    record.put("myOutputField", str);
    return getChild().process(record); // pass record to next command in chain
  """
}

Wolfgang.
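
For anyone who wants to try the cast-and-decode step from the java command above outside a morphline, here is a minimal standalone sketch. The class name Utf8FieldDecoder and the helper decodeUtf8 are hypothetical names made up for illustration and are not part of Morphlines. The sketch assumes the field value arrives as a plain Object, as it does from record.getFirstValue, and that returning null is acceptable when the field is missing or is not a byte array. It uses StandardCharsets.UTF_8 instead of the "UTF-8" charset name, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Utf8FieldDecoder {

    // Hypothetical helper, not a Morphlines API: decodes a field value as UTF-8.
    // Returns null when the value is absent or is not a byte array (an assumption
    // about the desired behaviour; a real pipeline may prefer to fail the record).
    public static String decodeUtf8(Object value) {
        if (!(value instanceof byte[])) {
            return null;
        }
        return new String((byte[]) value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains a non-ASCII character, so it is 6 bytes in UTF-8.
        byte[] raw = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        String decoded = decodeUtf8(raw);
        System.out.println(decoded.equals("h\u00e9llo")); // round trip preserves the text
        System.out.println(decodeUtf8(null) == null);     // missing field yields null
    }
}
```

The same instanceof guard could be placed inside the java command's code block if some records may lack the field, since the unchecked cast there would otherwise throw on an unexpected type.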

Explorer

Perfect, thanks for this.

I was also having a look at something like:

{ setValues { _attachment_body : "@{myInputField}" } }
{ readClob { charSet : "UTF-8" } }

Out of interest, are there any disadvantages with that approach?

Thanks.


Expert Contributor
That would work fine as well.

Wolfgang.

Expert Contributor

Hi, what kind of source do you use to get _attachment_body? Is it possible to use HttpSource as the source and MorphlineSolrSink to process the accepted payload (XML data)?

New Contributor

When reading a Parquet file, how do I convert the Parquet DECIMAL datatype to a String?
