Explorer
Posts: 22
Registered: 03-04-2014
Accepted Solution

Morphlines - converting a byte array field to a UTF-8 string

I've got a record in Morphlines that includes a byte array field. I want to convert that field to a UTF-8 string, i.e. the equivalent of new String(field, "UTF-8") in Java.

I can see the readClob command exists, but that works on a whole record rather than a single field. Is there an alternative?

I appreciate that the data should have been stored differently, but it's not my format and that's what ETL tools (morphlines) are for! ;)

Thanks in advance.

Cloudera Employee
Posts: 146
Registered: 08-21-2013

Re: Morphlines - converting a byte array field to a UTF-8 string

You can try the java command, like so:

java {
  code: """
    // Decode the first value of the byte array field as UTF-8
    String str = new String((byte[]) record.getFirstValue("myInputField"), "UTF-8");
    record.put("myOutputField", str);
    return child.process(record); // pass record to next command in chain
  """
}
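
For anyone who wants to check the decoding step outside of Morphlines, here is a minimal standalone sketch of the same conversion in plain Java. The class name Utf8Decode and the helper decodeUtf8 are hypothetical names chosen for illustration and are not part of the Morphlines API. It uses StandardCharsets.UTF_8 (Java 7+) rather than the "UTF-8" string, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Decode {
    // Hypothetical helper: decodes a byte array as UTF-8.
    // Malformed byte sequences are replaced with U+FFFD instead of throwing.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the two-byte UTF-8 encoding of U+00E9
        byte[] raw = new byte[] {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xC3, (byte) 0xA9};
        String decoded = decodeUtf8(raw);
        // Five bytes decode to four characters
        System.out.println(decoded.length());
    }
}
```

Inside the java command the same call can be applied to the value returned by record.getFirstValue, as in the snippet above.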

Wolfgang.

Explorer
Posts: 22
Registered: 03-04-2014

Re: Morphlines - converting a byte array field to a UTF-8 string

Perfect, thanks for this.

I was also looking at something like:

{ setValues { _attachment_body : "@{myInputField}" } }
{ readClob { charset : "UTF-8" } }

Out of interest, are there any disadvantages to that approach?

Thanks.

Cloudera Employee
Posts: 146
Registered: 08-21-2013

Re: Morphlines - converting a byte array field to a UTF-8 string

That would work fine as well.

Wolfgang.

Expert Contributor
Posts: 162
Registered: 07-29-2013

Re: Morphlines - converting a byte array field to a UTF-8 string

Hi, what kind of source do you use to get _attachment_body? Is it possible to use HTTPSource as the source and MorphlineSolrSink to process the accepted payload (XML data)?
