Support Questions

subramp_ · 08-24-2018

We use the JoltTransformJSON processor in our data pipeline. The Jolt specification contains two operations, shift and default. The shift operation maps JSON fields from the input message to database fields, and the default operation populates a database field from a FlowFile attribute. Performance was good when we had only the shift operation, but adding the default operation decreases it. Transform Cache Size is set to 10000, but we still see the performance issue.

ConsumeKafka -> JoltTransformJSON -> PutDatabaseRecord

Jolt specification

[{
  "operation": "shift",
  "spec": {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID"
  }
}, {
  "operation": "default",
  "spec": {
    "PRTN_NBR": "${kafka.partition}"
  }
}]

Input message

[{"studentName":"Foo2","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"103"}]
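For anyone reading along, here is a rough Python sketch of what the two operations are expected to do to one record. It is not Jolt or NiFi code: the `shift`, `default`, and `transform` helpers are hypothetical, the partition value "3" is made up for illustration, and it assumes each object in the input array is transformed on its own.

```python
import json

# Field renames copied from the "shift" spec in the post.
SHIFT_SPEC = {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID",
}


def shift(record, spec):
    """Copy each input field named in spec to its output key; drop the rest."""
    return {out_key: record[in_key]
            for in_key, out_key in spec.items()
            if in_key in record}


def default(record, defaults):
    """Add each default only where the key is missing or null."""
    result = dict(record)
    for key, value in defaults.items():
        if result.get(key) is None:
            result[key] = value
    return result


def transform(record, attributes):
    # Hypothetical stand-in for NiFi resolving ${kafka.partition} from the
    # FlowFile attributes before the spec is applied.
    defaults = {"PRTN_NBR": attributes["kafka.partition"]}
    return default(shift(record, SHIFT_SPEC), defaults)


message = json.loads(
    '[{"studentName":"Foo2","Age":"12","address_city":"newyork",'
    '"address1":"North avenue","zipcode":"123213","id":"103"}]'
)
# "3" is an assumed sample value for the kafka.partition attribute.
rows = [transform(rec, {"kafka.partition": "3"}) for rec in message]
print(json.dumps(rows, indent=2))
```

The point of the sketch is that the default step depends on a per-FlowFile attribute, whereas the shift step is the same for every message.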

Please find attached a summary of Total Task Duration and FlowFiles over 5 minutes. Are there any suggestions or alternatives? Thanks in advance.

HarshR · 08-04-2020

Did you ever find a workaround/solution for this?

Cloudera Community

JoltTransformJSON performance issue when using default operation