We use the JoltTransformJSON processor in our data pipeline. The Jolt specification contains two operations, shift and default. The shift operation translates JSON fields from the input message into database fields, and the default operation copies a flow file attribute into a database field. Performance was good when the specification had only the shift operation, but adding the default operation decreased it. The Transform Cache Size is set to 10000, but we still see the performance issue.
ConsumeKafka -> JoltTransformJSON -> PutDatabaseRecord
Jolt specification
[{
  "operation": "shift",
  "spec": {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID"
  }
}, {
  "operation": "default",
  "spec": {
    "PRTN_NBR": "${kafka.partition}"
  }
}]
Input message
[{"studentName":"Foo2","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"103"}]
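To make the intent of the specification concrete, here is a rough Python sketch of the record we expect for each input object. It is not how Jolt or NiFi implement these operations; it only mimics the effect of the two steps. It assumes the spec is applied to each object in the array individually, and the partition value "3" and the helper names (apply_shift, apply_default, transform) are hypothetical.

```python
import json

# Field mapping copied from the "shift" operation in the spec above.
SHIFT_SPEC = {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID",
}

# Sample message from this post.
INPUT_MESSAGE = (
    '[{"studentName":"Foo2","Age":"12","address_city":"newyork",'
    '"address1":"North avenue","zipcode":"123213","id":"103"}]'
)


def apply_shift(record, spec):
    """Keep only the mapped fields and rename them (mimics a flat shift)."""
    return {target: record[source] for source, target in spec.items() if source in record}


def apply_default(record, defaults):
    """Fill in a key only when it is absent or null (mimics default)."""
    out = dict(record)
    for key, value in defaults.items():
        if out.get(key) is None:
            out[key] = value
    return out


def transform(messages, partition):
    """Apply shift, then default, to every record.

    `partition` stands in for the already evaluated ${kafka.partition}
    attribute; the value passed in below is made up for illustration.
    """
    defaults = {"PRTN_NBR": partition}
    return [apply_default(apply_shift(m, SHIFT_SPEC), defaults) for m in messages]


if __name__ == "__main__":
    records = transform(json.loads(INPUT_MESSAGE), partition="3")
    print(json.dumps(records, indent=2))
```

The only thing the default step contributes is the PRTN_NBR column, so any alternative would need to supply that one attribute value per record.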
Please find attached a summary of Total Task Duration and FlowFiles over 5 minutes. Are there any suggestions or alternatives? Thanks in advance.