JoltTransformJSON performance issue when using default operation


[Attachment: 85730-flowfilesin5min.jpg]

We use the JoltTransformJSON processor in our data pipeline. The Jolt specification contains two operations, shift and default. The shift operation translates JSON fields from the input message into database fields, and the default operation copies a flow file attribute into a database field. Performance was good when the specification contained only the shift operation, but adding the default operation decreases it. The Transform Cache Size is set to 10000, but we still see the performance issue.

ConsumeKafka -> JoltTransformJSON -> PutDatabaseRecord

Jolt specification

[{
  "operation": "shift",
  "spec": {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID"
  }
}, {
  "operation": "default",
  "spec": {
    "PRTN_NBR": "${kafka.partition}"
  }
}]

Input message

[{"studentName":"Foo2","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"103"}]
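To make the intent of the specification concrete, here is a rough Python sketch of the result I expect for the input message above. It is not the Jolt library: shift and default are hand-written stand-ins that only cover the flat renames and the single top-level default used in this spec. The partition value "3" is a made-up example of the kafka.partition attribute, and applying the mapping to each element of the array is an assumption about how the records reach the transform.

```python
import json

# Field renames taken from the "shift" part of the spec.
SHIFT_SPEC = {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID",
}


def shift(record, spec):
    """Simplified stand-in for a flat Jolt shift: rename matching keys, drop the rest."""
    return {spec[key]: value for key, value in record.items() if key in spec}


def default(record, defaults):
    """Simplified stand-in for a flat Jolt default: fill keys that are missing or null."""
    out = dict(record)
    for key, value in defaults.items():
        if out.get(key) is None:
            out[key] = value
    return out


def transform(record, attributes):
    # Emulates the ${kafka.partition} substitution by hand; the real
    # substitution is done by NiFi Expression Language, not by this code.
    default_spec = {"PRTN_NBR": attributes["kafka.partition"]}
    return default(shift(record, SHIFT_SPEC), default_spec)


input_message = (
    '[{"studentName":"Foo2","Age":"12","address_city":"newyork",'
    '"address1":"North avenue","zipcode":"123213","id":"103"}]'
)

# Hypothetical flow file attributes; "3" is an example value only.
attributes = {"kafka.partition": "3"}

results = [transform(record, attributes) for record in json.loads(input_message)]
print(json.dumps(results, indent=2))
```

Each output record carries the renamed database columns plus PRTN_NBR, which is the row shape PutDatabaseRecord is meant to receive.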

Please find attached the summaries of Total Task Duration and FlowFiles in 5 min. Do you have any suggestions or alternatives? Thanks in advance.

[Attachment: 85731-totaltaskdurationin5min.jpg]
