Member since: 08-22-2018
Posts: 12
Kudos Received: 0
Solutions: 0
10-16-2018
06:47 PM
In our pipeline we use PutDatabaseRecord to insert records into the database, and we have observed a performance bottleneck there: flow files are queuing up in front of PutDatabaseRecord. Its Tasks/Time reads 167,281 / 01:46:52.678, and the DBCP connection pool service is configured with 45 connections. Is it recommended to place MergeContent before PutDatabaseRecord to batch flow files together? What will happen if any one of the flow files fails? Will the whole batch be rolled back? Please let me know. Thanks, Subbu
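To make the question concrete, here is a minimal JDBC sketch (my own illustration, not NiFi's internal code) of the batch semantics I am asking about; the PERSONS table and connection details are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service", "user", "pass")) {
            conn.setAutoCommit(false); // the batch runs inside one transaction
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO PERSONS (ID, NAME) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "name-" + i);
                    ps.addBatch();     // queue the row; no round trip yet
                }
                ps.executeBatch();     // one round trip for the whole batch
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();       // one bad row undoes the entire batch
                throw e;
            }
        }
    }
}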
Labels: Apache NiFi
10-08-2018
06:52 PM
Can you upload or share a sample workflow that uses PutDatabaseRecord to insert the flow file content into the database? Thanks, Subbu
10-06-2018
04:50 AM
Thanks, Matt, for your reply. The whole flow file content has to be inserted into the database. Is there a way to configure JsonTreeRecordReader to treat the whole flow file content as a single value in the prepared statement? Regards, Subbu
09-25-2018
06:19 AM
I have a data pipeline that consumes messages from Kafka and inserts them into an Oracle database. The messages from Kafka are in JSON format. If any error occurs during processing, the whole message (the flow file content) is inserted into an invalid-payload table. A trimmed version of the pipeline is GenerateFlowFile => ReplaceText => PutSQL (clob-insert-test.xml). The table structure is:

CREATE TABLE CLOB_TEST (
  "TRAN_ID" VARCHAR2(36 BYTE) NOT NULL ENABLE,
  "PAYLOAD" CLOB NOT NULL ENABLE,
  CONSTRAINT "CLOB_TEST" PRIMARY KEY ("TRAN_ID")
);

The payload is more than 4000 characters, so building the INSERT statement with ReplaceText fails with:

SQL Error: ORA-01704: string literal too long
01704. 00000 - "string literal too long"
*Cause: The string literal is longer than 4000 characters.
*Action: Use a string literal of at most 4000 characters.
         Longer values may only be entered using bind variables.

Is there any other option to bind the CLOB column and insert into the table? Any help will be greatly appreciated. Thanks. screen-shot-2018-09-24-at-111751-am.png
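For reference, a minimal JDBC sketch of what the error text suggests: binding the payload through a prepared statement instead of inlining it as a string literal (connection details are placeholders):

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class ClobInsertSketch {
    public static void main(String[] args) throws Exception {
        String payload = args[0]; // whole flow file content, possibly > 4000 chars
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO CLOB_TEST (TRAN_ID, PAYLOAD) VALUES (?, ?)")) {
            ps.setString(1, UUID.randomUUID().toString());
            // The value is streamed as a bind variable, so the 4000-character
            // string-literal limit (ORA-01704) never applies.
            ps.setCharacterStream(2, new StringReader(payload), payload.length());
            ps.executeUpdate();
        }
    }
}

PutSQL reaches the same bind-variable mechanism through the sql.args.N.type / sql.args.N.value flow file attributes.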
Labels: Apache NiFi
08-24-2018
06:59 PM
We use the JoltTransformJSON processor in our data pipeline. The Jolt specification contains two operations (shift and default): the shift operation translates JSON fields from the input message into database fields, and the default operation copies a flow file attribute into a database field. Performance was good when we had only the shift operation, but adding the default operation decreases it. The Transform Cache Size is set to 10000, yet we still see the performance issue.

consumeKafka -> JoltTransformJSON -> putDatabaseRecord

Jolt specification:

[{
  "operation": "shift",
  "spec": {
    "studentName": "STUDENT_NAME",
    "Age": "AGE",
    "address_city": "CITY",
    "address1": "ADDRESS1",
    "zipcode": "POSTLCODE",
    "id": "ID"
  }
}, {
  "operation": "default",
  "spec": {
    "PRTN_NBR": "${kafka.partition}"
  }
}]

Input message:

[{"studentName":"Foo2","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"103"}]

Please find attached a summary of Total Task Duration and FlowFiles in 5 min. Any suggestions or other alternatives? Thanks in advance.
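To isolate whether the spec itself is the bottleneck, one could micro-test it outside NiFi with the Jolt library; a sketch, assuming the com.bazaarvoice.jolt jolt-core and json-utils artifacts, with a literal standing in for ${kafka.partition} (NiFi evaluates that attribute before the transform runs) and a single record as input:

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

public class JoltSpecBench {
    public static void main(String[] args) {
        String spec = "[{\"operation\":\"shift\",\"spec\":{"
                + "\"studentName\":\"STUDENT_NAME\",\"Age\":\"AGE\","
                + "\"address_city\":\"CITY\",\"address1\":\"ADDRESS1\","
                + "\"zipcode\":\"POSTLCODE\",\"id\":\"ID\"}},"
                + "{\"operation\":\"default\",\"spec\":{\"PRTN_NBR\":\"0\"}}]";
        String input = "{\"studentName\":\"Foo2\",\"Age\":\"12\","
                + "\"address_city\":\"newyork\",\"address1\":\"North avenue\","
                + "\"zipcode\":\"123213\",\"id\":\"103\"}";

        // Compile the two-operation chain once, then time repeated transforms.
        Chainr chainr = Chainr.fromSpec(JsonUtils.jsonToList(spec));
        Object parsed = JsonUtils.jsonToObject(input);

        long start = System.nanoTime();
        Object out = null;
        for (int i = 0; i < 100_000; i++) {
            out = chainr.transform(parsed);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(JsonUtils.toJsonString(out));
        System.out.println("100k transforms in " + elapsedMs + " ms");
    }
}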
Labels: Apache NiFi
08-24-2018
05:30 AM
Thank you very much for the solution; it worked like a charm. I applied the DDL in Postgres:

CREATE SEQUENCE id_seq START 101;

The pipeline then populated the id column from the sequence's nextval:

[{"studentName":"Foo","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"101"},
 {"studentName":"Foo1","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"102"},
 {"studentName":"Foo2","Age":"12","address_city":"newyork","address1":"North avenue","zipcode":"123213","id":"103"}]

Thanks again!!!
08-22-2018
03:59 PM
Thanks for the suggestion. I will try the solution in the blog post and report back with my comments.
08-22-2018
03:55 PM
Thanks for the reply. The requirement for the data pipeline is guaranteed data delivery. nextInt() is not guaranteed to be unique across a cluster, which would result in a unique-constraint violation, so the solution may not be optimal for our use case. Thanks again for your suggestion.
08-22-2018
01:09 PM
I have a data pipeline that consumes messages from Kafka and inserts them into an Oracle database: consumeKafka -> JoltTransformJSON -> putDatabaseRecord. The Oracle table structure is:

CREATE TABLE Persons (
  ID NUMBER NOT NULL ENABLE,
  LastName varchar(255) NOT NULL,
  FirstName varchar(255),
  Age int,
  CONSTRAINT "Persons_PK" PRIMARY KEY ("ID")
);

To insert a new record into the Persons table, we have to use a sequence's nextval function, but the JSON payload does not contain a value for the ID column. Is there any option in putDatabaseRecord, or any other processor, to put seq_person.nextval into the insert statement?
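For reference, here is the statement I would like to see generated, sketched as plain JDBC (seq_person is a sequence I would create alongside the table; connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SequenceInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO Persons (ID, LastName, FirstName, Age) "
                 + "VALUES (seq_person.nextval, ?, ?, ?)")) {
            ps.setString(1, "Foo");     // LastName
            ps.setString(2, "Bar");     // FirstName
            ps.setInt(3, 12);           // Age
            ps.executeUpdate();
        }
    }
}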
Labels: Apache NiFi