Created 11-22-2017 02:23 PM
I'm trying to write data to Kudu using the PutKudu processor. The PutKudu processor's log looks like this:
2017-11-22 15:17:02,416 INFO [NiFi Web Server-2783] o.a.n.c.s.StandardProcessScheduler Starting PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515]
2017-11-22 15:17:02,420 DEBUG [StandardProcessScheduler Thread-2] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] Setting up Kudu connection...
2017-11-22 15:17:02,663 DEBUG [StandardProcessScheduler Thread-2] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] Kudu connection successfully initialized
2017-11-22 15:17:02,664 INFO [StandardProcessScheduler Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] to run with 1 threads
2017-11-22 15:17:02,670 INFO [Timer-Driven Process Thread-10] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,674 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,676 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,678 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,681 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
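To compare what the processor claims against what is actually in the table, it can help to total the "number of inserted records" messages from the log. The following is only a rough Python sketch: the function name `total_inserted` and the shortened sample text are made up for illustration, and the message format is taken from the log lines above.

```python
import re

# Matches the count at the end of PutKudu's
# "KUDU: number of inserted records: N" log message.
INSERTED = re.compile(r"KUDU: number of inserted records: (\d+)")


def total_inserted(log_text: str) -> int:
    """Sum every inserted-record count found in the given log text."""
    return sum(int(n) for n in INSERTED.findall(log_text))


# Two entries copied from the log above (hypothetical shortened sample).
sample = (
    "2017-11-22 15:17:02,670 INFO [Timer-Driven Process Thread-10] "
    "org.apache.nifi.processors.kudu.PutKudu "
    "PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] "
    "KUDU: number of inserted records: 1\n"
    "2017-11-22 15:17:02,674 INFO [Timer-Driven Process Thread-7] "
    "org.apache.nifi.processors.kudu.PutKudu "
    "PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] "
    "KUDU: number of inserted records: 1\n"
)

print(total_inserted(sample))
```

The total can then be set against a row count taken from the Kudu table (for example via Impala) to see how far the two numbers diverge.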
But there is no data in Kudu and there are no errors in the Kudu logs. Kudu is working great via Impala. Any ideas?
Thanks
Created 11-22-2017 07:05 PM
Hi @Dmitry K,
Can you share your flow and details around your PutKudu configuration? I'm most curious about what processor you have before the PutKudu processor to understand the data coming in.
Not sure if it will be helpful, but I wrote an HCC article about getting MySQL data into Kudu:
Thanks!
Created on 11-23-2017 08:37 AM - edited 08-17-2019 09:30 PM
First of all, your article is great! It was very useful on my first steps with NiFi and Kudu. Thanks!
My dataflow is:
I get data from a REST API (it is stopped for now, so you can see the red square, but it works fine when I pull data from it), then add the needed attributes to the dataflow (like the timestamp and the API's title), then split the JSON array into JSON objects (no nested objects there, "key":"value" pairs only), then remove several fields (those with symbols like '@') via a JOLT transformation, then add the dataflow attributes to the JSON. At this point the flow contains JSON objects, each with several "key":"value" pairs. Keys look like "field", "field1" or "field_name1". Values are numbers, strings (GUIDs or dates in different formats) or nulls. I checked the JSON by stopping PutKudu and viewing the data in the queue.
My PutKudu configuration is:
I tried to use several Flush Mode values (AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND), different Batch size values (2, 10, 100, 1000, 10000) and different Record Readers (AvroReader, several JsonPathReaders).
For now I extract one field from the JSON object via JsonPathReader and store it in a Kudu table with one column. JsonPathReader configuration is:
I checked that the key "fieldid" exists in the JSON and that its value is a GUID string. The table "impala::default.test_activities", with one string column "json_str", was created via Impala. I added one test row to it. I also checked that the table exists with the "kudu table list localhost:7051" command on kudu_master.
AvroSchemaRegistry configuration is:
and "All" schema is
{ "type": "record", "name": "All", "fields": [ { "name" : "json_str", "type" : "string" }] }
I also tried a schema like this:
{ "type": "record", "name": "All", "fields": [ { "name" : "json_str", "type" : ["string","null"] }] }
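One thing that may be worth ruling out is a mismatch between the schema and the values the reader actually produces. The snippet below is a minimal Python sketch, not NiFi code: it loads the two "All" schemas shown above and checks whether a JSON object yields a value for each schema field. The mapping from "json_str" to the key "fieldid" stands in for the JsonPathReader configuration and is an assumption, as are the sample objects and the helper names.

```python
import json

# The two "All" schemas from the post above.
STRICT_SCHEMA = json.loads(
    '{ "type": "record", "name": "All", "fields": '
    '[ { "name": "json_str", "type": "string" } ] }'
)
NULLABLE_SCHEMA = json.loads(
    '{ "type": "record", "name": "All", "fields": '
    '[ { "name": "json_str", "type": ["string", "null"] } ] }'
)

# Assumed mapping: schema field name -> top-level JSON key.
# This is a simplified stand-in for the JsonPath expressions in the reader.
FIELD_TO_KEY = {"json_str": "fieldid"}


def is_nullable(avro_type) -> bool:
    """True if the Avro type is "null" or a union that contains "null"."""
    return avro_type == "null" or (
        isinstance(avro_type, list) and "null" in avro_type
    )


def check_record(obj: dict, schema: dict, mapping: dict) -> list:
    """Return a list of problems found when mapping obj onto the schema."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        key = mapping.get(name, name)
        if obj.get(key) is None and not is_nullable(field["type"]):
            problems.append(f"{name}: no value under key '{key}'")
    return problems


# Hypothetical flow file contents, for illustration only.
good = {"fieldid": "00000000-0000-0000-0000-000000000000", "field1": 1}
bad = {"field1": 1}

print(check_record(good, STRICT_SCHEMA, FIELD_TO_KEY))
print(check_record(bad, STRICT_SCHEMA, FIELD_TO_KEY))
print(check_record(bad, NULLABLE_SCHEMA, FIELD_TO_KEY))
```

With the nullable schema a missing key passes this check silently, so running the same check against the strict schema is the more telling of the two.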
That's what I have for now.
Thanks.
Created 10-16-2018 04:22 PM
Hi @Dmitry K
Just curious to know if you were able to figure out what is happening with PutKudu Processor. I'm facing a similar issue where my NiFi Processor runs fine, the logs look clean and no errors in Kudu logs but there is no record posted to Kudu table. Logs given below.
Also just curious to know if you were able to handle Kudu Updates from NiFi.
2018-10-16 12:02:19,639 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-10-16 12:02:19,639 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 737 records in 0 milliseconds
2018-10-16 12:03:03,147 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:04,705 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:08,029 INFO [Provenance Query-1] o.a.nifi.provenance.StandardQueryResult Completed Query[ [01661009-bea9-1475-b35c-86134aa20ef6] ] comprised of 1 steps in 74 millis
2018-10-16 12:03:08,029 INFO [Provenance Query-1] o.a.n.provenance.index.lucene.QueryTask Successfully queried index /u01/nifi/data/provenance_repository/index-1532552783325 for query +processorId:01661009-bea9-1475-b35c-86134aa20ef6; retrieved 35 events with a total of 61 hits in 47 millis
2018-10-16 12:03:18,681 INFO [NiFi Web Server-307] o.a.n.c.s.StandardProcessScheduler Starting PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6]
2018-10-16 12:03:18,735 INFO [Timer-Driven Process Thread-9] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6] to run with 1 threads
2018-10-16 12:03:18,787 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:19,271 INFO [NiFi Web Server-317] o.a.n.c.s.StandardProcessScheduler Stopping PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6]
2018-10-16 12:03:19,271 INFO [NiFi Web Server-317] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.kudu.PutKudu
2018-10-16 12:03:19,274 INFO [Timer-Driven Process Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6] to run
2018-10-16 12:03:19,351 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
Thanks
Created on 11-01-2022 04:52 AM - edited 11-01-2022 04:53 AM
Hi! Has anyone fixed this problem? I'm facing a similar problem. When writing millions of rows, PutKudu (or any Kudu client script) doesn't write some of the rows. There are no log entries and no errors in NiFi.
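To narrow down which rows go missing, one option is to reconcile the primary keys that were sent with the keys read back from the table. The Python sketch below assumes both key lists have already been exported somehow (how is left open here), and the function name `reconcile` and the sample keys are hypothetical. It also reports keys that occur more than once in the input, since Kudu primary keys are unique and a plain insert of an already-present key does not add a second row; that is only one possibility to rule out, not a confirmed cause.

```python
from collections import Counter


def reconcile(sent_keys, stored_keys):
    """Compare the keys sent to Kudu with the keys read back from the table."""
    counts = Counter(sent_keys)
    stored = set(stored_keys)
    return {
        "sent": len(sent_keys),
        "distinct_sent": len(counts),
        "stored": len(stored),
        # Keys that appear more than once in the input.
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        # Keys that were sent but are not in the table.
        "missing": sorted(set(counts) - stored),
    }


# Hypothetical sample keys, for illustration only.
report = reconcile(["a", "b", "b", "c"], ["a", "b"])
print(report)
```

If "distinct_sent" equals "stored" and "missing" is empty, the gap is explained by duplicate keys in the input; otherwise the "missing" list gives concrete rows to trace through the flow.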
Nifi: 1.15.1
Kudu: 1.9.1-cdh6.2.1
Thanks in advance.