PutKudu processor doesn't throw any exceptions and doesn't write data to Kudu

New Contributor

I'm trying to write data to Kudu using the PutKudu processor. The processor's log looks like this:

2017-11-22 15:17:02,416 INFO [NiFi Web Server-2783] o.a.n.c.s.StandardProcessScheduler Starting PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515]
2017-11-22 15:17:02,420 DEBUG [StandardProcessScheduler Thread-2] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] Setting up Kudu connection...
2017-11-22 15:17:02,663 DEBUG [StandardProcessScheduler Thread-2] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] Kudu connection successfully initialized
2017-11-22 15:17:02,664 INFO [StandardProcessScheduler Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] to run with 1 threads
2017-11-22 15:17:02,670 INFO [Timer-Driven Process Thread-10] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,674 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,676 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,678 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1
2017-11-22 15:17:02,681 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1

But there is no data in Kudu, and there are no errors in the Kudu logs. Kudu itself works fine via Impala. Any ideas?
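To compare what PutKudu claims to have written with what actually lands in the table, the "number of inserted records" lines can be totalled per processor id. The following is a minimal sketch, not part of NiFi: the function name `reported_inserts` and the regular expression are assumptions based only on the log format shown above, and the sample lines are copied from that log.

```python
import re

# Lines copied from the log above; in practice these would be read from the NiFi log file.
LOG_LINES = [
    "2017-11-22 15:17:02,664 INFO [StandardProcessScheduler Thread-2] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] to run with 1 threads",
    "2017-11-22 15:17:02,670 INFO [Timer-Driven Process Thread-10] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1",
    "2017-11-22 15:17:02,674 INFO [Timer-Driven Process Thread-7] org.apache.nifi.processors.kudu.PutKudu PutKudu[id=ca52eca0-015f-1000-c66f-d789248d6515] KUDU: number of inserted records: 1",
]

# Assumed pattern, derived from the message text in the log above.
INSERTED = re.compile(r"PutKudu\[id=([0-9a-f-]+)\] KUDU: number of inserted records: (\d+)")


def reported_inserts(lines):
    """Sum the insert counts that each PutKudu processor reports in its log."""
    totals = {}
    for line in lines:
        match = INSERTED.search(line)
        if match:
            processor_id, count = match.group(1), int(match.group(2))
            totals[processor_id] = totals.get(processor_id, 0) + count
    return totals


print(reported_inserts(LOG_LINES))
```

The total can then be checked against a row count taken through Impala before and after the run, which shows whether the reported inserts and the stored rows really diverge.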

Thanks

4 REPLIES

Guru

Hi @Dmitry K,

Can you share your flow and details around your PutKudu configuration? I'm most curious about what processor you have before the PutKudu processor to understand the data coming in.

Not sure if it will be helpful, but I wrote an HCC article about getting MySQL data into Kudu:

https://community.hortonworks.com/articles/144009/using-the-putkudu-processor-to-ingest-mysql-data-i...

Thanks!

New Contributor

First of all, your article is great! It was very useful for my first steps with NiFi and Kudu. Thanks!

My dataflow is:

43675-dataflow.png

I get data from a REST API (it is stopped for now, which is why you see the red square, but it works fine when I pull data from it), then add the needed attributes to the flow (such as a timestamp and the API's title), then split the JSON array into JSON objects (no nested objects, only "key":"value" pairs), then remove several fields (with symbols like '@') via a JOLT transformation, then add the flow attributes to the JSON. At this point the flow contains JSON objects, each with several "key":"value" pairs. The keys look like "field", "field1" or "field_name1". The values are numbers, strings (GUIDs or dates in different formats) or nulls. I checked the JSON by stopping PutKudu and viewing the data in the queue.

My PutKudu configuration is:

43679-putkudu-configuration.png

I tried to use several Flush Mode values (AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND), different Batch size values (2, 10, 100, 1000, 10000) and different Record Readers (AvroReader, several JsonPathReaders).

For now I extract one field from each JSON object via a JsonPathReader and store it in a Kudu table with one column. The JsonPathReader configuration is:

43682-jsonpathreader-configuration.png

I checked that the key "fieldid" exists in the JSON and that its value is a GUID string. The table "impala::default.test_activities", with one string column "json_str", was created via Impala, and I added one test row to it. I also checked that the table exists with the "kudu table list localhost:7051" command on kudu_master.
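The check described above (the key exists and holds a string that ends up in the single "json_str" column) can be reproduced outside NiFi on a record taken from the queue. This is only a sketch: the helper `to_kudu_row` is hypothetical, the mapping from "fieldid" to "json_str" is an assumption because the reader configuration is only available as a screenshot, and the sample record is made up.

```python
import json


def to_kudu_row(flowfile_json, source_key="fieldid", column="json_str"):
    """Mimic the assumed reader mapping: one JSON key becomes one Kudu column."""
    obj = json.loads(flowfile_json)
    if source_key not in obj:
        raise KeyError(f"key {source_key!r} missing from record")
    value = obj[source_key]
    if not isinstance(value, str):
        raise TypeError(f"{source_key!r} is {type(value).__name__}, expected a string")
    return {column: value}


# Made-up sample record; the GUID and the extra keys are illustrative only.
sample = '{"fieldid": "3f2504e0-4f89-11d3-9a0c-0305e82c3301", "field1": 42, "field_name1": null}'
print(to_kudu_row(sample))
```

Running it over a few queued records confirms whether every record really carries a usable string for the column, rather than only the ones inspected by hand.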

AvroSchemaRegistry configuration is:

43683-avroschemaregistry-configuration.png

and the "All" schema is:

{
  "type": "record",
  "name": "All",
  "fields": [
  {
    "name" : "json_str",
    "type" : "string"
  }]
}

I also tried a schema like this:

{
  "type": "record",
  "name": "All",
  "fields": [
  {
    "name" : "json_str",
    "type" : ["string","null"]
  }]
}
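The practical difference between the two schemas is whether a null "json_str" is accepted. The sketch below illustrates that with a deliberately tiny checker that handles only the "string" and "null" types and unions of them; it is not the Avro library, and the names `matches` and `conforms` are hypothetical.

```python
import json

# The two schemas from the post, written on one line each.
REQUIRED = '{"type": "record", "name": "All", "fields": [{"name": "json_str", "type": "string"}]}'
NULLABLE = '{"type": "record", "name": "All", "fields": [{"name": "json_str", "type": ["string", "null"]}]}'


def matches(value, avro_type):
    """Check a value against the small subset of Avro types used in this thread."""
    if isinstance(avro_type, list):  # a union: any branch may match
        return any(matches(value, branch) for branch in avro_type)
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "null":
        return value is None
    raise ValueError(f"type {avro_type!r} is not handled by this sketch")


def conforms(record, schema_json):
    """True when every schema field is present in the record with a matching type."""
    schema = json.loads(schema_json)
    return all(
        field["name"] in record and matches(record[field["name"]], field["type"])
        for field in schema["fields"]
    )


print(conforms({"json_str": None}, REQUIRED))
print(conforms({"json_str": None}, NULLABLE))
```

A record whose "json_str" is null conforms only to the second schema, so if the reader ever produces a null for that field, the two schemas would behave differently.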

That's what I have for now.

Thanks.

New Contributor

Hi @Dmitry K

Just curious to know if you were able to figure out what is happening with the PutKudu processor. I'm facing a similar issue: my NiFi processor runs fine, the logs look clean and there are no errors in the Kudu logs, but no records are written to the Kudu table. Logs are given below.

Also just curious to know if you were able to handle Kudu Updates from NiFi.

2018-10-16 12:02:19,639 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-10-16 12:02:19,639 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 737 records in 0 milliseconds
2018-10-16 12:03:03,147 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:04,705 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:08,029 INFO [Provenance Query-1] o.a.nifi.provenance.StandardQueryResult Completed Query[ [01661009-bea9-1475-b35c-86134aa20ef6] ] comprised of 1 steps in 74 millis
2018-10-16 12:03:08,029 INFO [Provenance Query-1] o.a.n.provenance.index.lucene.QueryTask Successfully queried index /u01/nifi/data/provenance_repository/index-1532552783325 for query +processorId:01661009-bea9-1475-b35c-86134aa20ef6; retrieved 35 events with a total of 61 hits in 47 millis
2018-10-16 12:03:18,681 INFO [NiFi Web Server-307] o.a.n.c.s.StandardProcessScheduler Starting PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6]
2018-10-16 12:03:18,735 INFO [Timer-Driven Process Thread-9] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6] to run with 1 threads
2018-10-16 12:03:18,787 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false
2018-10-16 12:03:19,271 INFO [NiFi Web Server-317] o.a.n.c.s.StandardProcessScheduler Stopping PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6]
2018-10-16 12:03:19,271 INFO [NiFi Web Server-317] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.kudu.PutKudu
2018-10-16 12:03:19,274 INFO [Timer-Driven Process Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutKudu[id=01661009-bea9-1475-b35c-86134aa20ef6] to run
2018-10-16 12:03:19,351 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@2e993868 // Another save pending = false

Thanks

New Contributor

Hi! Has anyone fixed this problem? I'm facing a similar one. When writing millions of rows, PutKudu (or any Kudu client script) fails to write some of them. There are no log entries and no errors in NiFi.


NiFi: 1.15.1
Kudu: 1.9.1-cdh6.2.1

Thanks in advance.