Member since: 01-09-2018
Posts: 33
Kudos Received: 3
Solutions: 0
12-09-2020
07:30 PM
Hi @nikolayburiak , were you able to fix it? I am facing the same issue, where the HBase_2_ClientService cannot renew the Kerberos ticket on its own. I have tried defining the keytab and principal directly in the service, but with no success. Interestingly, even if I renew (kinit) or destroy (kdestroy) the Kerberos ticket from the command line on the node running the NiFi/Hadoop client, it has no effect on the client service or the PutHBase processor, so I am not sure how NiFi creates its Kerberos ticket. The only workaround is restarting the HBase_2_ClientService. I am using NiFi 1.12 with HDP 3.14.
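In case it helps anyone else debugging this: my understanding (not verified) is that NiFi performs its Kerberos login inside the JVM via JAAS using the configured keytab, rather than reading the OS ticket cache, which would explain why kinit/kdestroy on the node has no visible effect. A minimal sketch for at least verifying the keytab itself is still valid; the keytab path and principal below are placeholders, not values from this thread:

    # list the principals stored in the keytab
    klist -kt /etc/security/keytabs/nifi.service.keytab
    # confirm the keytab can still obtain a fresh ticket
    kinit -kt /etc/security/keytabs/nifi.service.keytab nifi/host.example.com@EXAMPLE.COM
    klist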
09-26-2018
04:29 AM
Does anyone know when Streams Messaging Manager will be available for download? I would like to use it on our HDF 3.2 cluster.
Labels:
- Apache Kafka
- Cloudera DataFlow (CDF)
07-11-2018
02:13 AM
1 Kudo
Hi @Shu, I was able to implement your idea of using MergeRecord -> PutHBaseRecord with the record reader controller services. However, I think there is a limitation in PutHBaseRecord. We are syncing Oracle tables into HBase using GoldenGate, and some tables have multiple PKs in Oracle. We build the corresponding row key in HBase by concatenating those PKs together. PutHBaseJSON allows this: we concatenate the PKs and pass the result to the processor as an attribute. But the corresponding PutHBaseRecord property is "Row Identifier Field Name", so it expects the row key to be an element in the JSON that is read by the record reader. I've tried passing the same attribute I was sending to PutHBaseJSON, but it doesn't work. Do you agree? I can think of a workaround where I transform the JSON to add the concatenated PKs to it as a field, which I don't yet know how to do, and even if I manage that, I will also need to change the schema as well. Kindly let me know if there is a better way to skin this cat.
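For the transform part, one option I am considering is a JoltTransformJSON step ahead of the record processors to inject the concatenated key into the payload. A minimal sketch, assuming PK fields named PK1 and PK2 and a target field named row_key (all hypothetical names; as noted above, the new field would still have to be added to the schema):

    [
      {
        "operation": "modify-overwrite-beta",
        "spec": {
          "row_key": "=concat(@(1,PK1),'_',@(1,PK2))"
        }
      }
    ]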
06-26-2018
02:32 AM
1 Kudo
I think I misunderstood the purpose of the record reader. It looks clear to me now. Thank you for the suggestion; I'll work on this idea.
06-26-2018
01:39 AM
Hi Shu, Thank you for your reply. I'll have to study this record reader stuff in detail, because by the time our flow file reaches the PutHBaseJSON processor it already contains the reduced JSON in its payload and simply needs to be put into the target HBase table. So I don't require any sort of manipulation or parsing to be done on it, which is apparently what record readers do. And I see that the record reader is a required field, so there is no way around it. Is there a way to create a dummy reader that does nothing? :P I'll explore this on my own as well.
06-25-2018
06:49 AM
1 Kudo
The slowest part of our data flow is the PutHBaseJSON processor, and I am trying to find a way to optimize it. There is a configuration on the processor where you can increase the batch size of the flow files it processes in a single execution. It is set to 25 by default, and I have tried increasing it up to 1000 with little performance gain. Increasing the concurrent tasks also hasn't helped speed up the puts the processor runs. Has anyone else worked with this processor and optimized it? The batch configuration of the processor says that it does the put by first grouping the flow files by table. Is there anything I can do here? The name of the table already comes with the flow file as an attribute and is extracted using the expression language, but I am not sure how I would 'group' the flow files before they reach PutHBaseJSON; one idea I am considering is sketched below. Kindly let me know of any ideas.
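The idea: split the stream by table upstream with RouteOnAttribute, so each PutHBaseJSON instance pulls batches that are already homogeneous and every batch becomes a single-table put. A sketch of the dynamic properties, assuming the table name arrives in an attribute called hbase.table (a hypothetical name here) and only the highest-volume tables get their own route:

    # RouteOnAttribute — one dynamic property (= one outgoing relationship) per table
    orders     =  ${hbase.table:equals('ORDERS')}
    customers  =  ${hbase.table:equals('CUSTOMERS')}
    # everything else follows the 'unmatched' relationship to a shared PutHBaseJSON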
Labels:
- Apache HBase
- Apache NiFi
06-20-2018
07:11 AM
How would a consumer with multiple topics work? Say, for example, that in a four-node cluster we have a consumer (a ConsumeKafka processor with 1 concurrent task) consuming from 10 topics (separated with commas). Will it assign the four resulting tasks (one per node) to these 10 topics in a round-robin manner? I found creating individual processors, as opposed to combining all the topics in one processor, to be more efficient. But this is where we are stuck: we need to consume from 250 sources, so please let me know what an efficient approach would be here. Creating 250 processors is possible, but then, due to the limited number of threads available, some (or a lot of) processors don't get the threads they need and end up with an error.
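For what it's worth, the assignment Kafka actually made can be inspected from the broker side; a sketch, assuming the processor's Group ID is set to nifi-consumers (a placeholder):

    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe --group nifi-consumers

Each row of the output should show which consumer instance owns which topic-partition, which would answer the round-robin question empirically.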
06-11-2018
01:38 AM
@mrodriguez Thanks for your reply. Please find below the output:

    $ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic TEST_KAFKA_TOPIC
    Topic:TEST_KAFKA_TOPIC  PartitionCount:1  ReplicationFactor:1  Configs:
            Topic: TEST_KAFKA_TOPIC  Partition: 0  Leader: 1001  Replicas: 1001  Isr: 1001

Yes, I'm able to consume from the topic from the console. The error comes and goes on its own.
06-08-2018
07:41 AM
Does anyone know about this error from Kafka? I am using NiFi 1.5.0 with the ConsumeKafka processor:

    ConsumeKafka[id=34753ed3-9dd6-15ed-9c91-147026236eee] Failed to retain connection due to No current assignment for partition TEST_KAFKA_TOPIC

This is the first time we are testing NiFi to consume from over 200 topics, and it is failing terribly so far. When this error goes away, the other one comes up, which is as below:

    Was interrupted while trying to communicate with Kafka with lease org.apache.nifi.processors.kafka.pubsub.ConsumerPool$SimpleConsumerLease@6cb8afba. Will roll back session and discard any partially received data.
Labels:
- Apache Kafka
- Apache NiFi
06-07-2018
04:45 AM
Is there a recommended way to ensure that the row counts from tables in the source (Oracle) are consistent with those of the target tables in HBase (the data lake)? We are using NiFi, which receives the GoldenGate messages and then, using different processors, stores the transactions in HBase, so essentially the tables in HBase should be in sync with the tables in Oracle at all times. I am interested in knowing how teams ensure and prove this. Do they take row counts from source and target every day, match them, and say that it's synced? I used the counter option in NiFi, which maintained the number of records received for each table, but I guess that is not an optimal way to do it.
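For a one-off check, the two counts can be taken manually; a sketch with placeholder table names (note that the counts only line up if taken while the feed is quiesced, since rows keep arriving otherwise):

    # HBase side — RowCounter runs as a MapReduce job, which scales far
    # better than the shell's `count` on large tables
    hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'MY_NAMESPACE:MY_TABLE'

    # Oracle side (e.g. from sqlplus)
    # SELECT COUNT(*) FROM MY_SCHEMA.MY_TABLE;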
Labels:
- Apache NiFi
04-05-2018
06:03 AM
Hi, is there a way to consume a Kafka message along with its timestamp in NiFi using the ConsumeKafka processor? E.g., from the console we can consume the Kafka message and see the timestamp by adding the property print.timestamp=true; the output looks something like this: CreateTime:1522893745217 test_message. I can't seem to access this variable 'CreateTime'. I've tried using ${kafka.CreateTime} in the UpdateAttribute processor, but it doesn't work. Please let me know if there is a way to do this, as adding a custom timestamp (now()) is not an option in our case.
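For reference, this is the console-side check I mentioned, with a placeholder topic name:

    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my_topic --property print.timestamp=true
    # prints e.g.:
    # CreateTime:1522893745217    test_message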
Labels:
- Apache Kafka
- Apache NiFi
03-13-2018
06:12 AM
@Matt Clarke Thanks for your answer. If you can kindly also help with the below ones:
1. While assigning slaves and clients during HDF installation, is one node enough for installing the NiFi Certificate Authority (considering we will have the NiFi master service on 4 nodes)?
2. How many ZooKeeper clients / Infra Solr clients are enough for a 3-node ZooKeeper cluster (i.e., the ZooKeeper master service on 3 nodes)? And is it okay to have the client service running on the same node that is also hosting the ZooKeeper master service, or should it be on a different server? Thank you.
03-12-2018
06:46 AM
Hi All. I am trying to install HDF on a 9-node cluster. Previously I have only worked on a single-node NiFi standalone instance, so it's quite a big jump for me. A couple of questions here:
1. There is no concept of slaves when talking about NiFi, right? When 'Assigning Masters', I need to add all the nodes on which I want to install the NiFi service, and that is how I will get a truly "9-node NiFi cluster"? By default, the Ambari interface recommends installing NiFi on only 3 nodes.
2. By default, Ambari installs ZooKeeper on 3 nodes (strangely, these are the same nodes on which it installs NiFi). Why is it not using/recommending the rest of the nodes on which the Ambari host is already installed? Do I have to install the ZooKeeper client on every node I install NiFi on?
3. In a NiFi cluster, is there a single coordinator node that keeps the cluster together? E.g., my data flow puts all the flow files it receives from GoldenGate into a folder at the OS level. Do I have to create that folder on every node to keep the cluster in sync?
02-20-2018
02:02 AM
I did not make any changes to the consumer process or the group ID. The offset reset option is set to latest. There were also no changes made to the topic, on the consumer side at least. The topic, which in this case is named after an Oracle table, is generated in Kafka when the table is created in Oracle. I am pretty sure no changes were made to the GoldenGate configuration either, as the team that manages those servers has changed the GoldenGate configuration in the past with no disruption to the workflows in NiFi. The server on which GoldenGate is installed is different from the one that hosts NiFi; can a network disruption cause this error?
02-16-2018
07:07 AM
Hi All, I got an error while using the ConsumeKafka_0_10 processor. Below is the complete error from nifi-app.log:

    2018-02-16 11:47:11,465 WARN [Timer-Driven Process Thread-2] o.a.n.p.kafka.pubsub.ConsumeKafka_0_10 ConsumeKafka_0_10[id=b846165f-115a-1161-cadd-815bbb3a45dd] Was interrupted while trying to communicate with Kafka with lease org.apache.nifi.processors.kafka.pubsub.ConsumerPool$SimpleConsumerLease@6cb8afba. Will roll back session and discard any partially received data.

The relevant consumer configuration logged alongside it:

    bootstrap.servers = [xxx-xxxkafka-xxxz.xxx.xx.local:9093]
    key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
    partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
    value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer

The last time I saw this error I restarted NiFi and it was gone, but that did not work this time. There hasn't been any change on the Kafka side. I'm using NiFi 1.4.0 along with kafka_2.11-0.11.0.1 to consume Kafka records; please let me know if anyone knows the root cause of this error.
Labels:
- Apache Kafka
- Apache NiFi
02-15-2018
07:41 AM
Hi @spdvnz @Raj B , can you let me know how you were eventually able to grab the failures? I've tried storing the bulletin board messages in HDFS using the REST API, but the JSON generated there is very detailed and would require a lot of work before I can use it for monitoring purposes. I would like to know how you guys did it. Here is the link to the actual question: https://community.hortonworks.com/questions/171150/monitoring-nifi-flow-file-failures-success-1.html
02-15-2018
06:22 AM
Is there any way to increase this window from 5 minutes to, let's say, an hour or more? (A common question would be how many records were processed during a 24-hour window, etc.)
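From what I can see in nifi.properties, the rolling 5-minute figure itself may be fixed, but the depth of the Status History (which is where longer windows would come from) appears to be controlled by two settings; this is my understanding only, and the values shown are the defaults as I recall them:

    # how often a status snapshot is taken
    nifi.components.status.snapshot.frequency=1 min
    # how many snapshots are kept per component (1440 x 1 min = 24 hours)
    nifi.components.status.repository.buffer.size=1440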
02-14-2018
07:10 AM
Thanks, can you kindly let me know how I can change the retention period of these repositories? (From nifi.properties I can see these two properties whose unit is a length of time: nifi.flow.configuration.archive.max.time=30 days and nifi.content.repository.archive.max.retention.period=12 hours.)
02-13-2018
07:59 AM
I am working on building a monitoring solution for my NiFi workflow (a real-time data lake using GoldenGate/Kafka). Currently I am storing all the GoldenGate records/flow files received from Kafka in an HDFS directory, and at the end of the workflow, if a flow file is ingested successfully into HBase, it is deleted from the directory. So I know that the JSON flow files left in the HDFS directory are the ones that have failed. Now, the issue with finding the reason for a failure is that NiFi's bulletin board only shows records for the last 5/10 minutes (not sure of the exact duration). I've tried storing the bulletin board messages in HDFS using the REST API, but the JSON generated there is very detailed and would require a lot of work before I can use it for monitoring purposes. Has anyone else worked on this type of monitoring? I would also like to know the throughput of the workflow, which would include the number of records failed or successfully ingested, etc. I know I can get the last-5-minute stats from the status history, but if anyone else has worked on a similar monitoring task, kindly let me know.
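For reference, this is roughly what I tried with the REST API, plus a jq filter to cut the verbose JSON down to a few fields; the host and the exact JSON paths are from memory and may need adjusting:

    curl -s "http://nifi-host:8080/nifi-api/flow/bulletin-board?limit=1000" \
      | jq '.bulletinBoard.bulletins[].bulletin | {timestamp, sourceName, level, message}'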
Labels:
- Apache NiFi
01-16-2018
08:00 AM
Hi, the Jira has been marked as "Patch Available". Can you kindly let me know how we can 'install' the patch or upgrade our instance of NiFi so that we can access this processor?
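In case it is useful to others, my understanding of the usual way to pick up an unreleased patch is to build the relevant NAR from source and drop it into the NiFi lib directory; a rough sketch, where the PR number and bundle paths are placeholders:

    git clone https://github.com/apache/nifi.git && cd nifi
    git fetch origin pull/1234/head:pr-1234    # 1234 is a placeholder PR number
    git checkout pr-1234
    mvn clean install -DskipTests
    # copy the rebuilt NAR into the running instance and restart NiFi
    cp nifi-nar-bundles/<bundle>/nifi-<bundle>-nar/target/*.nar $NIFI_HOME/lib/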
01-16-2018
07:59 AM
Can you kindly let me know how I can 'merge' this pull request into our current instance so that we can access the processor?
01-16-2018
07:58 AM
Can you kindly let me know how I can 'merge' this pull request into our current instance so that we can access the processor?