About te04_0172

te04_0172 · ‎12-09-2020

Hi @nikolayburiak , were you able to fix it? I am also facing the same issue where the HBase_2_ClientService cannot renew the Kerberos ticket on its on. I have tried defining the keytab and principal directly in the service but to no success. Interestingly, even if I renew(kinit) or destroy(kdestroy) the Kerberos ticket from the command line on the node running Nifi/Hadoop client, it doesn't have any effect on the client service or the Puthbase processor. So, I am not sure how Nifi creates a kerbros ticket? The only workaround is restarting the HBase_2_ClientService . I am using Nifi 1.12 with HDP 3.14

te04_0172 · ‎09-26-2018

Does any one know when Streams Messaging Manager will be available for download? I would like to use it on our HDF 3.2 cluster.

te04_0172 · ‎07-11-2018

Hi @Shu, I was able to implement your idea of using MergerRecord -> PutHbaseJsonRecord using the Record reader controller services. However , I think there is a limitation in PutHbaseJsonRecord. We are syncing Oracle tables in Hbase using Goldengate and there are table with multiple PK's in oracle .We are creating a corresponding row key in Hbase by concatenating those PK's together. The PutHbaseJson allows to do that so we concatenate the PK's and pass it as an attribute to the processor. But the PutHbaseJsonRecord corresponding property is "Row Identifier Field Name" , so it is expecting the row key to be an element in the json that is read by the record reader. I've tried passing the same attribute i was sending to the PutHbaseJson but it doesn't work. Do you agree to this? I can think of a work around here where I transform the JSON to add the attribute(concatenated PK's) to it,which at this point i don't know how to do it . But even if i manage to do it then I will also need to change the schema as well. Kindly let me know if there is a better way to skin this cat.

te04_0172 · ‎06-26-2018

I think i miss understood the prupose of the record reader. It looks clear to me now. Thank you for the suggestion. I'll work on this idea.

te04_0172 · ‎06-26-2018

Hi Shu, Thank you for your reply. I'll have to study about this Record reader stuff in detail because by the time our flow file reaches the puthbaseJson proc it already contains the reduced json in its payload and simply needs to be put in the target hbase table. So I don't require any sort of manipulation or parsing to be done on it which apparently Record readers does. And i see that the record reader is a required field so there is no way around it. Is there a way to create a dummy reader that does nothing ? : P . I'll explore this on my own as well.

te04_0172 · ‎06-25-2018

The slowest part of our data flow is the PuthbaseJson processor and I am trying to find out a way to optimize it. There is a configuration inside the proc where you can increase the batch size of the flow files it can process in a single execution. It set to 25 by default and i have tried increasing it up to 1000 with little performance gain. Increasing the concurrent tasks also hasn't helped in speeding up the put commands that processor runs. Has any one else worked with this processor and optimized it? The batch configuration of the processor says that it does the put by first grouping the flowfiles by table. Is there any thing i can do here? the name of the table already comes with the flowfile as an attribute and it is extracted from the attribute using the expression language. I am not sure how do i 'group it' before it reaches the Puthbasejosn. Kindly let me know of any ideas.

te04_0172 · ‎06-20-2018

How would a consumer with multiple topics work? Let say for example in a four node cluster we have a consumer (consume_kafka proc with 1 concurrent task) consuming from 10 topics (separated with comma) . Will it assign the 4 concurrent task to these 10 topics on a round-robin manner? I found that creating individual processor as opposed to combining all the topics in one processor to be more efficient. But this where we are stuck , we need to consume from 250 sources so please let me know what would be an efficient approach here. Creating 250 processors is possible but then due to limited number of threads available , some or alot processors does get the thread they need and it ends up with an error.

te04_0172 · ‎04-05-2018

Hi, Is there a way to consume a kafka message along with its timestamp in Nifi using the Consumekafka processor? e.g. we can consume the kafka message and see the time stamp by adding the property print.timestamp=true . The output looks something like this on the console CreateTime:1522893745217 test_message I can't seem to access this variable 'CreateTime'. I've tried using ${kafka.CreateTime} in the Updateattribute process but it doesnt work. Please let me know if there is way to do this as adding the custom timestamp (now()) is not an option in our case.

te04_0172 · ‎03-13-2018

@Matt Clarke Thanks for your answer. If you can kindly aslo help with the below ones as well 1. While assigning salves and clients during HDF installation is 1 node enough for installing the Nifi Certificate authority ( considering we will have Nifi master service on 4 nodes) 2. How many zoo keeper client/ Infra Solr client are enough for a 3 node zookeeper cluster ( i.e zoo keeper master service on 3 nodes)? and is it okay to have client service running on the same node which is also hosting the zookeeper master service or should it be on a different server? Thank you.

te04_0172 · ‎03-12-2018

Hi All. I am trying to install the HDF cluster on a 9 node cluster. Previously I have only worked on a single node Nifi standalone instance so its a kind big jump for me. So couple of question here 1. There is no concept of Slaves when talking about Nifi right? When 'Assigning masters' I need to add all nodes on which i need to install the Nifi service and that is how i will get a truly "9 node Nifi cluster"? By default the Ambari interface is recommending to install nifi only on 3 nodes. 2. By default Ambari is installing zookeeper on 3 nodes ( strangely these are the same nodes on which its installing Nifi) Why is it not using/recommending the rest of the nodes on which Ambari host is already installed. Do i have to install zookeeper client on every node i install nifi on? 3. In Nifi cluster is there single coordinator node where keeps the cluster together? e.g. My data flows puts all the flow files it recieves from golden gate into a folder at OS level. Do i have to create folder on every node to keep the cluster at sync?

Online	Offline
Last Visited	‎12-10-2020 03:24 AM

Member Since	‎01-09-2018 01:47 AM
Last Visited	‎12-10-2020 03:24 AM
Posts	33
Kudos received	3

Cloudera Community

Re: Issue with auto-renewal of kerberos ticket (Ni...

Streams Messaging Manager

Re: Puthbasejson performance optimization

Re: Puthbasejson performance optimization

Re: Puthbasejson performance optimization

Puthbasejson performance optimization

Re: Integrating Apache NiFi and Apache Kafka

Read Kafka timestamp as an attribute in Nifi

Re: HDF Ambari Nifi

HDF Ambari Nifi