Member since: 04-23-2018
Posts: 16
Kudos Received: 2
Solutions: 0
06-04-2018
03:30 PM
I agree with you, but one problem is that I'm ingesting and querying millions of files, so I doubt MonitorActivity will be able to keep track. I also don't find the Run Schedule setting on any of the ES/HBase processors to be accurate: changing the Run Schedule on the HBase processor doesn't actually slow down the ingest much at all, probably because of the batch sizes.
05-31-2018
05:52 AM
Right, but my goal was to be able to write binary cells alongside string cells using PutHBaseJson. My issue is that instead of a GetFile, I'm receiving a flowfile with a binary "bytes" field. I'm having trouble parsing it so that I can put that field into the flowfile content while keeping the other fields as attributes. The idea would be to send the attributes to PutHBaseJson and the flowfile content to PutHBaseCell. For that I think I need to split the Avro records into individual flowfiles, and then the tricky part is converting the binary field into flowfile content. How exactly would you convert one field in an Avro record into flowfile content?
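Conceptually, what I'm after is something like this rough sketch done outside NiFi, assuming the content is an Avro file whose records have a binary "bytes" field plus some simple fields (fastavro and the file name input.avro are just placeholders for the sake of the example):

    # Rough sketch of the split I have in mind, done outside NiFi with fastavro.
    # Assumptions: the flowfile content is an Avro file, each record has a binary
    # "bytes" field, and the remaining fields hold simple values. "input.avro" is
    # just a placeholder name.
    from fastavro import reader

    def split_records(path):
        pieces = []
        with open(path, "rb") as fo:
            for record in reader(fo):
                content = record.pop("bytes")                        # would become the flowfile content
                attributes = {k: str(v) for k, v in record.items()}  # would become flowfile attributes
                pieces.append((content, attributes))
        return pieces

    if __name__ == "__main__":
        for i, (content, attributes) in enumerate(split_records("input.avro")):
            print(f"record {i}: {len(content)} bytes of content, attributes={attributes}")

In NiFi terms I imagine that maps to something like SplitAvro followed by a processor or script that moves the binary field into the content, which is the part I'm unsure about.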
05-07-2018
11:09 AM
@B X
For reducing the number of fields and renaming fields, we don't need to use a ConvertRecord processor; we can achieve it with a single UpdateRecord processor. UpdateRecord expects at least one user-defined property (for example, swapping a field name), and once we add that one property we can reduce or rename the fields. Please see this article, where I'm reducing and renaming fields in the first UpdateRecord processor. If you just want to reduce the number of fields without changing any contents, then we need to use a ConvertRecord processor instead.
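As a hypothetical illustration (the field names here are made up, not taken from your flow): say the incoming records have first_name and extra_field, and you want records with just name. With Replacement Value Strategy set to Record Path Value, one added property on UpdateRecord does the rename, and the Record Writer's schema (which only declares name) drops the extra field:

    Property: /name
    Value:    /first_name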
04-28-2018
06:55 PM
Thank you for this. I'm going to give it a shot. Everything sounds like it would work great, so I'll build it out next week.
01-03-2019
01:25 PM
1 Kudo
Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them. We had a system with two NameNodes with high availability enabled. For some reason, under the data folders of the DataNodes (i.e. /data0x/hadoop/hdfs/data/current) there were two block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855): one folder containing the IP of NameNode 1 and another containing the IP of NameNode 2. All the data was under the block pool of NameNode 1, but in the VERSION files of the NameNodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of NameNode 2, so the NameNode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having two block pool folders, but we did.

To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the NameNode disks (on both NN machines) and on all the JournalNode disks (on all JN machines) to point to NameNode 1. We then restarted HDFS and verified that all the blocks were reported and there were no more missing blocks.
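If it helps anyone checking for the same mismatch, here is a small sketch (not the exact steps we ran) that reads the VERSION files and prints the namespaceID and blockpoolID from each, so a disagreement between disks stands out. The glob patterns follow the /data0x/hadoop/hdfs/... layout above, and the JournalNode path is an assumption, so adjust both to your cluster:

    # Sketch: print namespaceID / blockpoolID from HDFS VERSION files so a
    # mismatch between NameNode metadata and the on-disk block pool is visible.
    # The glob patterns assume the /data0x/hadoop/hdfs/... layout from the post;
    # adapt them to your own mount points and JournalNode directories.
    import glob

    def read_version(path):
        props = {}
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                key, _, value = line.partition("=")
                props[key] = value
        return props

    patterns = [
        "/data*/hadoop/hdfs/namenode/current/VERSION",   # NameNode disks
        "/data*/hadoop/hdfs/journal/*/current/VERSION",  # JournalNode disks (layout assumed)
    ]

    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            props = read_version(path)
            print(path, props.get("namespaceID"), props.get("blockpoolID"))

The IDs printed there can then be compared against the BP-* folder names under /data*/hadoop/hdfs/data/current on the DataNodes.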