Member since: 04-23-2018
Posts: 16
Kudos Received: 2
Solutions: 0
06-04-2018
03:30 PM
I agree with you, but one problem is that I'm ingesting and querying millions of files, so I doubt MonitorActivity will be able to keep track. I also don't find the Run Schedule setting on any of the ES/HBase processors to be accurate: changing the Run Schedule on the HBase processor doesn't actually slow down the ingest much at all, probably because of the batch sizes.
05-31-2018
05:52 AM
Right, but my goal was to be able to write binary cells alongside string cells using PutHBaseJson. My issue is that instead of a GetFile, I'm receiving a flowfile with a binary "bytes" field. I'm having trouble parsing it so that I can put that field into the flowfile content while keeping the other fields as attributes. The idea would be to send the attributes to PutHBaseJson and the flowfile content to PutHBaseCell. For that I think I need to split the Avro records into individual flowfiles, and then the tricky part is converting the binary field into flowfile content. How exactly would you convert one field in an Avro record into flowfile content?
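Conceptually, what I'm after is something like this rough sketch done outside NiFi, assuming the content is an Avro file whose records have a binary "bytes" field plus some simple fields (fastavro and the file name input.avro are just placeholders for the sake of the example):

    # Rough sketch of the split I have in mind, done outside NiFi with fastavro.
    # Assumptions: the flowfile content is an Avro file, each record has a binary
    # "bytes" field, and the remaining fields hold simple values. "input.avro" is
    # just a placeholder name.
    from fastavro import reader

    def split_records(path):
        pieces = []
        with open(path, "rb") as fo:
            for record in reader(fo):
                content = record.pop("bytes")                        # would become the flowfile content
                attributes = {k: str(v) for k, v in record.items()}  # would become flowfile attributes
                pieces.append((content, attributes))
        return pieces

    if __name__ == "__main__":
        for i, (content, attributes) in enumerate(split_records("input.avro")):
            print(f"record {i}: {len(content)} bytes of content, attributes={attributes}")

In NiFi terms I imagine that maps to something like SplitAvro followed by a processor or script that moves the binary field into the content, which is the part I'm unsure about.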
05-07-2018
11:09 AM
@B X
For reducing the number of fields and renaming fields, we don't need to use a ConvertRecord processor; we can achieve it with a single UpdateRecord processor. UpdateRecord expects at least one user-defined property (for example, swapping a field name), and once we add that one property we can reduce or rename the fields. Please see this article, where I'm reducing and renaming fields in the first UpdateRecord processor. If you just want to reduce the number of fields without changing any contents, then we need to use a ConvertRecord processor instead.
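As a hypothetical illustration (the field names here are made up, not taken from your flow): say the incoming records have first_name and extra_field, and you want records with just name. With Replacement Value Strategy set to Record Path Value, one added property on UpdateRecord does the rename, and the Record Writer's schema (which only declares name) drops the extra field:

    Property: /name
    Value:    /first_name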
04-28-2018
06:55 PM
Thank you for this. I'm going to give it a shot. Everything sounds like it would work great, so I'll build it out next week.
01-03-2019
01:25 PM
1 Kudo
Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing and we were able to recover them. We had a system with two NameNodes with high availability enabled. For some reason, under the data folders of the DataNodes (i.e. /data0x/hadoop/hdfs/data/current) there were two block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855): one folder containing the IP of NameNode 1 and another containing the IP of NameNode 2. All the data was under the block pool of NameNode 1, but in the VERSION files of the NameNodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of NameNode 2, so the NameNode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having two block pool folders, but we did.

To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the NameNode disks (on both NN machines) and on all the JournalNode disks (on all JN machines) to point to NameNode 1. We then restarted HDFS and verified that all the blocks were reported and there were no more missing blocks.
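If it helps anyone checking for the same mismatch, here is a small sketch (not the exact steps we ran) that reads the VERSION files and prints the namespaceID and blockpoolID from each, so a disagreement between disks stands out. The glob patterns follow the /data0x/hadoop/hdfs/... layout above, and the JournalNode path is an assumption, so adjust both to your cluster:

    # Sketch: print namespaceID / blockpoolID from HDFS VERSION files so a
    # mismatch between NameNode metadata and the on-disk block pool is visible.
    # The glob patterns assume the /data0x/hadoop/hdfs/... layout from the post;
    # adapt them to your own mount points and JournalNode directories.
    import glob

    def read_version(path):
        props = {}
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                key, _, value = line.partition("=")
                props[key] = value
        return props

    patterns = [
        "/data*/hadoop/hdfs/namenode/current/VERSION",   # NameNode disks
        "/data*/hadoop/hdfs/journal/*/current/VERSION",  # JournalNode disks (layout assumed)
    ]

    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            props = read_version(path)
            print(path, props.get("namespaceID"), props.get("blockpoolID"))

The IDs printed there can then be compared against the BP-* folder names under /data*/hadoop/hdfs/data/current on the DataNodes.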