Member since: 06-01-2016
Posts: 5
Kudos Received: 0
Solutions: 0
01-23-2018
09:03 AM
In case this is useful for others.
HDFS became corrupted at some stage. I ran an fsck -delete, but ended up in an unstable situation: the data directories filled up completely on all the nodes.
This is related to the block scanner, a facility that scans all blocks and performs the necessary verification. By default it only runs every 3 weeks because of the disk-scan and I/O intensity, so to reclaim that block pool space you have to trigger the block scanner, which is not possible from the command line.
One option is to set dfs.datanode.scan.period.hours to 1. You may also consider deleting the scanner.cursor files with rm -rf `locate scanner.cursor` and then restarting the DataNode (a short sketch follows the links below).
http://hadoopinrealworld.com/datanode-block-scanner/
https://community.hortonworks.com/questions/6931/in-hdfs-why-corrupted-blocks-happens.html
https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/
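For reference, here is a minimal sketch of the two steps above. It assumes an HDP-style layout (hdfs-site.xml under /etc/hadoop/conf, hadoop-daemon.sh under /usr/hdp/current/hadoop-client/sbin) and a DataNode restarted outside Ambari; adjust the paths and restart method to your own cluster.

# 1. Shorten the block scanner period in hdfs-site.xml on each DataNode:
#    <property>
#      <name>dfs.datanode.scan.period.hours</name>
#      <value>1</value>
#    </property>

# 2. Optionally remove the scanner cursor files so the scan restarts from
#    the beginning (the paths returned by locate vary per cluster):
rm -rf $(locate scanner.cursor)

# 3. Restart the DataNode so the new period takes effect (on an
#    Ambari-managed cluster do this from the Ambari UI instead):
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"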
09-21-2016
01:34 PM
HDFS has an inotify feature which essentially translates those log entries into events that can be consumed: https://issues.apache.org/jira/browse/HDFS-6634 Here's a Java-based example: https://github.com/onefoursix/hdfs-inotify-example Alternatively, rather than having Oozie monitor many directories and waste resources, a script can execute 'hdfs dfs -ls -R /folder|grep|sed' every minute or so, but that's still not event based, so it depends on how fast a reaction you need versus how easily you can implement/use the inotify API.
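If polling is acceptable, a minimal sketch of that second option (not event based) could look like the following; /folder, the 60-second interval, the state-file location and the echo action are all placeholders to adapt.

#!/bin/bash
# Poll HDFS and diff recursive listings to spot new paths (hypothetical sketch)
WATCH_DIR=/folder                 # assumed directory to watch
STATE=/tmp/hdfs_listing.prev      # assumed location of the previous listing

while true; do
  # Last field of 'hdfs dfs -ls -R' is the full path; sort so comm can diff
  hdfs dfs -ls -R "$WATCH_DIR" | awk '{print $NF}' | sort > /tmp/hdfs_listing.curr
  if [ -f "$STATE" ]; then
    # Paths present now but absent from the previous listing are new
    comm -13 "$STATE" /tmp/hdfs_listing.curr | while read -r path; do
      echo "New path: $path"      # replace with the real action
    done
  fi
  mv /tmp/hdfs_listing.curr "$STATE"
  sleep 60
done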
06-01-2016
08:54 AM
2 Kudos
Hi @x name. I don't believe there is anything pre-built within Flume to do exactly what you need. Flume itself is certainly production ready and has been in constant use by a very wide range of people for a long time now; it just hasn't evolved much past that point. It also starts to struggle under significant load in the kind of scenarios you're describing unless it's very carefully managed.

You've already identified the tool set I'd probably recommend for your requirement, which is NiFi. You've also identified another article, so I won't go into that any further. As for other tools or patterns, I've seen people build their own ingest frameworks using a combination of scripts and things like WebHDFS, or indeed a lot of custom code on top of Kafka. However, with the way the technology is stacking up now, unless you have a strong reason not to, NiFi solves all the issues you bring up and is easy to use as well, so I'd strongly recommend it.

If you do find something else, please add a comment here; likewise, if you try NiFi and get stuck at all, don't hesitate to fire over another question! Hope that helps.
05-17-2017
03:51 AM
I got similar issues to the above with HDP 2.6. Even after running yum -y erase hdp-select on each host, the problem still exists:
Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
User Group mapping (user_group) is missing in the hostLevelParams
Kindly advise.