Member since: 06-01-2016
Posts: 5
Kudos Received: 0
Solutions: 0
01-23-2018
09:03 AM
In case this is useful for others.
HDFS became corrupted at some stage. I ran an fsck -delete, but ended up in an unstable situation: the data directories filled up completely on all the nodes.
This is related to the block scanner, a facility that scans all blocks and performs the necessary verification. By default it only runs every 3 weeks because of the disk-scan and I/O intensity, so to reclaim that block pool space you have to trigger the block scanner, which is not possible from the command line.
One option is to set dfs.datanode.scan.period.hours to 1. You may also consider deleting the scanner.cursor files with rm -rf `locate scanner.cursor` and then restarting the DataNode (a short sketch follows the links below).
http://hadoopinrealworld.com/datanode-block-scanner/
https://community.hortonworks.com/questions/6931/in-hdfs-why-corrupted-blocks-happens.html
https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/
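For reference, here is a minimal sketch of the two steps above. It assumes an HDP-style layout (hdfs-site.xml under /etc/hadoop/conf, hadoop-daemon.sh under /usr/hdp/current/hadoop-client/sbin) and a DataNode restarted outside Ambari; adjust the paths and restart method to your own cluster.

# 1. Shorten the block scanner period in hdfs-site.xml on each DataNode:
#    <property>
#      <name>dfs.datanode.scan.period.hours</name>
#      <value>1</value>
#    </property>

# 2. Optionally remove the scanner cursor files so the scan restarts from
#    the beginning (the paths returned by locate vary per cluster):
rm -rf $(locate scanner.cursor)

# 3. Restart the DataNode so the new period takes effect (on an
#    Ambari-managed cluster do this from the Ambari UI instead):
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"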
09-21-2016
01:34 PM
HDFS has an inotify feature which essentially translates those log entries into events that can be consumed: https://issues.apache.org/jira/browse/HDFS-6634 Here's a Java-based example: https://github.com/onefoursix/hdfs-inotify-example Alternatively, rather than having Oozie monitor many directories and waste resources, a script can execute 'hdfs dfs -ls -R /folder|grep|sed' every minute or so, but that's still not event based, so it depends on how fast a reaction you need versus how easily you can implement/use the inotify API.
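If polling is acceptable, a minimal sketch of that second option (not event based) could look like the following; /folder, the 60-second interval, the state-file location and the echo action are all placeholders to adapt.

#!/bin/bash
# Poll HDFS and diff recursive listings to spot new paths (hypothetical sketch)
WATCH_DIR=/folder                 # assumed directory to watch
STATE=/tmp/hdfs_listing.prev      # assumed location of the previous listing

while true; do
  # Last field of 'hdfs dfs -ls -R' is the full path; sort so comm can diff
  hdfs dfs -ls -R "$WATCH_DIR" | awk '{print $NF}' | sort > /tmp/hdfs_listing.curr
  if [ -f "$STATE" ]; then
    # Paths present now but absent from the previous listing are new
    comm -13 "$STATE" /tmp/hdfs_listing.curr | while read -r path; do
      echo "New path: $path"      # replace with the real action
    done
  fi
  mv /tmp/hdfs_listing.curr "$STATE"
  sleep 60
done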
06-01-2016
08:54 AM
2 Kudos
Hi @x name. I don't believe there is anything pre-built within Flume to do exactly what you need. Flume itself is certainly production ready and has been in constant use by a very wide range of people for a long time now; it just hasn't evolved much past that point. It also starts to struggle under significant load in the kind of scenarios you're describing unless it's very carefully managed.

You've already identified the tool set I'd probably recommend for your requirement, which is NiFi. You've also identified another article, so I won't go into that any further. As for other tools or patterns, I've seen people build their own ingest frameworks using a combination of scripts and things like WebHDFS, or indeed a lot of custom code on top of Kafka. However, with the way the technology is stacking up now, unless you have a strong reason not to, NiFi solves all the issues you bring up and is easy to use as well, so I'd strongly recommend it.

If you do find something else, please add a comment here; likewise, if you try NiFi and get stuck at all, don't hesitate to fire over another question! Hope that helps.
05-17-2017
03:51 AM
I got similar issues to the above with HDP 2.6. Even after running yum -y erase hdp-select on each host, the problem still exists:
Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
User Group mapping (user_group) is missing in the hostLevelParams
Kindly advise.