Member since: 08-30-2016
Posts: 11
Kudos Received: 3
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 902 | 03-31-2017 07:11 AM |
| | 10622 | 08-30-2016 11:24 PM |
04-03-2018
10:34 AM
Hopefully you have solved it by now. But, for others: the LLAP daemon size (hive.llap.daemon) must be >= llap_heap_size + llap.io.memory.size + llap_headroom_space, which is not the case in the configuration above.
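As a quick sanity check, here is a minimal sketch of that arithmetic (the numbers are illustrative, and the shell variable names mirror the settings above rather than actual property keys):

# Illustrative values in MB; substitute your own settings.
llap_heap_size=4096          # LLAP daemon heap
llap_io_memory_size=2048     # hive.llap.io.memory.size (in-memory cache)
llap_headroom_space=1024     # off-heap headroom

# The daemon size must be at least the sum of the three.
required=$((llap_heap_size + llap_io_memory_size + llap_headroom_space))
echo "LLAP daemon size must be >= ${required} MB"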
04-01-2017
09:34 AM
So, it means that the ZooKeeper ensemble is not up. How many nodes do you have in the ZooKeeper ensemble? Make sure the server.N-to-IP mappings and the myid files match. Paste your "zoo.cfg" here, along with the output of:
netstat -tlpn | grep 2181
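For reference, a typical zoo.cfg for a three-node ensemble looks like this (hostnames and paths are placeholders):

# Illustrative zoo.cfg for a three-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Each node's myid file under dataDir must contain just the number matching its server.N entry.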
03-31-2017
07:11 AM
1 Kudo
FIFO was the default scheduler in Hadoop 1, when you deployed the vanilla Apache Hadoop version. It is not used in production, as only one job can run at a time. But one job can have many containers, and each node in the cluster can have a few of them running.

The size of a container has nothing to do with how many jobs are running on a cluster. It is decided by the map/reduce memory setting that requests a container from YARN, and it should be a multiple of the YARN minimum allocation. There are a lot of details here, but keeping it simple: the size of a container is not related to the number of containers. Yes, as a mathematical shortcut to find how many containers can run on a node, we say that the number equals the total memory available to YARN divided by the container memory. The size of a container is decided based on the request, whether it is a map, reduce, or Spark task container, etc.
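A quick worked example of that calculation (numbers are illustrative):

node_mem_for_yarn=16384      # yarn.nodemanager.resource.memory-mb
container_mem=2048           # e.g. mapreduce.map.memory.mb, rounded up to a
                             # multiple of yarn.scheduler.minimum-allocation-mb
echo $((node_mem_for_yarn / container_mem))   # prints 8: containers of this size that fit on the node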
03-31-2017
07:00 AM
Firstly, verify that the ZooKeeper ensemble is up. A ZooKeeper daemon being up and running does not mean there is an "ensemble". Can you connect to ZooKeeper?
zkCli.sh -server localhost:2181 (change to the address where it runs)
[zk: localhost:2181(CONNECTED) 0] ls /
This will list all znodes. Can you see "rmstore" there? You can delete it with:
rmr /rmstore
Then restart ZooKeeper and the ResourceManager.
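For reference, the /rmstore path is controlled by the RM's ZooKeeper state-store setting in yarn-site.xml, which defaults to /rmstore:

<!-- yarn-site.xml: where the ResourceManager stores its state in ZooKeeper -->
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore</value>
</property>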
02-01-2017
07:21 AM
In addition to the above, we can have HBase on S3 instead of HDFS, but for that we must use the EMRFS implementation. Keeping it simple, use EMR release 5.2 or greater. But a NameNode is still mandatory.
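For illustration, HBase on S3 is enabled through an EMR configuration like the following when creating the cluster (the bucket name is a placeholder; check the EMR release docs for your version):

[
  {
    "Classification": "hbase",
    "Properties": { "hbase.emr.storageMode": "s3" }
  },
  {
    "Classification": "hbase-site",
    "Properties": { "hbase.rootdir": "s3://my-bucket/hbase" }
  }
]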
10-11-2016
06:52 AM
@Saurabh Yes, the script I gave was with the "hadoop fs -ls" command, because many people do not understand what it does; they will simply copy the script, run it, and then blame it when they lose data. The problem is that most people call themselves Hadoop admins but have never worked as Linux system admins/engineers 🙂
09-22-2016
12:51 PM
@Saurabh the script takes an argument: the number of days 🙂 So, if you want to look for files older than 10 days, run:
./cleanup.sh 10
08-30-2016
11:24 PM
1 Kudo
You can do:
#!/bin/bash
# Remove HDFS directories under /tmp/ that are older than a given number of days.
usage="Usage: dir_diff.sh [days]"

if [ ! "$1" ]; then
    echo "$usage"
    exit 1
fi

now=$(date +%s)

# "hadoop fs -ls -R" lists recursively; lines starting with "d" are directories.
hadoop fs -ls -R /tmp/ | grep "^d" | while read -r f; do
    # Column 6 of the listing is the modification date.
    dir_date=$(echo "$f" | awk '{print $6}')
    # Age of the directory in whole days.
    difference=$(( (now - $(date -d "$dir_date" +%s)) / (24 * 60 * 60) ))
    if [ "$difference" -gt "$1" ]; then
        # Column 8 is the full path; delete it recursively.
        hadoop fs -rm -r "$(echo "$f" | awk '{print $8}')"
    fi
done
Replace /tmp/ with whatever directories or files you need to clean up.
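For example, to remove directories under /tmp/ older than 10 days:

chmod +x dir_diff.sh
./dir_diff.sh 10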