Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5394 | 08-12-2016 01:02 PM |
|  | 2200 | 08-08-2016 10:00 AM |
|  | 2602 | 08-03-2016 04:44 PM |
|  | 5495 | 08-03-2016 02:53 PM |
|  | 1418 | 08-01-2016 02:38 PM |
03-25-2017
06:18 AM
@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have ~3,500 records per stripe and was just wondering where I should look. Thanks!
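As a pointer on where to look, the ORC file dump utility prints the row count and byte size of every stripe, so you can confirm how small the stripes really are (the file path below is just a placeholder):

```bash
# Dump ORC metadata, including rows and bytes per stripe (path is a placeholder)
hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/000000_0
```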
02-27-2016
01:47 AM
@Prakash Punj Did you copy the file locally instead of to HDFS, as I mentioned in my reply?
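For reference, a minimal sketch of what copying the file locally looks like (the paths are placeholders):

```bash
# Copy the file out of HDFS to the local filesystem and work on the local copy
hadoop fs -copyToLocal /user/someuser/input.csv /tmp/input.csv
```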
11-17-2017
11:24 AM
Nope, reducers don't communicate with each other, and neither do the mappers. Each of them runs in a separate JVM container and has no information about the others. The ApplicationMaster is the daemon that manages these JVM-based containers (mappers/reducers).
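As an illustration of the ApplicationMaster owning those containers, the YARN CLI can list them for a running job (the application and attempt IDs below are placeholders):

```bash
# List the attempts of a running application (application ID is a placeholder)
yarn applicationattempt -list application_1500000000000_0001

# List the containers (the mapper/reducer JVMs) managed by that attempt
yarn container -list appattempt_1500000000000_0001_000001
```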
02-11-2016
04:30 PM
A poltergeist? I don't have a line 98, either on macOS or on Linux.
02-08-2016
01:52 PM
4 Kudos
Which version of HDP are you running? The hadoop command environment settings (HADOOP_OPTS and HADOOP_CLIENT_OPTS) are set in the hadoop-env template, which you can edit in Ambari under the advanced HDFS configuration. Below are the sections from HDP 2.3.4 running Java 1.8; perhaps you can compare them with yours. I see one line that sets it, but it is in the "if java_version < 8" section of the template: export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -XX:MaxPermSize=512m $HADOOP_CLIENT_OPTS"
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME={{java_home}}
export HADOOP_HOME_WARN_SUPPRESS=1
# Hadoop home directory
export HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}}
# Hadoop Configuration Directory
{# this is different for HDP1 #}
# Path to jsvc required by secure HDP 2.0 datanode
export JSVC_HOME={{jsvc_path}}
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE="{{hadoop_heapsize}}"
export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms{{namenode_heapsize}}"
# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
# Command specific options appended to HADOOP_OPTS when specified
HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
HADOOP_TASKTRACKER_OPTS="-server -Xmx{{ttnode_heapsize}} -Dhadoop.security.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}"
{% if java_version < 8 %}
SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -XX:PermSize={{namenode_opt_permsize}} -XX:MaxPermSize={{namenode_opt_maxpermsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node\" ${HADOOP_SECONDARYNAMENODE_OPTS}"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -XX:MaxPermSize=512m $HADOOP_CLIENT_OPTS"
{% else %}
SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"
export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS}"
export HADOOP_SECONDARYNAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node\" ${HADOOP_SECONDARYNAMENODE_OPTS}"
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m $HADOOP_CLIENT_OPTS"
{% endif %}
HADOOP_NFS3_OPTS="-Xmx{{nfsgateway_heapsize}}m -Dhadoop.security.logger=ERROR,DRFAS ${HADOOP_NFS3_OPTS}"
HADOOP_BALANCER_OPTS="-server -Xmx{{hadoop_heapsize}}m ${HADOOP_BALANCER_OPTS}"
# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER:-{{hadoop_secure_dn_user}}}
# Extra ssh options. Empty by default.
export HADOOP_SSH_OPTS="-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR={{hdfs_log_dir_prefix}}/$USER
# History server logs
export HADOOP_MAPRED_LOG_DIR={{mapred_log_dir_prefix}}/$USER
# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR={{hdfs_log_dir_prefix}}/$HADOOP_SECURE_DN_USER
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
# The directory where pid files are stored. /tmp by default.
export HADOOP_PID_DIR={{hadoop_pid_dir_prefix}}/$USER
export HADOOP_SECURE_DN_PID_DIR={{hadoop_pid_dir_prefix}}/$HADOOP_SECURE_DN_USER
# History server pid
export HADOOP_MAPRED_PID_DIR={{mapred_pid_dir_prefix}}/$USER
YARN_RESOURCEMANAGER_OPTS="-Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY"
# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
# Use libraries from standard classpath
JAVA_JDBC_LIBS=""
#Add libraries required by mysql connector
for jarFile in `ls /usr/share/java/*mysql* 2>/dev/null`
do
JAVA_JDBC_LIBS=${JAVA_JDBC_LIBS}:$jarFile
done
# Add libraries required by oracle connector
for jarFile in `ls /usr/share/java/*ojdbc* 2>/dev/null`
do
JAVA_JDBC_LIBS=${JAVA_JDBC_LIBS}:$jarFile
done
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${JAVA_JDBC_LIBS}
# Setting path to hdfs command line
export HADOOP_LIBEXEC_DIR={{hadoop_libexec_dir}}
# Mostly required for hadoop 2.0
export JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}
export HADOOP_OPTS="-Dhdp.version=$HDP_VERSION $HADOOP_OPTS"
{% if is_datanode_max_locked_memory_set %}
# Fix temporary bug, when ulimit from conf files is not picked up, without full relogin.
# Makes sense to fix only when running DN as root
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
ulimit -l {{datanode_max_locked_memory}}
fi
{% endif %}
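To verify what your own cluster renders, a quick check (assuming the default HDP client configuration directory; adjust the path for your environment):

```bash
# Inspect the hadoop-env.sh that Ambari generated on a node
grep -n "HADOOP_CLIENT_OPTS" /etc/hadoop/conf/hadoop-env.sh

# Confirm which JVM flags a client command actually starts with
HADOOP_CLIENT_OPTS="-XX:+PrintCommandLineFlags" hadoop fs -ls / 2>&1 | head -n 5
```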
01-10-2018
02:04 PM
@Rupinder Singh Can you please elaborate on the exact solution to this problem? I am facing the same issue.
10-05-2016
07:36 PM
1 Kudo
Is there a way to get the Ambari version via the REST API?
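A hedged sketch of one way that should work (hostname and credentials are placeholders): the Ambari server reports its own version under the RootServiceComponents resource.

```bash
# Query the Ambari server's component version via the REST API (host and credentials are placeholders)
curl -u admin:admin \
  "http://ambari-server.example.com:8080/api/v1/services/AMBARI/components/AMBARI_SERVER?fields=RootServiceComponents/component_version"
```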
02-05-2016
10:46 PM
1 Kudo
I think there is a misunderstanding about what YARN does. It doesn't care at all how much memory is actually free on the Linux machines, or about buffers or caches; it only cares about the settings in the YARN configuration. You can check them in Ambari, and it is your responsibility to set them so they fit the system. On the YARN page of Ambari you can find:
- The total amount of RAM available to YARN on each datanode. Ambari estimates this during installation, but in the end it is your responsibility.
- The minimum size of a container (this is also the common divider of container sizes).
- The maximum size of a container (setting it to the YARN maximum is normally a good idea).

So let's assume you have a 3-node cluster with 32 GB of RAM on each node, and YARN memory has been set to 24 GB (leaving 8 GB for the OS plus HDFS). Let's also assume your minimum container size is 1 GB. This gives you 24 GB * 3 = 72 GB in total for YARN and at most 72 containers.

A couple of important things:
- If you set your map size to 1.5 GB, you get at most 36 containers, because YARN only hands out slots in multiples of the minimum (i.e. 2 GB, 3 GB, 4 GB, ...). This is a common problem, so always set your container sizes as multiples of the minimum.
- If you have only 16 GB on the nodes and set the YARN memory to 32 GB, YARN will happily push your system into out-of-memory. It is your responsibility to configure it so it uses the available RAM, but not more.

What YARN does do is kill any task that uses more than its requested amount of RAM, and schedule tasks so they run local to the data, and so on. (See the sketch of the relevant settings below.)
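A minimal sketch of the settings referred to above, with the example numbers worked through (the property names come from yarn-site.xml; the values and config path are assumptions for this example):

```bash
# The settings described above (example values, in MB):
#   yarn.nodemanager.resource.memory-mb   = 24576   # RAM YARN may hand out per node (24 GB)
#   yarn.scheduler.minimum-allocation-mb  = 1024    # min container size / common divider
#   yarn.scheduler.maximum-allocation-mb  = 24576   # max container size
#
# A 1.5 GB (1536 MB) map request is rounded up to the next multiple of the minimum, 2048 MB,
# so 3 nodes * 24576 MB / 2048 MB = 36 containers, as described above.
grep -A1 -E "yarn\.(nodemanager\.resource\.memory-mb|scheduler\.(minimum|maximum)-allocation-mb)" \
  /etc/hadoop/conf/yarn-site.xml
```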