Created 06-05-2019 12:17 PM
We are new here and sing cloudbreak and hadoop as our data platform. We need to register a prometheus JMX exporter by adding following lines to hadoop-env.sh
export HDFS_NAMENODE_OPTS="${HDFS_NAMENODE_OPTS} -javaagent:/home/hduser/jmx_prometheus_javaagent-0.11.0.jar=19850:/home/hduser_/namenode.yml"
export HDFS_DATANODE_OPTS="${HDFS_DATANODE_OPTS} -javaagent:/home/hduser/jmx_prometheus_javaagent-0.11.0.jar=19851:/home/hduser_/datanode.yml"
When we ssh into nodes and change "/etc/hadoop/2.6.5.0-292/0/hadoop-env.sh", the changes are reverted on restart.
So we figured out the way and changes advanced hadoop-env from amabari management UI and added above lines. On restart, if we do netstat, we dont see any process listening on 19850 or 19851.
How can we configure the Prometheus JMX exporter?
Created 06-05-2019 12:28 PM
In an ambari managed cluster any changes made manually inside the scripts like "hadoop-env.sh" will be reverted back as soon as we restart those components from Ambari UI Because ambari will push the configs which are stored inside the ambari Db for those script templates to that host.
Hence for making such changes you must use the Ambari UI / APIs , Like "Advanced hadoop-env" from ambari.
After you made those changes from Ambari UI do you actually see the mentioned Java arguments when you try to run the following commands?
# ps -ef | grep -i NameNode # ps -ef | grep -i DataNode
A. If you do not see those "javaagent" options in the above commands output then it means that those changes were not applied properly. In that case please share the full "Advanced hadoop-env" template from ambari UI so that we can check if it was applied properly or not?
B. If you are able to see the those "javaagent" options in the above process list command output then there may be something wrong in the either the file permissions "/home/hduser/jmx_prometheus_javaagent-0.11.0.jar" and "/home/hduser_/datanode.yml" Or the YAML file content might not be appropriate. As DataNode and NameNode processes run as "hdfs" user normally so please chekc the file permission and ownership.
Can you please share the /home/hduser_/datanode.yml file content as well ?
Created 06-05-2019 03:00 PM
I not sure, how to add my reply here, and I posted it in Answer. Jay, please let me know if you are not able to see that.
Created 06-05-2019 01:58 PM
The above question and the replies below were originally posted in the Community Help Track. On Wed Jun 5 13:57 UTC 2019, a member of the HCC moderation staff moved it to the Cloud & Operations track. The Community Help Track is intended for questions about using the HCC site itself.
Created 06-05-2019 04:36 PM
Thanks Jay for the prompt answer,
I made changes to Advanced hadoop-env and here is my full template
# Set Hadoop-specific environment variables here. # The only required environment variable is JAVA_HOME. All others are # optional. When running a distributed configuration it is best to # set JAVA_HOME in this file, so that it is correctly defined on # remote nodes. # The java implementation to use. Required. export JAVA_HOME={{java_home}} export HADOOP_HOME_WARN_SUPPRESS=1 # Hadoop home directory export HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}} # Hadoop Configuration Directory {# this is different for HDP1 #} # Path to jsvc required by secure HDP 2.0 datanode export JSVC_HOME={{jsvc_path}} # The maximum amount of heap to use, in MB. Default is 1000. export HADOOP_HEAPSIZE="{{hadoop_heapsize}}" export HADOOP_NAMENODE_INIT_HEAPSIZE="-Xms{{namenode_heapsize}}" # Extra Java runtime options. Empty by default. export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}" USER="$(whoami)" # Command specific options appended to HADOOP_OPTS when specified HADOOP_JOBTRACKER_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{jtnode_opt_newsize}} -XX:MaxNewSize={{jtnode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx{{jtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dhadoop.mapreduce.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}" HADOOP_TASKTRACKER_OPTS="-server -Xmx{{ttnode_heapsize}} -Dhadoop.security.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}" {% if java_version < 8 %} SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -XX:PermSize={{namenode_opt_permsize}} -XX:MaxPermSize={{namenode_opt_maxpermsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT" export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}" export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly" export HADOOP_SECONDARYNAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node\" ${HADOOP_SECONDARYNAMENODE_OPTS}" # The following applies to multiple commands (fs, dfs, fsck, distcp etc) export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -XX:MaxPermSize=512m $HADOOP_CLIENT_OPTS" {% else %} SHARED_HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile={{hdfs_log_dir_prefix}}/$USER/hs_err_pid%p.log -XX:NewSize={{namenode_opt_newsize}} -XX:MaxNewSize={{namenode_opt_maxnewsize}} -Xloggc:{{hdfs_log_dir_prefix}}/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms{{namenode_heapsize}} -Xmx{{namenode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT" export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS}" export HADOOP_DATANODE_OPTS="-server -XX:ParallelGCThreads=4 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms{{dtnode_heapsize}} -Xmx{{dtnode_heapsize}} -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_DATANODE_OPTS} -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly" export HADOOP_SECONDARYNAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node\" ${HADOOP_SECONDARYNAMENODE_OPTS}" # The following applies to multiple commands (fs, dfs, fsck, distcp etc) export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m $HADOOP_CLIENT_OPTS" {% endif %} HADOOP_NFS3_OPTS="-Xmx{{nfsgateway_heapsize}}m -Dhadoop.security.logger=ERROR,DRFAS ${HADOOP_NFS3_OPTS}" HADOOP_BALANCER_OPTS="-server -Xmx{{hadoop_heapsize}}m ${HADOOP_BALANCER_OPTS}" # On secure datanodes, user to run the datanode as after dropping privileges export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER:-{{hadoop_secure_dn_user}}} # Extra ssh options. Empty by default. export HADOOP_SSH_OPTS="-o ConnectTimeout=5 -o SendEnv=HADOOP_CONF_DIR" # Where log files are stored. $HADOOP_HOME/logs by default. export HADOOP_LOG_DIR={{hdfs_log_dir_prefix}}/$USER # History server logs export HADOOP_MAPRED_LOG_DIR={{mapred_log_dir_prefix}}/$USER # Where log files are stored in the secure data environment. export HADOOP_SECURE_DN_LOG_DIR={{hdfs_log_dir_prefix}}/$HADOOP_SECURE_DN_USER # File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default. # export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves # host:path where hadoop code should be rsync'd from. Unset by default. # export HADOOP_MASTER=master:/home/$USER/src/hadoop # Seconds to sleep between slave commands. Unset by default. This # can be useful in large clusters, where, e.g., slave rsyncs can # otherwise arrive faster than the master can service them. # export HADOOP_SLAVE_SLEEP=0.1 # The directory where pid files are stored. /tmp by default. export HADOOP_PID_DIR={{hadoop_pid_dir_prefix}}/$USER export HADOOP_SECURE_DN_PID_DIR={{hadoop_pid_dir_prefix}}/$HADOOP_SECURE_DN_USER # History server pid export HADOOP_MAPRED_PID_DIR={{mapred_pid_dir_prefix}}/$USER YARN_RESOURCEMANAGER_OPTS="-Dyarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY" # A string representing this instance of hadoop. $USER by default. export HADOOP_IDENT_STRING=$USER # The scheduling priority for daemon processes. See 'man nice'. # export HADOOP_NICENESS=10 # Add database libraries JAVA_JDBC_LIBS="" if [ -d "/usr/share/java" ]; then for jarFile in `ls /usr/share/java | grep -E "(mysql|ojdbc|postgresql|sqljdbc)" 2>/dev/null` do JAVA_JDBC_LIBS=${JAVA_JDBC_LIBS}:$jarFile done fi # Add libraries to the hadoop classpath - some may not need a colon as they already include it export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:/usr/lib/hadoop/lib/* # Setting path to hdfs command line export HADOOP_LIBEXEC_DIR={{hadoop_libexec_dir}} # Mostly required for hadoop 2.0 export JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:{{hadoop_lib_home}}/native/Linux-{{architecture}}-64 export HADOOP_OPTS="-Dhdp.version=$HDP_VERSION $HADOOP_OPTS" # Fix temporary bug, when ulimit from conf files is not picked up, without full relogin. # Makes sense to fix only when runing DN as root if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then {% if is_datanode_max_locked_memory_set %} ulimit -l {{datanode_max_locked_memory}} {% endif %} ulimit -n {{hdfs_user_nofile_limit}} fi {% if hadoop_custom_extensions_enabled %} #Enable custom extensions export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:{{stack_root}}/current/ext/hadoop/*:/usr/lib/hadoop/lib/* {% endif %} # Enable ACLs on zookeper znodes if required {% if hadoop_zkfc_opts is defined %} export HADOOP_ZKFC_OPTS="{{hadoop_zkfc_opts}} $HADOOP_ZKFC_OPTS" {% endif %} #This is Rahul export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -javaagent:/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar=19850:/home/sshhdfsuser/namenode.yml
These changes are getting updates into, /etc/hadoop/2.6.5.0-292/0/hadoop-env.sh
After I run " ps -ef | grep -i NameNode"
Following is the output,
hdfs 11560 1 2 14:21 ? 00:00:32 /usr/lib/jvm/java/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.6.5.0-292 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.5.0-292 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-dev-hdfs-restore-clstr-m0.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051421 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051421 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051421 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode hdfs 13195 1 0 14:22 ? 00:00:07 /usr/lib/jvm/java/bin/java -Dproc_secondarynamenode -Xmx1024m -Dhdp.version=2.6.5.0-292 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.6.5.0-292 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-dev-hdfs-restore-clstr-m0.log -Dhadoop.home.dir=/usr/hdp/2.6.5.0-292/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.6.5.0-292/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051422 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051422 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=128m -XX:MaxNewSize=128m -Xloggc:/var/log/hadoop/hdfs/gc.log-201906051422 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-secondarynamenode/bin/kill-secondary-name-node" -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode root 21043 120868 0 14:46 pts/0 00:00:00 grep --color=auto -i NameNode [root@dev-hdfs-restore-clstr-m0 sshhdfsuser]#
There is no something like "jmx_prometheus_javaagent" in above.
I have changed the permission of my "/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar" and "namenode.yml" to "777"
my namenode.yml is,
--- startDelaySeconds: 0 ssl: false lowercaseOutputName: true lowercaseOutputLabelNames: true whitelistObjectNames: - 'Hadoop:service=NameNode,name=*' - 'Hadoop:service=NameNode,name=MetricsSystem,sub=*' blacklistObjectNames: - 'Hadoop:service=NameNode,name=RetryCache.NameNodeRetryCache' - 'Hadoop:service=NameNode,name=RpcActivity*' - 'Hadoop:service=NameNode,name=RpcDetailedActivity*' - 'Hadoop:service=NameNode,name=UgiMetrics' rules: # MetricsSystem - pattern: 'Hadoop<service=(.*), name=MetricsSystem, sub=(.*)><>(.*): (\d+)' attrNameSnakeCase: true name: hadoop_$1_$3 value: $4 labels: service: HDFS role: $1 kind: 'MetricsSystem' sub: $2 type: GAUGE # All NameNode infos - pattern: 'Hadoop<service=(.*), name=(.*)><>(.*): (\d+)' attrNameSnakeCase: true name: hadoop_$1_$3 value: $4 labels: service: HDFS role: $1 kind: $2 type: GAUGE
I am not sure what I am doing wrong here???
Created 06-06-2019 12:09 AM
Ambari Uses Jinja templates to create the files using the templates. Jinja has specific rules for variable substitution and is very strict about the missing quotes.
Your template has few issues like it has incorrectly entered new line characters at many places when {{VARIABLE}} is occurring in your template ... then it is converted to
{ {VARIABLE}}
Example: (in your case)
export JAVA_HOME={ {java_home}}
Ideally it should be
export JAVA_HOME={{java_home}}
Similarly your hadoop-env template has many new lines added in between {{ brackets.
The other main issue is that your last line as missing quotation mark
#This is Rahul export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -javaagent:/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar=19850:/home/sshhdfsuser/namenode.yml
Ideally it should be
#This is Rahul export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -javaagent:/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar=19850:/home/sshhdfsuser/namenode.yml"
.
.
Created 06-06-2019 12:11 AM
May be you can try with the following kind of template: (Please see the attached file.)
Created on 06-06-2019 06:18 PM - edited 08-17-2019 03:06 PM
After updating advanced hadoop-env.sh with the txt file you posted. My cluster refuses to start,
If I check in the logs, Please find my attached log file.
I am concerned about the following block,
Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386) at sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401) Caused by: java.lang.IllegalArgumentException: Collector already registered that provides name: jmx_exporter_build_info at io.prometheus.jmx.shaded.io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:54) at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:139) at io.prometheus.jmx.shaded.io.prometheus.client.Collector.register(Collector.java:132) at io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:51)
Created 06-06-2019 06:37 PM
So after doing lot of research I got a success for namenode exporter,
Added following lines of code, at the end of file and got response on curl localhost:19850
if [ "$command" == "namenode" ]; then export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -javaagent:/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar=19850:/home/sshhdfsuser/namenode.yml" fi
Everything starts and runs smoothly.
Now when I try to do the same for Datanode, like adding the following
if [ "$command" == "datanode" ]; then export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} -javaagent:/home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar=19851:/home/sshhdfsuser/datanode.yml" fi
Nothing starts and I get error messages in logs as per attached log file.
Following line is the concern I think and I am not sure, why behaviour for datanode is different than the namenode.
Error opening zip file or JAR manifest missing : /home/sshhdfsuser/jmx_prometheus_javaagent-0.11.0.jar
Created 06-27-2019 08:47 AM
use the Ambari UI / APIs , Like "Advanced hadoop-env" from ambari. Added following lines of code, at the end of file
# Add java-agent to get jmx metrics for prometheus
agent_namenode=`echo $HADOOP_NAMENODE_OPTS | grep javaagent | wc -l `
if [ "$agent_namenode" == 0 ]; then
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=10010 -javaagent:/usr/hdp/2.3.4.0-3485/hadoop/lib/jmx_prometheus_javaagent-0.11.0.jar=9998:/usr/hdp/2.3.4.0-3485/hadoop/jmx_exporter/namenode.yaml $HADOOP_NAMENODE_OPTS"
fi
agent_datanode=`echo $HADOOP_DATANODE_OPTS | grep javaagent | wc -l `
if [ "$agent_datanode" == 0 ]; then
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=10011 -javaagent:/usr/hdp/2.3.4.0-3485/hadoop/lib/jmx_prometheus_javaagent-0.11.0.jar=9999:/usr/hdp/2.3.4.0-3485/hadoop/jmx_exporter/datanode.yaml $HADOOP_DATANODE_OPTS"
fi
changed the permission of jmx_prometheus_javaagent-0.11.0.jar" and "namenode.yml" and "datanode.yaml" to "777" and put it into the right place.
restart namenode and datanode at a time. you will get what you want