Member since: 03-23-2016
Posts: 21
Kudos Received: 5
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5731 | 08-12-2016 12:05 PM
10-14-2016
08:56 AM
1 Kudo
Hi all, thank you for your replies. Submit command:

spark-submit --master yarn-client \
  --properties-file ${MY_CONF_DIR}/prediction.properties \
  --driver-memory 6G \
  --executor-memory 10G \
  --num-executors 5 \
  --executor-cores 13 \
  --class com.comp.bdf.nat.applications.$1 \
  --jars ${MY_CLASSPATH} \
  ${MY_LIB_DIR}/prediction.jar $PHASE "$ARG_COMPL" "${PARAMETERS[@]}"

No YARN problems were detected. Here are some screenshots that I got from the Spark UI. As I mentioned before, the same stage (773) stays in RUNNING state, and always on the same node. Note that this node was recently added to the cluster; could it be a version mismatch?

org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
com.vsct.sncf.nat.outils.AppOutils$anonfun$sauverDf$1$anonfun$apply$1.apply(AppOutils.scala:506)
com.vsct.sncf.nat.outils.AppOutils$anonfun$sauverDf$1$anonfun$apply$1.apply(AppOutils.scala:498)
com.vsct.sncf.nat.outils.AppOutils$.remplacerDf(AppOutils.scala:483)
com.vsct.sncf.nat.applications.CreerPrediction$.lancer(CreerPrediction.scala:97)
com.vsct.sncf.nat.applications.ApplicationNAT.main(ApplicationNAT.scala:78)
com.vsct.sncf.nat.applications.CreerPrediction.main(CreerPrediction.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

When I click on the link "save at AppOutils.scala:506":
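A hedged suggestion, not from the original thread: when a single task repeatedly hangs on one node, Spark's speculative execution can re-launch slow tasks on other executors while the suspect node is investigated. A minimal sketch of the same submit with speculation enabled; the spark.speculation* options are standard Spark settings, the multiplier and quantile values are illustrative, and the environment variables are the ones from the command above:

# Sketch: have Spark re-launch straggler tasks on other executors.
spark-submit --master yarn-client \
  --properties-file ${MY_CONF_DIR}/prediction.properties \
  --conf spark.speculation=true \
  --conf spark.speculation.multiplier=2 \
  --conf spark.speculation.quantile=0.90 \
  --driver-memory 6G \
  --executor-memory 10G \
  --num-executors 5 \
  --executor-cores 13 \
  --class com.comp.bdf.nat.applications.$1 \
  --jars ${MY_CLASSPATH} \
  ${MY_LIB_DIR}/prediction.jar $PHASE "$ARG_COMPL" "${PARAMETERS[@]}"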
10-12-2016
08:08 AM
Hello, I'll check that on the next run. Thank you.
10-11-2016
03:55 PM
I have new elements: the jobs were killed by a developer because they had been running for 12 hours. I found that the same task of a job hangs (stays in RUNNING state until the whole job is killed) on the same node. The task sometimes hangs and sometimes succeeds. What could make a task hang that way? Thank you.
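A hedged diagnostic sketch, not from the thread: comparing the suspect node against the rest of the cluster and capturing a thread dump of the hung executor usually shows where the task is blocked. The hostname and the executor PID below are illustrative placeholders; the commands themselves are standard YARN/HDP/JDK tooling:

# Confirm the node is reported healthy by YARN
yarn node -list -all
# On the recently added node (hostname is hypothetical), compare installed versions
ssh worker-new.example.com 'hdp-select versions'
# Find the executor JVM on that node and take a thread dump while the task hangs
jps -lm | grep CoarseGrainedExecutorBackend
jstack <executor-pid> > /tmp/executor-threads.txt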
10-11-2016
10:20 AM
1 Kudo
Hi community, I have a Spark job (running on YARN) that failed with the following error: "stage cancelled because SparkContext was shut down". After the job failed, slowness was noticed on the following jobs. Do you have an idea what the reason could be? How can I link the Spark job number to the YARN applicationId? Where can I find the logs of the failed job? Thank you
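A hedged sketch for the two how-to questions, using standard YARN CLI commands: the applicationId is printed in the spark-submit console output and shown next to the job in the ResourceManager UI, and once you have it the aggregated logs can be pulled from the command line (the application id below is illustrative):

# List recent applications and note the id of the failed Spark job
yarn application -list -appStates FINISHED,FAILED,KILLED
# Fetch its aggregated container logs (id is illustrative)
yarn logs -applicationId application_1470000000000_0042 > /tmp/spark_app.log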
Labels:
- Apache Spark
08-12-2016
12:05 PM
1 Kudo
I found the cause of the problem: it was a configuration issue. The NameNode was installed on master01, but the following parameter was set to worker02 (which runs no NameNode): dfs.namenode.http-address: worker02.cl02.sr.private:50070 instead of master01.cl02.sr.private:50070. The configuration had been altered when the cluster was moved to an HA configuration and then taken back to non-HA; one of the NameNodes (the one on worker02) was deleted without noticing that the remaining configuration still pointed to worker02. Hope I'm clear 🙂
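A hedged verification sketch using standard HDFS tooling (hostnames taken from the post): after correcting the property, confirm that the client configuration resolves to the host actually running the NameNode and that its web UI answers there:

# Effective value of the property as seen by clients
hdfs getconf -confKey dfs.namenode.http-address
# Expected after the fix: master01.cl02.sr.private:50070
curl -s http://master01.cl02.sr.private:50070/jmx | head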
08-12-2016
09:00 AM
In the file /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out:

ulimit -a for user hdfs
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257395
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 128000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) 100000
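A hedged aside, not from the thread: these limits are per-process, so once the NameNode comes up one can confirm what its JVM actually received via procfs (standard Linux; the grep pattern is illustrative):

# Read the limits the kernel applied to the live NameNode process
NN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode')
grep -E 'open files|processes' /proc/${NN_PID}/limits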
08-12-2016
08:53 AM
@emaxwell: it's launched as root.
08-12-2016
08:53 AM
@Joy: here is the stdout:

2016-08-09 13:36:36,469 - Group['hadoop'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['users'] {'ignore_failures': False}
2016-08-09 13:36:36,470 - Group['spark'] {'ignore_failures': False}
2016-08-09 13:36:36,471 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,471 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,472 - User['flume'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,472 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,473 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,474 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,475 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['users']}
2016-08-09 13:36:36,476 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,476 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,477 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,478 - User['ams'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': ['hadoop']}
2016-08-09 13:36:36,479 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,480 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-08-09 13:36:36,492 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-08-09 13:36:36,493 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2016-08-09 13:36:36,494 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-08-09 13:36:36,494 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2016-08-09 13:36:36,506 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2016-08-09 13:36:36,507 - Group['hdfs'] {'ignore_failures': False}
2016-08-09 13:36:36,507 - User['hdfs'] {'ignore_failures': False, 'groups': ['hadoop', 'hdfs']}
2016-08-09 13:36:36,508 - Directory['/etc/hadoop'] {'mode': 0755}
2016-08-09 13:36:36,521 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-08-09 13:36:36,535 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-08-09 13:36:36,548 - Skipping Execute[('setenforce', '0')] due to not_if
2016-08-09 13:36:36,548 - Directory['/var/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,550 - Changing owner for /var/run/hadoop from 496 to root
2016-08-09 13:36:36,551 - Changing group for /var/run/hadoop from 1002 to root
2016-08-09 13:36:36,551 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,555 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,557 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-08-09 13:36:36,565 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,566 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-08-09 13:36:36,571 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-08-09 13:36:36,582 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-08-09 13:36:36,804 - Directory['/etc/security/limits.d'] {'owner': 'root', 'group': 'root', 'recursive': True}
2016-08-09 13:36:36,809 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-08-09 13:36:36,810 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,820 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-08-09 13:36:36,821 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,829 - Writing File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] because contents don't match
2016-08-09 13:36:36,829 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,838 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-08-09 13:36:36,838 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,843 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,844 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2016-08-09 13:36:36,844 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,853 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-08-09 13:36:36,854 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,859 - Writing File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] because contents don't match
2016-08-09 13:36:36,859 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,868 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-08-09 13:36:36,868 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,874 - Writing File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] because contents don't match
2016-08-09 13:36:36,874 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-08-09 13:36:36,883 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-08-09 13:36:36,883 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,928 - Writing File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] because contents don't match
2016-08-09 13:36:36,929 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {}, 'owner': 'hdfs', 'configurations': ...}
2016-08-09 13:36:36,938 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-08-09 13:36:36,939 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-08-09 13:36:36,956 - Writing File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] because contents don't match
2016-08-09 13:36:36,958 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-08-09 13:36:36,959 - Directory['/data01/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'cd_access': 'a', 'group': 'hadoop', 'recursive': True, 'mode': 0755}
2016-08-09 13:36:36,959 - Directory['/data02/hadoop/hdfs/namenode'] {'owner': 'hdfs', 'recursive': True, 'group': 'hadoop', 'mode': 0755, 'cd_access': 'a'}
2016-08-09 13:36:36,960 - Ranger admin not installed
/data01/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
/data02/hadoop/hdfs/namenode/namenode-formatted/ exists. Namenode DFS already formatted
2016-08-09 13:36:36,960 - Directory['/data01/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,960 - Directory['/data02/hadoop/hdfs/namenode/namenode-formatted/'] {'recursive': True}
2016-08-09 13:36:36,962 - File['/etc/hadoop/conf/dfs.exclude'] {'owner': 'hdfs', 'content': Template('exclude_hosts_list.j2'), 'group': 'hadoop'}
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-08-09 13:36:36,963 - Changing owner for /var/run/hadoop from 0 to hdfs
2016-08-09 13:36:36,963 - Changing group for /var/run/hadoop from 0 to hadoop
2016-08-09 13:36:36,963 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,963 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'recursive': True}
2016-08-09 13:36:36,964 - File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
2016-08-09 13:36:36,982 - Deleting File['/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid']
2016-08-09 13:36:36,982 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid'}
08-11-2016
11:08 AM
1 Kudo
Hello Community, Ambari is not able to start the NameNode: in fact, it cannot execute the command 'ambari-sudo.sh su hdfs -l -s /bin/bash -c ...'. When I try to execute the whole command manually, I'm asked to enter a password. Following is the stderr output. Does anyone have an idea what the reason could be? Thank you.

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 317, in <module>
NameNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 82, in start
namenode(action="start", rolling_restart=rolling_restart, env=env)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 86, in namenode
create_log_dir=True
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 276, in service
environment=hadoop_env_exports
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start namenode'' returned 1. starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master01.cl02.sr.private.out
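A hedged troubleshooting sketch, not from the thread: a password prompt when running the command by hand usually means the invoking account lacks passwordless sudo for su (ambari-sudo.sh falls through to sudo when it is not run as root). A quick check, assuming you are logged in as the account that runs the Ambari agent:

# Does this account have passwordless sudo at all?
sudo -n true && echo "passwordless sudo OK" || echo "sudo prompts for a password"
# List the granted sudo rules; look for NOPASSWD entries covering /bin/su
sudo -l
# Reproduce the failing step with a harmless command; a password prompt here
# means the sudoers rules for the agent account are incomplete
ambari-sudo.sh su hdfs -l -s /bin/bash -c 'whoami'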
Labels:
- Apache Ambari
- Apache Hadoop