03-02-2017
02:13 PM
The same applies to spark-submit: it works in client deploy mode but crashes in cluster deploy mode.

spark-submit --master=yarn --deploy-mode=client lettercount.py

works, but

spark-submit --master=yarn --deploy-mode=cluster lettercount.py

crashes:

========
...
client token: N/A
diagnostics: Application application_1488351192359_0019 failed 2 times due to AM Container for appattempt_1488351192359_0019_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://md02.rcc.local:8088/proxy/application_1488351192359_0019/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1488351192359_0019_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
...
===========
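For reference, the real failure is usually in the AM container logs rather than in the exit code; the application id can be pulled out of the diagnostics line and fed to yarn logs (the yarn command itself needs the cluster, so only the extraction step is shown running here):

```shell
# Extract the application id from the diagnostics line above, then fetch the
# AM container logs with it.
DIAG="diagnostics: Application application_1488351192359_0019 failed 2 times"
APP_ID=$(echo "$DIAG" | grep -o 'application_[0-9_]*' | head -n1)
echo "$APP_ID"
# yarn logs -applicationId "$APP_ID" | less   # run this on the cluster
```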
03-02-2017
02:08 PM
Actually, I might have misinterpreted it: it works in local and client modes but not in cluster mode.

========
[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=cluster
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
=======
[ivy2@md01 lab7_Spark]$ pyspark --master=yarn --deploy-mode=client
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:04:32 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
======
[ivy2@md01 lab7_Spark]$ pyspark --master=local
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 16:06:40 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
======

I think it worked before I configured the cluster to use TLS.
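As an aside on the repeated "could not bind on port 4040" warning in both transcripts: my understanding is that Spark simply walks upward from 4040 until it finds a free port, so another shell or driver already holding 4040 would explain it. A rough sketch of that behaviour (find_free_port is my own illustration, not Spark's code):

```python
import socket

def find_free_port(start=4040, max_retries=16):
    """Walk upward from `start` until a port can be bound, like Spark's UI does."""
    for port in range(start, start + max_retries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
            return port          # this port was free
        except OSError:
            continue             # taken; try the next one, as the WARN line shows
        finally:
            s.close()
    raise OSError("no free port in range")

print(find_free_port())
```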
03-02-2017
12:57 PM
Forgot to say that spark-submit --master=yarn --num-executors=3 lettercount.py works fine.
03-02-2017
12:55 PM
If one runs pyspark without arguments on the gateway node on which CM is installed, one gets:

=========================
$ pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
=========================

However, it works on the cluster:

=========================
[ivy2@md01 ~]$ pyspark --master=yarn
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/03/02 14:54:04 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
=========================

Any ideas why? And is this warning OK: "WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041."?
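One thing I plan to check: whether the gateway's spark-defaults.conf (under /etc/spark/conf/ on CDH) sets spark.submit.deployMode to cluster, since a bare pyspark would then attempt the cluster deploy mode that shells refuse. A minimal sketch of the check, with made-up file contents:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style 'key value' lines into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then everything after the whitespace
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf

# Hypothetical contents; on the real gateway, read /etc/spark/conf/spark-defaults.conf
sample = """
# sample defaults
spark.master                 yarn
spark.submit.deployMode      cluster
"""
conf = parse_spark_defaults(sample)
print(conf.get("spark.submit.deployMode"))
```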
Labels: Apache Spark
03-02-2017
12:20 PM
I have tested various pieces of Hadoop functionality from the command line. Mostly things work, except when running pig or pyspark on a local machine:

1) hdfs commands seem to be working
2) mapreduce is working
3) pig works when submitting a job to the cluster, but runs out of memory when submitting a tiny job to a local machine with pig -x local:

2017-03-02 13:48:36,124 [Thread-20] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1306469224_0004
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:987)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2017-03-02 13:48:42,128 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-03-02 13:48:42,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1306469224_0004 has failed! Stop running all dependent jobs

4) I can submit a spark job to the cluster with spark-submit --master=yarn --num-executors=3 lettercount.py, but pyspark without arguments crashes:

[ivy2@md01 lab7_Spark]$ pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

On the other hand, it works with --master=yarn:

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.

5) hive works
6) HBase works

Why do pyspark and pig misbehave on a local node but work fine on the cluster?
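For the pig -x local heap failure, one thing I intend to try is simply giving the local JVM a bigger heap; Pig's bin/pig launcher reads PIG_HEAPSIZE (in MB). A sketch (the pig invocation and script name are from my setup, commented out here):

```shell
# Bump the local-mode JVM heap before retrying; PIG_HEAPSIZE (MB) is read by
# Pig's bin/pig launcher script.
export PIG_HEAPSIZE=2048
echo "PIG_HEAPSIZE=$PIG_HEAPSIZE"
# pig -x local myscript.pig   # hypothetical script name
```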
03-02-2017
10:04 AM
"When troubleshooting these sort of problems, it is important to define the problem very clearly. To do so, we will need to understand what client/server communication is failing."

Hadoop itself seems to be happy: none of the Hadoop components report any problems, and all the heartbeats are under 15s. What is not working is the Cloudera Management Service: all of its roles show red or unknown state on the Cloudera Management Service page. Also, on the CM front page there are two messages:

Request to the Service Monitor failed. This may cause slow page responses. View the status of the Service Monitor.
Request to the Host Monitor failed. This may cause slow page responses. View the status of the Host Monitor.
03-02-2017
09:18 AM
Ben, I do not remember if I mentioned it, but I do not have intermediate certificates. So wherever the latest instructions mentioned intermediate certificates, I used the root certificate instead. Is that OK? Thank you, Igor
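My understanding is that this should be fine, since without an intermediate CA the root certificate is the entire chain and the agent certificate verifies directly against it. A self-contained sketch with throwaway names that demonstrates the idea:

```shell
# Demonstrate that a root-signed certificate verifies directly against the
# root CA with no intermediate in between (all names are throwaway).
cd "$(mktemp -d)"
# 1) self-signed root CA
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.pem \
  -subj "/CN=TestRootCA" -days 1 2>/dev/null
# 2) an "agent" CSR
openssl req -newkey rsa:2048 -nodes -keyout agent.key -out agent.csr \
  -subj "/CN=agent.example" 2>/dev/null
# 3) sign the agent cert directly with the root
openssl x509 -req -in agent.csr -CA root.pem -CAkey root.key \
  -CAcreateserial -out agent.pem -days 1 2>/dev/null
# 4) verify against the root alone
openssl verify -CAfile root.pem agent.pem   # prints: agent.pem: OK
```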
03-02-2017
09:16 AM
Hi Ben, I just tried importing all the agent certificates and (just in case; can it hurt?) the server certificate into the truststore, distributing it across the nodes, and restarting the server, the agents, and the Cloudera Management Service. It did not solve my problems: most of the Cloudera Management Service roles are still down.

Also note this in the 5.9 link you just sent me:

"Important: Only perform this step if your Agent certificates have not been enabled for TLS Web Client Authentication. See Step 4 for instructions on how to examine Agent certificates."

By "this step" they mean importing the agent certificates into the truststore. But the latest instructions did require enabling TLS Web Client Authentication, and my agent certificates have the corresponding line:

openssl x509 -text -in md01.rcc.local-agent.pem
....
    Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
    TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Alternative Name:
    DNS:hadoop-md01.rcc.uchicago.edu, DNS:md01.rcc.local
....

Any other ideas? What does enabling "TLS Web Server Authentication, TLS Web Client Authentication" do exactly? I still do not understand how the agents are supposed to authenticate to CM if one does not put the agent certificates into the truststore. Might it be necessary to append the root certificate to each agent certificate pem file before importing it into the truststore? Might there be a problem with permissions or ownership here (for example, I would assume that at least the keys should be readable only by u, maybe g, but certainly not o)?
[root@md01 try2]# ls -l /opt/cloudera/security/pki
total 64
lrwxrwxrwx 1 cloudera-scm cloudera-scm     51 Feb 27 20:57 agent.cert.pem -> /opt/cloudera/security/pki/md01.rcc.local-agent.pem
lrwxrwxrwx 1 cloudera-scm cloudera-scm     51 Feb 27 20:57 agent.jks -> /opt/cloudera/security/pki/md01.rcc.local-agent.jks
lrwxrwxrwx 1 cloudera-scm cloudera-scm     51 Feb 28 14:44 agent.key -> /opt/cloudera/security/pki/md01.rcc.local-agent.key
-rw-r--r-- 1 cloudera-scm cloudera-scm   1241 Feb 28 23:40 md01.rcc.local-agent.csr
-rw-r--r-- 1 cloudera-scm cloudera-scm   4295 Feb 28 23:40 md01.rcc.local-agent.jks
-rw-r--r-- 1 cloudera-scm cloudera-scm   1991 Mar  1 00:05 md01.rcc.local-agent.key
-rw-r--r-- 1 cloudera-scm cloudera-scm   5008 Mar  1 00:05 md01.rcc.local-agent.p12
-rw-r--r-- 1 cloudera-scm cloudera-scm   8394 Feb 28 23:40 md01.rcc.local-agent.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm   1195 Feb 28 23:45 md01.rcc.local-server.csr
-rw-r--r-- 1 cloudera-scm cloudera-scm   4263 Feb 28 23:45 md01.rcc.local-server.jks
-rw-r--r-- 1 cloudera-scm cloudera-scm   8232 Feb 28 23:45 md01.rcc.local-server.pem
-rw-r--r-- 1 cloudera-scm cloudera-scm   2175 Feb 28 23:40 rootca.cert.pem

[root@md01 try2]# ls -l /etc/cloudera-scm-agent/agentkey.pw
-r--r----- 1 root root 13 Mar  1 00:16 /etc/cloudera-scm-agent/agentkey.pw

(this file is supposed to contain the keystore password, not the truststore password, right?)

[root@md01 try2]# ls -l $JAVA_HOME/jre/lib/security/jssecacerts
-rw-r--r-- 1 root root 126218 Mar  2 10:51 /usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts

Thank you, Igor
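On the permissions point, this is roughly what I would expect for the private-key files (a throwaway file is used for illustration; the real keys live in /opt/cloudera/security/pki, and stat -c is the GNU form):

```shell
# What I'd expect for a private key: readable by owner and group only (mode 440).
d=$(mktemp -d)
touch "$d/agent.key"       # stand-in for md01.rcc.local-agent.key
chmod 440 "$d/agent.key"
stat -c '%a' "$d/agent.key"
```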
03-02-2017
08:34 AM
I have configured all 3 levels. But since the documentation did not say that I should import the agent certificates, I did not; probably as a result, most of the Cloudera Management Service roles are down. I'll try to add all the agent certificates and see if it helps. So to double-check, for each agent certificate I should do:

keytool -importcert -alias md01.rcc.local-agent -keystore $JAVA_HOME/jre/lib/security/jssecacerts -file md01.rcc.local-agent.pem -storepass <..>

? Does only the jssecacerts on the CM server need the agent certificates, or do all the nodes? And do the nodes need the CM server certificate?
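To avoid typing that import once per node, I'd generate the commands in a loop, eyeball them, and only then execute. A dry-run sketch (the host list and the changeit storepass are placeholders, not my real values):

```shell
# Dry run: print one keytool import per agent certificate, review, then
# execute with: echo "$cmds" | sh
cmds=$(for host in md01 md02 md03; do
  printf 'keytool -importcert -noprompt -alias %s.rcc.local-agent -keystore "$JAVA_HOME/jre/lib/security/jssecacerts" -file %s.rcc.local-agent.pem -storepass changeit\n' "$host" "$host"
done)
echo "$cmds"
```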
03-02-2017
07:38 AM
"3. Both of the above are required, and Cloudera Manager needs to be configured to read a truststore containing all the agents' public certificates. (CM validates the agents' certificates)"

The documentation (https://www.cloudera.com/documentation/enterprise/latest/topics/how_to_configure_cm_tls.html#concept_wk4_jlx_qw) only says to put the root certificate into jssecacerts; it does not say to put the agents' public certificates there. Should the documentation be fixed?