Member since: 08-31-2013
Posts: 24
Kudos Received: 5
Solutions: 0
09-12-2017
10:17 PM
If you want to use Spark on HDP, you have to go through YARN, as YARN acts as the resource manager that allocates resources (CPU and memory). When you install Spark on HDP, a Spark Master is not started automatically. Instead, when you submit a Spark job through YARN, YARN creates an Application Master, which acts as the Spark Master.
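For example, a minimal submission sketch (paths and resource sizes are illustrative, assuming a standard HDP client layout): YARN allocates the containers and launches the Application Master for the job, so no standalone Spark Master daemon is involved.

# Illustrative only: submit the bundled SparkPi example to YARN in client mode.
# YARN starts the Application Master; no Spark Master daemon is needed.
spark-submit --master yarn-client \
  --num-executors 2 --executor-memory 2g \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10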
11-15-2016
03:23 PM
Thanks @azeltov. Even if we create a Kerberos token for Zeppelin, how are the Kerberos tokens for individual users passed? All access to HDFS, Spark, and Hive is managed in Ranger for AD users or groups, not for the Zeppelin user.
11-10-2016
06:33 PM
Thanks @azeltov. To confirm: does it also work with a Kerberized cluster? I am just wondering how the Kerberos information is passed.
11-10-2016
04:14 PM
1 Kudo
We are looking at setting up Zeppelin on top of Livy Server. Do the following settings also pass along the Kerberos authentication information?

<property>
  <name>hadoop.proxyuser.livy.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.livy.hosts</name>
  <value>*</value>
</property>

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/install-livy.html
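For context, a sketch of how impersonation surfaces in Livy's REST API (host name and user are illustrative): the client authenticates to Livy itself via SPNEGO/Kerberos, and Livy, running as the livy principal, asks YARN to execute the session as the proxied end user, which is what the hadoop.proxyuser.livy.* settings above authorize.

# Illustrative: create a Livy session impersonating an AD user.
# --negotiate -u : makes curl use the caller's Kerberos ticket (SPNEGO).
curl --negotiate -u : -X POST \
  -H "Content-Type: application/json" \
  -d '{"kind": "pyspark", "proxyUser": "ad_user1"}' \
  http://livy-host:8998/sessions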
Labels:
- Apache Zeppelin
10-13-2016
01:41 PM
Thanks Dan. It works in our dev environment, which is on Python 2.6.6. When I expanded the graphframes jar and ran pyspark from the graphframes directory, I got the "Bad magic number" error, which points to a Python version mismatch. But since it worked in our dev environment on 2.6, I think it is possible to get it working with 2.6. I am not sure whether the --packages option did something extra to the Python packages after downloading them to make them work with Python 2.6. We are looking at getting Anaconda onto our cluster, but the upgrade will take time, as a process is involved in the production environment to make sure the Python upgrade doesn't affect Ambari and the cluster.

>>> from graphframes import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: Bad magic number in graphframes/__init__.pyc

https://community.hortonworks.com/questions/9368/ambari-server-install-requires-python26.html
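One way to confirm the version-mismatch theory (a sketch; paths are illustrative) is to compare the bytecode magic number at the start of the shipped .pyc with the one the local interpreter expects, and, if they differ, drop the stale .pyc files so Python 2.6 recompiles from the bundled .py sources:

# Print the magic number the local Python 2.x interpreter writes into .pyc files.
python -c 'import imp; print(imp.get_magic().encode("hex"))'
# Print the first 4 bytes (the magic number) of the shipped .pyc.
od -A n -t x1 -N 4 graphframes/__init__.pyc
# If they differ, remove the stale .pyc files; Python recompiles them on import.
find graphframes/ -name '*.pyc' -delete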
10-13-2016
11:06 AM
2 Kudos
We are trying to use the graphframes package with pyspark, and for some reason it doesn't work in our production environment. In our dev environment it works, since we can use the --packages option and it downloads the libraries from an external repository. We cannot use the --packages option in production because it is not connected to the internet. It works with Scala in production. The default Python version is 2.6.6 and the HDP version is 2.4.2.

pyspark --packages graphframes:graphframes:0.2.0-spark1.6-s_2.10

I copied all the jars downloaded with the --packages option in dev and passed them to --jars in the pyspark command in production, but it doesn't work. The same command works in dev and with Spark on my Mac.

pyspark \
  --py-files /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar \
  --jars /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,/tmp/thirdpartyjars/com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,/tmp/thirdpartyjars/org.scala-lang_scala-reflect-2.10.4.jar,/tmp/thirdpartyjars/org.slf4j_slf4j-api-1.7.7.jar

Console log:
Using Python version 2.6.6 (r266:84292, May 22 2015 08:34:51)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
>>> from graphframes import *
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zipimport.ZipImportError: can't find module 'graphframes'
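One workaround to try (a sketch; paths are illustrative, and it assumes the jar bundles the graphframes Python package at its root, as the --packages artifact does): extract the Python package from the jar and ship it as a plain zip, which --py-files handles more predictably than a jar.

# Extract only the Python package from the graphframes jar.
unzip /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar 'graphframes/*' -d /tmp/gf
# Repack it as a zip for the Python side.
cd /tmp/gf && zip -r /tmp/gf/graphframes-py.zip graphframes
# Keep the jar on --jars for the JVM side (plus the other dependency jars as
# before), and put the zip on --py-files for the Python side.
pyspark \
  --jars /tmp/thirdpartyjars/graphframes_graphframes-0.2.0-spark1.6-s_2.10.jar \
  --py-files /tmp/gf/graphframes-py.zip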
Labels:
- Apache Spark
10-07-2016
11:08 AM
Good point. It looks like it is a firewall issue.
10-05-2016
11:00 PM
I missed copying the Spark conf. The spark user was created by the yum install of the Spark clients, and I also copied the keytab, but I still get the same error:

16/10/05 20:05:55 ERROR ApplicationMaster: Failed to connect to driver at 10.100.100.110:33656, retrying ...
16/10/05 20:06:58 ERROR ApplicationMaster: Failed to connect to driver at 10.100.100.110:33656, retrying ...
16/10/05 20:06:58 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
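A quick connectivity check (host and port taken from the log above; the nc usage is illustrative) from one of the NodeManager hosts can tell whether the driver port on the client machine is reachable at all:

# From a cluster node: can we reach the driver's callback port on the edge machine?
nc -vz 10.100.100.110 33656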
10-05-2016
10:12 AM
I am trying to evaluate sparklyr on a test machine with RStudio Server. Since the machine is outside the HDP cluster, I installed the Hadoop and Spark clients and copied the config files from our test HDP cluster into /etc/hadoop/conf. I set HADOOP_CONF_DIR, YARN_CONF_DIR, and SPARK_HOME to point to the HDP files. Our Hadoop cluster is integrated with Kerberos. I am able to run spark-shell in local mode and read HDFS files from the test cluster, but I am not able to run spark-shell in yarn-client mode. I am getting the following error in the application log:

16/10/05 11:30:57 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/10/05 11:32:00 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10.100.99.100:42948, retrying ...
16/10/05 11:33:03 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10.100.99.100:42948, retrying ...
16/10/05 11:33:03 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!

The job is submitted and goes to the ACCEPTED state but never reaches RUNNING:

16/10/05 10:43:24 INFO impl.YarnClientImpl: Submitted application application_1474880908029_0858
16/10/05 10:43:24 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1474880908029_0858 and attemptId None
16/10/05 10:43:25 INFO yarn.Client: Application report for application_1474880908029_0858 (state: ACCEPTED)
16/10/05 10:43:25 INFO yarn.Client:
     client token: Token { kind: YARN_CLIENT_TOKEN, service: }
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1475660604154
     final status: UNDEFINED
     tracking URL: http://hostname:8088/proxy/application_1474880908029_0858/
     user: dee
16/10/05 10:43:26 INFO yarn.Client: Application report for application_1474880908029_0858 (state: ACCEPTED)
16/10/05 10:43:27 INFO yarn.Client: Application report for application_1474880908029_0858 (state: ACCEPTED)
16/10/05 10:43:28 INFO yarn.Client: Application report for application_1474880908029_0858 (state: ACCEPTED)
16/10/05 10:43:29 INFO yarn.Client: Application report for application_1474880908029_0858 (state: ACCEPTED)

Here is the full application log:

16/10/05 11:30:57 INFO spark.SecurityManager: Changing view acls to: deesub
16/10/05 11:30:57 INFO spark.SecurityManager: Changing modify acls to: deesub
16/10/05 11:30:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(deesub); users with modify permissions: Set(deesub)
16/10/05 11:30:57 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/10/05 11:32:00 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10.100.99.100:42948, retrying ...
16/10/05 11:33:03 ERROR yarn.ApplicationMaster: Failed to connect to driver at 10.100.99.100:42948, retrying ...
16/10/05 11:33:03 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:501)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:362)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:204)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
16/10/05 11:33:03 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
16/10/05 11:33:03 INFO util.ShutdownHookManager: Shutdown hook called
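If a firewall between the cluster and the client machine turns out to be the cause, one common mitigation to sketch (values are illustrative; these port properties apply to Spark 1.x) is to pin the driver-side ports so a known range can be opened, instead of the random ephemeral ports Spark picks by default:

# Illustrative: fix the ports the YARN Application Master must reach on the driver.
spark-shell --master yarn-client \
  --conf spark.driver.port=40000 \
  --conf spark.fileserver.port=40001 \
  --conf spark.broadcast.port=40002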
Labels:
- Apache Spark
- Apache YARN