
SparkR connect to Hadoop cluster

Master Collaborator

Hi,

I am trying SparkR but it doesn't work well.

The code is:

Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- SparkR::sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)
path <- file.path("/RSI/staging/input/log_json/f6327t.json")

info <- read.json(sqlContext, path)

printSchema(info)

and the log is:

> sc <- SparkR::sparkR.init(master = "yarn-client")
Launching java with spark-submit command /usr/hdp/current/spark-client//bin/spark-submit   sparkr-shell /tmp/RtmpxnCWXx/backend_port502d157a15ac 
16/05/05 16:33:22 INFO SparkContext: Running Spark version 1.6.0
16/05/05 16:33:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/05 16:33:23 INFO SecurityManager: Changing view acls to: bigotes
16/05/05 16:33:23 INFO SecurityManager: Changing modify acls to: bigotes
16/05/05 16:33:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bigotes); users with modify permissions: Set(bigotes)
16/05/05 16:33:23 INFO Utils: Successfully started service 'sparkDriver' on port 39914.
16/05/05 16:33:23 INFO Slf4jLogger: Slf4jLogger started
16/05/05 16:33:23 INFO Remoting: Starting remoting
16/05/05 16:33:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.1.246.19:55278]
16/05/05 16:33:24 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 55278.
16/05/05 16:33:24 INFO SparkEnv: Registering MapOutputTracker
16/05/05 16:33:24 INFO SparkEnv: Registering BlockManagerMaster
16/05/05 16:33:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-fc4a72de-f470-4c3c-9692-bcf941a4b674
16/05/05 16:33:24 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/05 16:33:24 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/05 16:33:24 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/05 16:33:24 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/05 16:33:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/05 16:33:24 INFO SparkUI: Started SparkUI at http://10.1.246.19:4040
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
16/05/05 16:33:24 INFO TimelineClientImpl: Timeline service address: http://lnxbig06.cajarural.gcr:8188/ws/v1/timeline/
16/05/05 16:33:25 INFO RMProxy: Connecting to ResourceManager at lnxbig05.cajarural.gcr/10.1.246.19:8050
16/05/05 16:33:25 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/05/05 16:33:25 INFO Client: Requesting a new application from cluster with 5 NodeManagers
16/05/05 16:33:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (40192 MB per container)
16/05/05 16:33:25 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/05 16:33:25 INFO Client: Setting up container launch context for our AM
16/05/05 16:33:25 INFO Client: Setting up the launch environment for our AM container
16/05/05 16:33:25 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Preparing resources for our AM container
16/05/05 16:33:25 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Source and destination file systems are the same. Not copying hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Uploading resource file:/tmp/spark-7c7224cd-1fa8-43d6-b049-a85ce21f18e7/__spark_conf__5347166147727015442.zip -> hdfs://lnxbig05.cajarural.gcr:8020/user/bigotes/.sparkStaging/application_1461739406783_0151/__spark_conf__5347166147727015442.zip
16/05/05 16:33:26 INFO SecurityManager: Changing view acls to: bigotes
16/05/05 16:33:26 INFO SecurityManager: Changing modify acls to: bigotes
16/05/05 16:33:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bigotes); users with modify permissions: Set(bigotes)
16/05/05 16:33:26 INFO Client: Submitting application 151 to ResourceManager
16/05/05 16:33:26 INFO YarnClientImpl: Submitted application application_1461739406783_0151
16/05/05 16:33:26 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1461739406783_0151 and attemptId None
16/05/05 16:33:27 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:27 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1462458806216
	 final status: UNDEFINED
	 tracking URL: http://lnxbig05.cajarural.gcr:8088/proxy/application_1461739406783_0151/
	 user: bigotes
16/05/05 16:33:28 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:29 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:30 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:31 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
(the same ACCEPTED report repeats every second through 16:33:41; the application never leaves the ACCEPTED state)

Is my code correct?

Thanks

1 ACCEPTED SOLUTION

Master Collaborator

Hi,

It is finally working with this code:

Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Stop any previous context before re-initialising with more driver memory
sparkR.stop()
sc <- SparkR::sparkR.init(master = "yarn-client", sparkEnvir = list(spark.driver.memory = "4g"))
hiveContext <- sparkRHive.init(sc)
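With the context up, the JSON read from the original question should then work against the new context as well (a sketch reusing the path from the question above; in SparkR 1.6 a HiveContext can be passed wherever a SQLContext is expected, and this of course needs the live YARN cluster to run):

```r
# Read the JSON log and inspect its inferred schema
path <- file.path("/RSI/staging/input/log_json/f6327t.json")
info <- read.json(hiveContext, path)
printSchema(info)
```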


2 REPLIES

Super Collaborator

Looks like you are running the code as 'bigotes' user. Can you check if that is correct and you have sufficient write privileges in the user directory?
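If the HDFS home directory is the problem, a quick check from a shell on an edge node might look like this (a sketch; the user name `bigotes` is taken from the log above, and the superuser commands assume a standard unsecured HDP setup):

```shell
# Verify the user's HDFS home directory exists and who owns it
hdfs dfs -ls /user/ | grep bigotes

# If it is missing, create it as the HDFS superuser and hand ownership over
sudo -u hdfs hdfs dfs -mkdir -p /user/bigotes
sudo -u hdfs hdfs dfs -chown bigotes /user/bigotes
```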
