Solved
SparkR connect to Hadoop cluster
Labels:
- Apache Spark
Master Collaborator
Created ‎05-05-2016 02:37 PM
Hi:
I am trying SparkR but it doesn't work well.
The code is:
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- SparkR::sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)
path <- file.path("/RSI/staging/input/log_json/f6327t.json")
info <- read.json(sqlContext, path)
printSchema(info)
And the log is:
> sc <- SparkR::sparkR.init(master = "yarn-client")
Launching java with spark-submit command /usr/hdp/current/spark-client//bin/spark-submit sparkr-shell /tmp/RtmpxnCWXx/backend_port502d157a15ac
16/05/05 16:33:22 INFO SparkContext: Running Spark version 1.6.0
16/05/05 16:33:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/05 16:33:23 INFO SecurityManager: Changing view acls to: bigotes
16/05/05 16:33:23 INFO SecurityManager: Changing modify acls to: bigotes
16/05/05 16:33:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bigotes); users with modify permissions: Set(bigotes)
16/05/05 16:33:23 INFO Utils: Successfully started service 'sparkDriver' on port 39914.
16/05/05 16:33:23 INFO Slf4jLogger: Slf4jLogger started
16/05/05 16:33:23 INFO Remoting: Starting remoting
16/05/05 16:33:24 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.1.246.19:55278]
16/05/05 16:33:24 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 55278.
16/05/05 16:33:24 INFO SparkEnv: Registering MapOutputTracker
16/05/05 16:33:24 INFO SparkEnv: Registering BlockManagerMaster
16/05/05 16:33:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-fc4a72de-f470-4c3c-9692-bcf941a4b674
16/05/05 16:33:24 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/05 16:33:24 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/05 16:33:24 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/05 16:33:24 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/05 16:33:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/05 16:33:24 INFO SparkUI: Started SparkUI at http://10.1.246.19:4040
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
16/05/05 16:33:24 INFO TimelineClientImpl: Timeline service address: http://lnxbig06.cajarural.gcr:8188/ws/v1/timeline/
16/05/05 16:33:25 INFO RMProxy: Connecting to ResourceManager at lnxbig05.cajarural.gcr/10.1.246.19:8050
16/05/05 16:33:25 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/05/05 16:33:25 INFO Client: Requesting a new application from cluster with 5 NodeManagers
16/05/05 16:33:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (40192 MB per container)
16/05/05 16:33:25 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/05 16:33:25 INFO Client: Setting up container launch context for our AM
16/05/05 16:33:25 INFO Client: Setting up the launch environment for our AM container
16/05/05 16:33:25 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Preparing resources for our AM container
16/05/05 16:33:25 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Source and destination file systems are the same. Not copying hdfs://lnxbig05.cajarural.gcr:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/05/05 16:33:25 INFO Client: Uploading resource file:/tmp/spark-7c7224cd-1fa8-43d6-b049-a85ce21f18e7/__spark_conf__5347166147727015442.zip -> hdfs://lnxbig05.cajarural.gcr:8020/user/bigotes/.sparkStaging/application_1461739406783_0151/__spark_conf__5347166147727015442.zip
16/05/05 16:33:26 INFO SecurityManager: Changing view acls to: bigotes
16/05/05 16:33:26 INFO SecurityManager: Changing modify acls to: bigotes
16/05/05 16:33:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(bigotes); users with modify permissions: Set(bigotes)
16/05/05 16:33:26 INFO Client: Submitting application 151 to ResourceManager
16/05/05 16:33:26 INFO YarnClientImpl: Submitted application application_1461739406783_0151
16/05/05 16:33:26 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1461739406783_0151 and attemptId None
16/05/05 16:33:27 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:27 INFO Client:
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1462458806216
	 final status: UNDEFINED
	 tracking URL: http://lnxbig05.cajarural.gcr:8088/proxy/application_1461739406783_0151/
	 user: bigotes
16/05/05 16:33:28 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:29 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:30 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:31 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:32 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:33 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:34 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:35 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:36 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:37 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:38 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:39 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:40 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
16/05/05 16:33:41 INFO Client: Application report for application_1461739406783_0151 (state: ACCEPTED)
Is my code correct?
Thanks
1 ACCEPTED SOLUTION
Master Collaborator
Created ‎05-05-2016 07:08 PM
Hi:
Finally it's working with this code:
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sparkR.stop()
sc <- SparkR::sparkR.init(master = "yarn-client",
                          sparkEnvir = list(spark.driver.memory = "4g"))
hiveContext <- sparkRHive.init(sc)
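[Editor's note] The key changes in the accepted solution are calling `sparkR.stop()` to clear any stale context before re-initializing, and giving the driver more memory via `sparkEnvir`. A minimal end-to-end sketch that combines this fix with the JSON read from the original question (SPARK_HOME and the HDFS path are the ones from this thread, and the SparkR 1.6 API is assumed; adjust for your cluster):

```r
# Point R at the cluster's Spark installation and load SparkR from it
# (paths taken from the original post; adjust for your environment)
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client/")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Stop any previous context so the new driver-memory setting takes effect
sparkR.stop()

# Initialize against YARN in client mode with 4 GB of driver memory
sc <- SparkR::sparkR.init(master = "yarn-client",
                          sparkEnvir = list(spark.driver.memory = "4g"))
sqlContext <- sparkRSQL.init(sc)

# Read the JSON file from HDFS and inspect the inferred schema
info <- read.json(sqlContext, "/RSI/staging/input/log_json/f6327t.json")
printSchema(info)
```

Note that `spark.driver.memory` must be set at initialization time here: in yarn-client mode the driver JVM is already running when the SparkContext is created, so setting it later has no effect.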
2 REPLIES 2
Super Collaborator
Created ‎05-05-2016 06:58 PM
It looks like you are running the code as the 'bigotes' user. Can you check that this is correct and that you have sufficient write privileges in that user's directory?
