About joncodin

joncodin · 05-25-2016

I'm executing a query on Spark and it works: I get the result. I did not configure any cluster, so Spark should be using its own cluster manager. But on the Spark master page (master:8080) I see this:

Alive Workers: 2
Cores in use: 4 Total, 0 Used
Memory in use: 6.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE

While the query is executing I keep refreshing the page and get exactly the same values, and after the query finishes they are still the same. Do you know why? It's very strange: it seems that Spark is executing the query without using any hardware, which is not possible. So why is this information not updating?
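One thing that may be worth checking, assuming the shell or application was started without a --master option: spark-shell then defaults to local mode, and the page at master:8080 only lists applications that connect to the standalone master. The sketch below is plain Scala with no Spark dependency. The names MasterCheck and describeMaster are hypothetical, and the sample URL spark://master:7077 assumes the default standalone port. Inside spark-shell, the live value to pass in would be sc.master.

```scala
object MasterCheck {
  // Hypothetical helper: classify a Spark master URL by its prefix.
  def describeMaster(master: String): String =
    if (master.startsWith("local"))
      "local mode: runs inside the shell's own JVM, so master:8080 never sees it"
    else if (master.startsWith("spark://"))
      "standalone cluster: should be listed under Applications on master:8080"
    else if (master.startsWith("yarn"))
      "YARN: tracked by the ResourceManager UI rather than master:8080"
    else
      "another cluster manager"

  def main(args: Array[String]): Unit = {
    // Sample values only; inside spark-shell, pass sc.master instead.
    // "spark://master:7077" assumes the default standalone master port.
    List("local[*]", "spark://master:7077", "yarn-client")
      .foreach(m => println(s"$m -> ${describeMaster(m)}"))
  }
}
```

If sc.master turns out to start with "local", the unchanged counters on the master page would be expected, since the two workers are never asked to do anything.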

joncodin · 05-19-2016

Hi, I'm testing on Hadoop 2.7.1, Java 1.8 and spark-1.6.1-bin-hadoop2.6.

joncodin · 05-19-2016

Thanks for your help. I tried to start Spark with the command you suggested, but I get exactly the same error.

joncodin · 05-19-2016

When I start Spark on YARN with the command "spark-shell --master yarn-client", I get this error:

ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException

The full output from starting the Spark shell with YARN is further below. First, here are the YARN container logs:

Container: container_1463670715317_0002_01_000001 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:5748
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:44 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1463670715317_0002_000001
16/05/19 16:19:46 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Driver now available: 10.17.0.50:43771
16/05/19 16:19:47 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002),/proxy/application_1463670715317_0002)
16/05/19 16:19:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
16/05/19 16:19:47 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/05/19 16:19:47 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/05/19 16:19:47 INFO impl.AMRMClientImpl: Received new token for : masternode:52694
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching container container_1463670715317_0002_01_000002 for on host masternode
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: spark://CoarseGrainedScheduler@10.17.0.50:43771, executorHostname: masternode
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Starting Executor Container
16/05/19 16:19:47 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
16/05/19 16:19:47 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Preparing Local resources
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Prepared Local resources Map(_spark_.jar -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar" } size: 187698038 timestamp: 1463671182405 type: FILE visibility: PRIVATE)
16/05/19 16:19:48 INFO yarn.ExecutorRunnable:
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> PWD<CPS>PWD/_spark_.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/
    SPARK_LOG_URL_STDERR -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stderr?start=-4096
    SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1463670715317_0002
    SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
    SPARK_USER -> hadoopadmin
    SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
    SPARK_YARN_MODE -> true
    SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1463671182405
    SPARK_LOG_URL_STDOUT -> http://masternode:8042/node/containerlogs/container_1463670715317_0002_01_000002/hadoopadmin/stdout?start=-4096
    SPARK_YARN_CACHE_FILES -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar#_spark_.jar
  command:
    JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m -Djava.io.tmpdir=PWD/tmp '-Dspark.driver.port=43771' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.17.0.50:43771 --executor-id 1 --hostname masternode --cores 1 --app-id application_1463670715317_0002 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/05/19 16:19:48 INFO impl.ContainerManagementProtocolProxy: Opening proxy : masternode:52694
16/05/19 16:19:48 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/05/19 16:19:48 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
16/05/19 16:19:48 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout

Container: container_1463670715317_0002_02_000002 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:737
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache/hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:54 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:54 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
hadoopadmin@master:~$

The full output shown when I try to start Spark with "spark-shell --master yarn-client":

hadoopadmin@master:~$ spark-shell --master yarn-client
16/05/19 16:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/19 16:19:33 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:33 INFO spark.HttpServer: Starting HTTP Server
16/05/19 16:19:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:37052
16/05/19 16:19:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 37052.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
16/05/19 16:19:37 INFO spark.SparkContext: Running Spark version 1.6.1
16/05/19 16:19:37 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriver' on port 43771.
16/05/19 16:19:38 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/19 16:19:38 INFO Remoting: Starting remoting
16/05/19 16:19:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.17.0.50:57722]
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 57722.
16/05/19 16:19:38 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/19 16:19:38 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/19 16:19:38 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-e8de3854-2526-4725-8c73-edb3fce2df33
16/05/19 16:19:38 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/19 16:19:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/19 16:19:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:39 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/19 16:19:39 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/19 16:19:39 INFO ui.SparkUI: Started SparkUI at http://10.17.0.50:4040
16/05/19 16:19:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/19 16:19:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/05/19 16:19:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/05/19 16:19:39 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/19 16:19:39 INFO yarn.Client: Setting up container launch context for our AM
16/05/19 16:19:39 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/19 16:19:39 INFO yarn.Client: Preparing resources for our AM container
16/05/19 16:19:40 INFO yarn.Client: Uploading resource file:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar
16/05/19 16:19:42 INFO yarn.Client: Uploading resource file:/tmp/spark-942afe6a-95ca-4b8b-b06f-e9e3ac6aa751/__spark_conf__5009784131719458516.zip -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/__spark_conf__5009784131719458516.zip
16/05/19 16:19:42 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:42 INFO yarn.Client: Submitting application 2 to ResourceManager
16/05/19 16:19:42 INFO impl.YarnClientImpl: Submitted application application_1463670715317_0002
16/05/19 16:19:43 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:43 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: N/A
    ApplicationMaster RPC port: -1
    queue: default
    start time: 1463671182634
    final status: UNDEFINED
    tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
    user: hadoopadmin
16/05/19 16:19:44 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:45 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:46 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:47 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:47 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:47 INFO yarn.Client: Application report for application_1463670715317_0002 (state: RUNNING)
16/05/19 16:19:47 INFO yarn.Client:
    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 10.17.0.50
    ApplicationMaster RPC port: 0
    queue: default
    start time: 1463671182634
    final status: UNDEFINED
    tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
    user: hadoopadmin
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Application application_1463670715317_0002 has started running.
16/05/19 16:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49183.
16/05/19 16:19:47 INFO netty.NettyBlockTransferService: Server created on 49183
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/05/19 16:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.17.0.50:49183 with 511.1 MB RAM, BlockManagerId(driver, 10.17.0.50, 49183)
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/05/19 16:19:51 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:54 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/19 16:19:54 INFO ui.SparkUI: Stopped Spark web UI at http://10.17.0.50:4040
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/19 16:19:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/19 16:19:54 INFO storage.MemoryStore: MemoryStore cleared
16/05/19 16:19:54 INFO storage.BlockManager: BlockManager stopped
16/05/19 16:19:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/19 16:19:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/19 16:19:54 INFO spark.SparkContext: Successfully stopped SparkContext
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/05/19 16:20:09 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/05/19 16:20:09 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
    at $line3.$read$iwC$iwC.<init>(<console>:15)
    at $line3.$read$iwC.<init>(<console>:24)
    at $line3.$read.<init>(<console>:26)
    at $line3.$read$.<init>(<console>:30)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.<init>(<console>:7)
    at $line3.$eval$.<clinit>(<console>)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
    at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/19 16:20:09 INFO spark.SparkContext: SparkContext already stopped.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
    at $iwC$iwC.<init>(<console>:15)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at ... org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

<console>:16: error: not found: value sqlContext
         import sqlContext.implicits._
                ^
<console>:16: error: not found: value sqlContext
         import sqlContext.sql
                ^

joncodin · 05-09-2016

Thank you very much, it helped me understand this a lot better.

joncodin · 05-09-2016

Thanks for your answer, it really helped me understand the logic better. I just have one more doubt about your second answer. So the actions in this case are collect or show. But what about transformations: is select a transformation? And if the query is not just "select * from customers" but includes operations like group by, filter, or join, are those operations transformations that Spark applies to the DataFrame during query execution?

joncodin · 05-09-2016

Hi, I'm studying the interaction of Spark with Hive, to execute queries over Hive tables with Spark SQL using HiveContext. But I'm having some trouble understanding the logic. From the Spark documentation, the basic code for this is:

// sc is an existing SparkContext.
var hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
var query = hiveContext.sql("select * from customers");
query.collect()

I have three main doubts. I read that Spark works with RDDs, and that Spark can then apply actions or transformations to those RDDs.

1) It seems that we can create an RDD by loading an external dataset, so where is the RDD created in the code above? Is it here: "query = hiveContext.sql("select * from customers");"? Is var query the RDD?

2) After the RDD is created we can do transformations and actions, but in this case of executing queries over Hive tables we only do actions, right? There is no need for transformations, right? And the action here is collect(), right?

3) Third, I also read that Spark computes RDDs lazily to save storage space. In this use case of executing queries over Hive tables with the code above, where or how does this lazy evaluation happen, so that Spark can save storage space?

Can you give me some help to understand this better?
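To make the third doubt concrete, here is a minimal sketch of lazy evaluation using only plain Scala collections. It is an analogy, not Spark's own mechanism: the object LazyDemo, the sample rows, and the age threshold are all made up for illustration. The filter and map steps on an Iterator play the role of transformations (they only describe work), and toList plays the role of an action such as collect().

```scala
object LazyDemo {
  // Made-up rows standing in for the customers table: (name, age).
  val customers = List(("Ann", 34), ("Bob", 19), ("Cid", 52))

  /** Builds a lazy pipeline, then forces it.
    * Returns (rows inspected before forcing, rows inspected after, result). */
  def run(): (Int, Int, List[String]) = {
    var inspected = 0
    // "Transformations": filter and map on an Iterator only describe the work.
    val adults = customers.iterator
      .filter { case (_, age) =>
        inspected += 1 // side effect used only to observe when evaluation happens
        age >= 21
      }
      .map { case (name, _) => name }
    val before = inspected // no row has been inspected at this point
    // "Action": toList forces the whole pipeline, much as collect() does in Spark.
    val result = adults.toList
    (before, inspected, result)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, result) = run()
    println(s"inspected before action: $before, after action: $after, result: $result")
  }
}
```

In the same spirit, and as far as the Spark 1.6 API goes, hiveContext.sql("select * from customers") returns a DataFrame that describes the query, and the rows are only computed when an action such as collect() or show() is called.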

joncodin · 05-08-2016

Thanks, really. So that is what I'm trying to do, I guess. And the code in your first answer is working now. But when I execute a query in Hive just to test whether the data is in the table, like "select * from partsupp", it doesn't return any results; it shows this error: "FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask". Do you have any idea what causes this?

joncodin · 05-08-2016

Thanks for your help again. I'm doing this and then I will compare with Hive on Tez to check the difference. But I didn't quite understand what you said. I'm still a beginner in big data, and I read that storing the tables in Hive is better because queries are then faster, since ORC is a compressed format and so the data size is smaller. But you are saying that's not the case, and that we should use ORC in Hadoop? I have a .tbl file, so should I convert that file to ORC before storing it in Hadoop?

joncodin · 05-08-2016

Thanks for your answer. I'm using Hive 1.2.1. I read that the Parquet and ORC formats are faster because they are columnar, and I want to query this data from Spark later. So is it better to store the data in the table as text?

Last Visited	06-22-2016 06:53 PM

Member Since	03-20-2016 04:13 PM
Posts	56
Kudos received	18

Cloudera Community

Spark strange behavior: Im executing a query and i...

Re: Error initializing SparkContext., Containers l...

Re: Error initializing SparkContext., Containers l...

Error initializing SparkContext., Containers logs:...

Re: spark sql interaction with hive doubts

Re: spark sql interaction with hive doubts

spark sql interaction with hive doubts

Re: create hive orc table

Re: create hive orc table

Re: create hive orc table