Member since
03-20-2016
56
Posts
18
Kudos Received
0
Solutions
05-25-2016
09:54 AM
Im executing a query on spark and it is working Im getting the result. I did not configure any cluster so spark should be using its own cluster manager. But in the spark page: master:8080 I get this: Alive Workers: 2
Cores in use: 4 Total, 0 Used
Memory in use: 6.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE But when Im executing the query I get the same result while Im refresinh the page: Alive Workers: 2
Cores in use: 4 Total, 0 Used
Memory in use: 6.0 GB Total, 0.0 B Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
And after the execution of the query this is the same again...Do you know why? Its very strange, it seems that spark is executing the query without using any hardware which is not possible, so why this info is not updating do you know?
... View more
Labels:
- Labels:
-
Apache Spark
05-19-2016
08:02 PM
Hi, Im testing on hadoop 2-7.1, java 1.8 and spark-1.6.1-bin-hadoop2.6.
... View more
05-19-2016
08:02 PM
Thanks for your help. I tried to start spark with command that you said, but I have the exact same error.
... View more
05-19-2016
08:02 PM
When I start the spark-yarn using this command " spark-shell --master yarn-client " Im getting an error saying: <code>ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
The full error I got in starting spark shell with yarn is below, the logs about yarn containers is here: <code>Container: container_1463670715317_0002_01_000001 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:5748
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache /hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/S taticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/li b/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:44 INFO yarn.ApplicationMaster: Registered signal handlers for [T ERM, HUP, INT]
16/05/19 16:19:45 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_ 1463670715317_0002_000001
16/05/19 16:19:46 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:46 INFO spark.SecurityManager: Changing modify acls to: hadoopadm in
16/05/19 16:19:46 INFO spark.SecurityManager: SecurityManager: authentication di sabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users w ith modify permissions: Set(hadoopadmin)
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Waiting for Spark driver to be re achable.
16/05/19 16:19:46 INFO yarn.ApplicationMaster: Driver now available: 10.17.0.50: 43771
16/05/19 16:19:47 INFO yarn.ApplicationMaster$AMEndpoint: Add WebUI Filter. AddW ebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_ HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/a pplication_1463670715317_0002),/proxy/application_1463670715317_0002)
16/05/19 16:19:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0 :8030
16/05/19 16:19:47 INFO yarn.YarnRMClient: Registering the ApplicationMaster
16/05/19 16:19:47 INFO yarn.YarnAllocator: Will request 2 executor containers, e ach with 1 cores and 1408 MB memory including 384 MB overhead
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capabil ity: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.YarnAllocator: Container request (host: Any, capabil ity: <memory:1408, vCores:1>)
16/05/19 16:19:47 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
16/05/19 16:19:47 INFO impl.AMRMClientImpl: Received new token for : masternode:52694
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching container container_1463670 715317_0002_01_000002 for on host masternode
16/05/19 16:19:47 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl : spark://CoarseGrainedScheduler@10.17.0.50:43771, executorHostname: masternode
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Starting Executor Container
16/05/19 16:19:47 INFO yarn.YarnAllocator: Received 1 containers from YARN, laun ching executors on 1 of them.
16/05/19 16:19:47 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-ca ched-nodemanagers-proxies : 0
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Preparing Local resources
16/05/19 16:19:47 INFO yarn.ExecutorRunnable: Prepared Local resources Map(_spa rk_.jar -> resource
{ scheme: "hdfs" host: "localhost" port: 9000 file: "/user/ hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-ha doop2.6.0.jar" }
size: 187698038 timestamp: 1463671182405 type: FILE visibility: PRIVATE)
16/05/19 16:19:48 INFO yarn.ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> PWD<CPS>PWD/_spark_.jar<CPS>$HADOOP_CONF_DIR<CPS>$HAD OOP_COMMON_HOME/share/hadoop/common/<CPS>$HADOOP_COMMON_HOME/share/hadoop/commo n/lib/<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/<CPS>$HADOOP_HDFS_HOME/share/ha doop/hdfs/lib/<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/<CPS>$HADOOP_YARN_HOME/ share/hadoop/yarn/lib/<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/<CPS>$HA DOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/
SPARK_LOG_URL_STDERR -> http://masternode:8042/node/containerlogs/conta iner_1463670715317_0002_01_000002/hadoopadmin/stderr?start=-4096
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1463670715317_0002
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 187698038
SPARK_USER -> hadoopadmin
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1463671182405
SPARK_LOG_URL_STDOUT -> http://masternode:8042/node/containerlogs/conta iner_1463670715317_0002_01_000002/hadoopadmin/stdout?start=-4096
SPARK_YARN_CACHE_FILES -> hdfs://localhost:9000/user/hadoopadmin/.sparkStagi ng/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar#_spark_ .jar
command:
JAVA_HOME/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -X mx1024m -Djava.io.tmpdir=PWD/tmp '-Dspark.driver.port=43771' -Dspark.yarn.ap p.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBac kend --driver-url spark://CoarseGrainedScheduler@10.17.0.50:43771 --executor-id 1 --hostname masternode --cores 1 --app-id application_1463670715317_0002 - -user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
16/05/19 16:19:48 INFO impl.ContainerManagementProtocolProxy: Opening proxy : masternode:52694
16/05/19 16:19:48 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/05/19 16:19:48 INFO yarn.ApplicationMaster: Final app status: UNDEFINED, exit Code: 0, (reason: Shutdown hook called before final status was reported.)
16/05/19 16:19:48 INFO util.ShutdownHookManager: Shutdown hook called
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
Container: container_1463670715317_0002_02_000002 on masternode_52694
============================================================================
LogType:stderr
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:737
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hadoopadmin/nm-local-dir/usercache /hadoopadmin/filecache/13/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/S taticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.1/share/hadoop/common/li b/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/19 16:19:54 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
16/05/19 16:19:54 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 1 5: SIGTERM
End of LogType:stderr
LogType:stdout
Log Upload Time:Thu May 19 16:19:54 +0100 2016
LogLength:0
Log Contents:
End of LogType:stdout
hadoopadmin@master:~$
The full error that it shows when I try to start spark with " spark-shell --master yarn-client ": <code>hadoopadmin@master:~$ spark-shell --master yarn-client
16/05/19 16:19:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/19 16:19:33 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:33 INFO spark.HttpServer: Starting HTTP Server
16/05/19 16:19:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:37052
16/05/19 16:19:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 37052.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.1
/_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
16/05/19 16:19:37 INFO spark.SparkContext: Running Spark version 1.6.1
16/05/19 16:19:37 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriver' on port 43771.
16/05/19 16:19:38 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/05/19 16:19:38 INFO Remoting: Starting remoting
16/05/19 16:19:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.17.0.50:57722]
16/05/19 16:19:38 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 57722.
16/05/19 16:19:38 INFO spark.SparkEnv: Registering MapOutputTracker
16/05/19 16:19:38 INFO spark.SparkEnv: Registering BlockManagerMaster
16/05/19 16:19:38 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-e8de3854-2526-4725-8c73-edb3fce2df33
16/05/19 16:19:38 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/05/19 16:19:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/05/19 16:19:39 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/05/19 16:19:39 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/05/19 16:19:39 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/05/19 16:19:39 INFO ui.SparkUI: Started SparkUI at http://10.17.0.50:4040
16/05/19 16:19:39 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/19 16:19:39 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/05/19 16:19:39 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/05/19 16:19:39 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/05/19 16:19:39 INFO yarn.Client: Setting up container launch context for our AM
16/05/19 16:19:39 INFO yarn.Client: Setting up the launch environment for our AM container
16/05/19 16:19:39 INFO yarn.Client: Preparing resources for our AM container
16/05/19 16:19:40 INFO yarn.Client: Uploading resource file:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/spark-assembly-1.6.1-hadoop2.6.0.jar
16/05/19 16:19:42 INFO yarn.Client: Uploading resource file:/tmp/spark-942afe6a-95ca-4b8b-b06f-e9e3ac6aa751/__spark_conf__5009784131719458516.zip -> hdfs://localhost:9000/user/hadoopadmin/.sparkStaging/application_1463670715317_0002/__spark_conf__5009784131719458516.zip
16/05/19 16:19:42 INFO spark.SecurityManager: Changing view acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: Changing modify acls to: hadoopadmin
16/05/19 16:19:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoopadmin); users with modify permissions: Set(hadoopadmin)
16/05/19 16:19:42 INFO yarn.Client: Submitting application 2 to ResourceManager
16/05/19 16:19:42 INFO impl.YarnClientImpl: Submitted application application_1463670715317_0002
16/05/19 16:19:43 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:43 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:44 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:45 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:46 INFO yarn.Client: Application report for application_1463670715317_0002 (state: ACCEPTED)
16/05/19 16:19:47 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:47 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:47 INFO yarn.Client: Application report for application_1463670715317_0002 (state: RUNNING)
16/05/19 16:19:47 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.17.0.50
ApplicationMaster RPC port: 0
queue: default
start time: 1463671182634
final status: UNDEFINED
tracking URL: http://masternode:8088/proxy/application_1463670715317_0002/
user: hadoopadmin
16/05/19 16:19:47 INFO cluster.YarnClientSchedulerBackend: Application application_1463670715317_0002 has started running.
16/05/19 16:19:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49183.
16/05/19 16:19:47 INFO netty.NettyBlockTransferService: Server created on 49183
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/05/19 16:19:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.17.0.50:49183 with 511.1 MB RAM, BlockManagerId(driver, 10.17.0.50, 49183)
16/05/19 16:19:47 INFO storage.BlockManagerMaster: Registered BlockManager
16/05/19 16:19:51 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/05/19 16:19:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> masternode, PROXY_URI_BASES -> http://masternode:8088/proxy/application_1463670715317_0002), /proxy/application_1463670715317_0002
16/05/19 16:19:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/05/19 16:19:54 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/05/19 16:19:54 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/05/19 16:19:54 INFO ui.SparkUI: Stopped Spark web UI at http://10.17.0.50:4040
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/05/19 16:19:54 INFO cluster.YarnClientSchedulerBackend: Stopped
16/05/19 16:19:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/19 16:19:54 INFO storage.MemoryStore: MemoryStore cleared
16/05/19 16:19:54 INFO storage.BlockManager: BlockManager stopped
16/05/19 16:19:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/05/19 16:19:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/19 16:19:54 INFO spark.SparkContext: Successfully stopped SparkContext
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/05/19 16:19:54 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/05/19 16:20:09 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
16/05/19 16:20:09 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $line3.$read$iwC$iwC.<init>(<console>:15)
at $line3.$read$iwC.<init>(<console>:24)
at $line3.$read.<init>(<console>:26)
at $line3.$read$.<init>(<console>:30)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.<init>(<console>:7)
at $line3.$eval$.<clinit>(<console>)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/05/19 16:20:09 INFO spark.SparkContext: SparkContext already stopped.
java.lang.NullPointerException
at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
at $iwC$iwC.<init>(<console>:15)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at ... org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
<console>:16: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:16: error: not found: value sqlContext
import sqlContext.sql
^
... View more
Labels:
- Labels:
-
Apache Spark
-
Apache YARN
05-09-2016
12:11 PM
Thanks for your answer, it really helped understand better the logic. I just have one more doubt about your second answer. So the actions are collect or show in this case. But about transformations, select is a transformation? And also if the query its not only a "select * from customers", but have some operations like group by, filter, join operations, this operations will be transformations that spark will aplply on the dataframe during the query execution?
... View more
05-09-2016
12:02 AM
Hi, Im studing the interaction of spark with hive, to execute queries over hive tables with spark sql using hiveContex. But, Im having some doubts to understanding the logic. From the spark documentation, the basic code for this is this: // sc is an existing SparkContext.
var hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
var query = hiveContext.sql("select * from customers");
query.collect() I have three main doubts. I read that spark works with rdds, and then spark can apply actions or transformations in that rdds. 1) It seems that we can create a rdd by loading an external dataset, so in this above code where the RDD is created? Is here "query = hiveContext.sql("select * from customers");" ? var query is the RDD? 2) And then after the RDD is created we can do transformations and actions, but in this case of execute queries over hive tables we just do actions right? There is no need for transformations right? And the action here is collect() right? 3) And third, I also read that spark computes rdds in a lazy way to save storage space. In this use case of execute queries over hive tables with above code, where or how this lazy evaluation mechanism happens, so the spark can save storage space? Can you give some help to understand this better?
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Spark
05-08-2016
06:47 PM
Thanks really. so that is what Im trying to do I guess. And your code in your first answer its working now. But when I execute a query in hive just to test if the data is inside the table lile "select * from partsupp", dont return any results, because it shows this error: "Failed: Execution error, return code 2 from org.apache.hadoop.hive.sql.exec.mr.MapRedTask". Do you have any idea for this?
... View more
05-08-2016
05:48 PM
Thanks for your help again. Im doing this and then I will compare with hive on tez to check the difference. But now I didnt understand well what you said, Im still a beginner in big data, and I read that store the tables in hive is better because then the queries are fastest because orc is a compressed format so the data size is smaller. But you are saying that dont, and we should use orc in hadoop? I have a .tbl file so I should convert that file into orc before store into hadoop?
... View more
05-08-2016
03:32 PM
Thanks for your answer, Im using hive 1.2.1. And I read that parquet and orc formats because they are columnar are fastest. And I want to query this data from spark later. But so, its better store the data in the table as text?
... View more