Member since: 04-11-2016
Posts: 38
Kudos Received: 13
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 36628 | 01-04-2017 11:43 PM
 | 2992 | 09-05-2016 04:07 PM
 | 7450 | 09-05-2016 03:50 PM
 | 1442 | 08-30-2016 08:15 PM
 | 2927 | 08-30-2016 01:01 PM
11-21-2018 08:31 PM
Hey Jasper, great article! Thanks for sharing. Would you recommend moving from Hive to Spark? What about a similar article using Spark? 😉
09-01-2018 09:06 PM
@Steve Matison Thank you for posting the ES MPACK. Would you be able to share how to build our own custom MPACK? Maybe a follow-on blog. Cheers, Amit
10-29-2017 08:19 PM
Storm - Supervisor and Nimbus drop immediately after start. Please advise on remediation. Thanks.
Environment: HDP install on OpenStack, CentOS 7.2, Ambari 2.5.2.0, HDP-2.6.2.14, Storm 1.1.0.
Storm fails its service check at install time with the following log: stderr: /var/lib/ambari-agent/data/errors-238.txt
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1/package/scripts/service_check.py", line 79, in <module>
ServiceCheck().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/STORM/0.9.1/package/scripts/service_check.py", line 70, in service_check
user=params.storm_user
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'storm jar /tmp/wordCount.jar storm.starter.WordCountTopology WordCountid1aaca8ef_date022917' returned 1. Running: /usr/jdk64/jdk1.8.0_112/bin/java -server -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar org.apache.storm.daemon.ClientJarTransformerRunner org.apache.storm.hack.StormShadeTransformer /tmp/wordCount.jar /tmp/ea59a668bcca11e7ae97fa163eb0f425.jar
1330 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseBasicBolt to org/apache/storm/topology/base/BaseBasicBolt in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1337 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/tuple/Tuple to org/apache/storm/tuple/Tuple in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1338 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/BasicOutputCollector to org/apache/storm/topology/BasicOutputCollector in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1338 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/tuple/Values to org/apache/storm/tuple/Values in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1339 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/OutputFieldsDeclarer to org/apache/storm/topology/OutputFieldsDeclarer in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1339 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/tuple/Fields to org/apache/storm/tuple/Fields in storm/starter/BasicDRPCTopology$ExclaimBolt.class. please modify your code to use the new namespace
1340 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/LinearDRPCTopologyBuilder to org/apache/storm/drpc/LinearDRPCTopologyBuilder in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1341 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/IBasicBolt to org/apache/storm/topology/IBasicBolt in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1341 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/LinearDRPCInputDeclarer to org/apache/storm/drpc/LinearDRPCInputDeclarer in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1341 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/Config to org/apache/storm/Config in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1343 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/LocalDRPC to org/apache/storm/LocalDRPC in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1343 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/LocalCluster to org/apache/storm/LocalCluster in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1344 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/ILocalDRPC to org/apache/storm/ILocalDRPC in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1344 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/generated/StormTopology to org/apache/storm/generated/StormTopology in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1345 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/StormSubmitter to org/apache/storm/StormSubmitter in storm/starter/BasicDRPCTopology.class. please modify your code to use the new namespace
1353 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseRichBolt to org/apache/storm/topology/base/BaseRichBolt in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1354 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/OutputCollector to org/apache/storm/task/OutputCollector in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1354 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/TopologyContext to org/apache/storm/task/TopologyContext in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/TimeCacheMap$ExpiredCallback to org/apache/storm/utils/TimeCacheMap$ExpiredCallback in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/generated/GlobalStreamId to org/apache/storm/generated/GlobalStreamId in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/TimeCacheMap to org/apache/storm/utils/TimeCacheMap in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1374 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/spout/ISpout to org/apache/storm/spout/ISpout in storm/starter/clj/word_count$sentence_spout__$fn$reify__23.class. please modify your code to use the new namespace
1379 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/IBolt to org/apache/storm/task/IBolt in storm/starter/clj/word_count$split_sentence__$fn$reify__42.class. please modify your code to use the new namespace
1454 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/TopologyBuilder to org/apache/storm/topology/TopologyBuilder in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1454 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/TestWordSpout to org/apache/storm/testing/TestWordSpout in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/IRichSpout to org/apache/storm/topology/IRichSpout in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/SpoutDeclarer to org/apache/storm/topology/SpoutDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/IRichBolt to org/apache/storm/topology/IRichBolt in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/BoltDeclarer to org/apache/storm/topology/BoltDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/InputDeclarer to org/apache/storm/topology/InputDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/Utils to org/apache/storm/utils/Utils in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1458 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/DRPCSpout to org/apache/storm/drpc/DRPCSpout in storm/starter/ManualDRPC.class. please modify your code to use the new namespace
1458 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/ReturnResults to org/apache/storm/drpc/ReturnResults in storm/starter/ManualDRPC.class. please modify your code to use the new namespace
1460 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseBatchBolt to org/apache/storm/topology/base/BaseBatchBolt in storm/starter/ReachTopology$CountAggregator.class. please modify your code to use the new namespace
1460 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/coordination/BatchOutputCollector to org/apache/storm/coordination/BatchOutputCollector in storm/starter/ReachTopology$CountAggregator.class. please modify your code to use the new namespace
1464 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/coordination/IBatchBolt to org/apache/storm/coordination/IBatchBolt in storm/starter/ReachTopology.class. please modify your code to use the new namespace
1467 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/FeederSpout to org/apache/storm/testing/FeederSpout in storm/starter/SingleJoinExample.class. please modify your code to use the new namespace
1469 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseRichSpout to org/apache/storm/topology/base/BaseRichSpout in storm/starter/spout/RandomSentenceSpout.class. please modify your code to use the new namespace
1469 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/spout/SpoutOutputCollector to org/apache/storm/spout/SpoutOutputCollector in storm/starter/spout/RandomSentenceSpout.class. please modify your code to use the new namespace
1470 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/Time to org/apache/storm/utils/Time in storm/starter/tools/NthLastModifiedTimeTracker.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseTransactionalBolt to org/apache/storm/topology/base/BaseTransactionalBolt in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/ICommitter to org/apache/storm/transactional/ICommitter in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/TransactionAttempt to org/apache/storm/transactional/TransactionAttempt in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/MemoryTransactionalSpout to org/apache/storm/testing/MemoryTransactionalSpout in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/TransactionalTopologyBuilder to org/apache/storm/transactional/TransactionalTopologyBuilder in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/partitioned/IPartitionedTransactionalSpout to org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1491 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/BaseFunction to org/apache/storm/trident/operation/BaseFunction in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1491 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/tuple/TridentTuple to org/apache/storm/trident/tuple/TridentTuple in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1492 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/TridentCollector to org/apache/storm/trident/operation/TridentCollector in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1492 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/CombinerAggregator to org/apache/storm/trident/operation/CombinerAggregator in storm/starter/trident/TridentReach$One.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/StateFactory to org/apache/storm/trident/state/StateFactory in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/IMetricsContext to org/apache/storm/task/IMetricsContext in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/State to org/apache/storm/trident/state/State in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1494 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/ReadOnlyState to org/apache/storm/trident/state/ReadOnlyState in storm/starter/trident/TridentReach$StaticSingleKeyMapState.class. please modify your code to use the new namespace
1494 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/map/ReadOnlyMapState to org/apache/storm/trident/state/map/ReadOnlyMapState in storm/starter/trident/TridentReach$StaticSingleKeyMapState.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/TridentTopology to org/apache/storm/trident/TridentTopology in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/TridentState to org/apache/storm/trident/TridentState in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/Stream to org/apache/storm/trident/Stream in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/MapGet to org/apache/storm/trident/operation/builtin/MapGet in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/QueryFunction to org/apache/storm/trident/state/QueryFunction in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/Function to org/apache/storm/trident/operation/Function in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/fluent/GroupedStream to org/apache/storm/trident/fluent/GroupedStream in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1497 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/Sum to org/apache/storm/trident/operation/builtin/Sum in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/MemoryMapState$Factory to org/apache/storm/trident/testing/MemoryMapState$Factory in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/MemoryMapState to org/apache/storm/trident/testing/MemoryMapState in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/FixedBatchSpout to org/apache/storm/trident/testing/FixedBatchSpout in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/spout/IBatchSpout to org/apache/storm/trident/spout/IBatchSpout in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1500 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/Count to org/apache/storm/trident/operation/builtin/Count in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1500 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/FilterNull to org/apache/storm/trident/operation/builtin/FilterNull in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1501 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/Filter to org/apache/storm/trident/operation/Filter in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1503 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/ShellBolt to org/apache/storm/task/ShellBolt in storm/starter/WordCountTopology$SplitSentence.class. please modify your code to use the new namespace
Running: /usr/jdk64/jdk1.8.0_112/bin/java -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar:/tmp/ea59a668bcca11e7ae97fa163eb0f425.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.6.2.14-5/storm/bin -Dstorm.jar=/tmp/ea59a668bcca11e7ae97fa163eb0f425.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} storm.starter.WordCountTopology WordCountid1aaca8ef_date022917
948 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -6085484404615721045:-8414510412923341525
1095 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2001ms (NOT MAX)
3099 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2002ms (NOT MAX)
5103 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2005ms (NOT MAX)
7110 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2013ms (NOT MAX)
9125 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2016ms (NOT MAX)
11144 [main] WARN o.a.s.u.NimbusClient - Ignoring exception while trying to get leader nimbus info from mst2-an05.field.hortonworks.com. will retry with a different seed host.
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:108) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:128) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:84) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:58) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.getListOfKeysFromBlobStore(StormSubmitter.java:598) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.validateConfs(StormSubmitter.java:564) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:210) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:390) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:162) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at storm.starter.WordCountTopology.main(WordCountTopology.java:77) [ea59a668bcca11e7ae97fa163eb0f425.jar:?]
Caused by: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Caused by: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_112]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_112]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_112]
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Exception in thread "main" org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [mst2-an05.field.hortonworks.com]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:112)
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:58)
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268)
at org.apache.storm.StormSubmitter.getListOfKeysFromBlobStore(StormSubmitter.java:598)
at org.apache.storm.StormSubmitter.validateConfs(StormSubmitter.java:564)
at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:210)
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:390)
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:162)
at storm.starter.WordCountTopology.main(WordCountTopology.java:77)
stdout:
2017-10-29 17:02:12,776 - Stack Feature Version Info: Cluster Stack=2.6, Cluster Current Version=None, Command Stack=None, Command Version=2.6.2.14-5 -> 2.6.2.14-5
2017-10-29 17:02:12,815 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-10-29 17:02:12,817 - checked_call['hostid'] {}
2017-10-29 17:02:12,827 - checked_call returned (0, '1aaca8ef')
2017-10-29 17:02:12,827 - File['/tmp/wordCount.jar'] {'owner': 'storm', 'content': StaticFile('wordCount.jar')}
2017-10-29 17:02:12,831 - Writing File['/tmp/wordCount.jar'] because it doesn't exist
2017-10-29 17:02:12,833 - Changing owner for /tmp/wordCount.jar from 0 to storm
2017-10-29 17:02:12,833 - Execute['storm jar /tmp/wordCount.jar storm.starter.WordCountTopology WordCountid1aaca8ef_date022917'] {'logoutput': True, 'path': [u'/usr/hdp/current/storm-client/bin'], 'user': 'storm'}
Running: /usr/jdk64/jdk1.8.0_112/bin/java -server -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar org.apache.storm.daemon.ClientJarTransformerRunner org.apache.storm.hack.StormShadeTransformer /tmp/wordCount.jar /tmp/ea59a668bcca11e7ae97fa163eb0f425.jar
[... DefaultShader relocation warnings repeated, identical to the stderr output above; log truncated ...]
1353 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseRichBolt to org/apache/storm/topology/base/BaseRichBolt in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1354 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/OutputCollector to org/apache/storm/task/OutputCollector in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1354 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/TopologyContext to org/apache/storm/task/TopologyContext in storm/starter/bolt/RollingCountBolt.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/TimeCacheMap$ExpiredCallback to org/apache/storm/utils/TimeCacheMap$ExpiredCallback in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/generated/GlobalStreamId to org/apache/storm/generated/GlobalStreamId in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1358 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/TimeCacheMap to org/apache/storm/utils/TimeCacheMap in storm/starter/bolt/SingleJoinBolt$ExpireCallback.class. please modify your code to use the new namespace
1374 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/spout/ISpout to org/apache/storm/spout/ISpout in storm/starter/clj/word_count$sentence_spout__$fn$reify__23.class. please modify your code to use the new namespace
1379 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/IBolt to org/apache/storm/task/IBolt in storm/starter/clj/word_count$split_sentence__$fn$reify__42.class. please modify your code to use the new namespace
1454 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/TopologyBuilder to org/apache/storm/topology/TopologyBuilder in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1454 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/TestWordSpout to org/apache/storm/testing/TestWordSpout in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/IRichSpout to org/apache/storm/topology/IRichSpout in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/SpoutDeclarer to org/apache/storm/topology/SpoutDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1455 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/IRichBolt to org/apache/storm/topology/IRichBolt in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/BoltDeclarer to org/apache/storm/topology/BoltDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/InputDeclarer to org/apache/storm/topology/InputDeclarer in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1456 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/Utils to org/apache/storm/utils/Utils in storm/starter/ExclamationTopology.class. please modify your code to use the new namespace
1458 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/DRPCSpout to org/apache/storm/drpc/DRPCSpout in storm/starter/ManualDRPC.class. please modify your code to use the new namespace
1458 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/drpc/ReturnResults to org/apache/storm/drpc/ReturnResults in storm/starter/ManualDRPC.class. please modify your code to use the new namespace
1460 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseBatchBolt to org/apache/storm/topology/base/BaseBatchBolt in storm/starter/ReachTopology$CountAggregator.class. please modify your code to use the new namespace
1460 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/coordination/BatchOutputCollector to org/apache/storm/coordination/BatchOutputCollector in storm/starter/ReachTopology$CountAggregator.class. please modify your code to use the new namespace
1464 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/coordination/IBatchBolt to org/apache/storm/coordination/IBatchBolt in storm/starter/ReachTopology.class. please modify your code to use the new namespace
1467 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/FeederSpout to org/apache/storm/testing/FeederSpout in storm/starter/SingleJoinExample.class. please modify your code to use the new namespace
1469 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseRichSpout to org/apache/storm/topology/base/BaseRichSpout in storm/starter/spout/RandomSentenceSpout.class. please modify your code to use the new namespace
1469 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/spout/SpoutOutputCollector to org/apache/storm/spout/SpoutOutputCollector in storm/starter/spout/RandomSentenceSpout.class. please modify your code to use the new namespace
1470 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/utils/Time to org/apache/storm/utils/Time in storm/starter/tools/NthLastModifiedTimeTracker.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/topology/base/BaseTransactionalBolt to org/apache/storm/topology/base/BaseTransactionalBolt in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/ICommitter to org/apache/storm/transactional/ICommitter in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1480 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/TransactionAttempt to org/apache/storm/transactional/TransactionAttempt in storm/starter/TransactionalGlobalCount$UpdateGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/testing/MemoryTransactionalSpout to org/apache/storm/testing/MemoryTransactionalSpout in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/TransactionalTopologyBuilder to org/apache/storm/transactional/TransactionalTopologyBuilder in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1482 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/transactional/partitioned/IPartitionedTransactionalSpout to org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout in storm/starter/TransactionalGlobalCount.class. please modify your code to use the new namespace
1491 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/BaseFunction to org/apache/storm/trident/operation/BaseFunction in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1491 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/tuple/TridentTuple to org/apache/storm/trident/tuple/TridentTuple in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1492 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/TridentCollector to org/apache/storm/trident/operation/TridentCollector in storm/starter/trident/TridentReach$ExpandList.class. please modify your code to use the new namespace
1492 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/CombinerAggregator to org/apache/storm/trident/operation/CombinerAggregator in storm/starter/trident/TridentReach$One.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/StateFactory to org/apache/storm/trident/state/StateFactory in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/IMetricsContext to org/apache/storm/task/IMetricsContext in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1493 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/State to org/apache/storm/trident/state/State in storm/starter/trident/TridentReach$StaticSingleKeyMapState$Factory.class. please modify your code to use the new namespace
1494 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/ReadOnlyState to org/apache/storm/trident/state/ReadOnlyState in storm/starter/trident/TridentReach$StaticSingleKeyMapState.class. please modify your code to use the new namespace
1494 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/map/ReadOnlyMapState to org/apache/storm/trident/state/map/ReadOnlyMapState in storm/starter/trident/TridentReach$StaticSingleKeyMapState.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/TridentTopology to org/apache/storm/trident/TridentTopology in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/TridentState to org/apache/storm/trident/TridentState in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/Stream to org/apache/storm/trident/Stream in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1495 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/MapGet to org/apache/storm/trident/operation/builtin/MapGet in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/state/QueryFunction to org/apache/storm/trident/state/QueryFunction in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/Function to org/apache/storm/trident/operation/Function in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1496 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/fluent/GroupedStream to org/apache/storm/trident/fluent/GroupedStream in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1497 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/Sum to org/apache/storm/trident/operation/builtin/Sum in storm/starter/trident/TridentReach.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/MemoryMapState$Factory to org/apache/storm/trident/testing/MemoryMapState$Factory in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/MemoryMapState to org/apache/storm/trident/testing/MemoryMapState in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/testing/FixedBatchSpout to org/apache/storm/trident/testing/FixedBatchSpout in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1499 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/spout/IBatchSpout to org/apache/storm/trident/spout/IBatchSpout in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1500 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/Count to org/apache/storm/trident/operation/builtin/Count in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1500 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/builtin/FilterNull to org/apache/storm/trident/operation/builtin/FilterNull in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1501 [main] WARN o.a.s.h.DefaultShader - Relocating storm/trident/operation/Filter to org/apache/storm/trident/operation/Filter in storm/starter/trident/TridentWordCount.class. please modify your code to use the new namespace
1503 [main] WARN o.a.s.h.DefaultShader - Relocating backtype/storm/task/ShellBolt to org/apache/storm/task/ShellBolt in storm/starter/WordCountTopology$SplitSentence.class. please modify your code to use the new namespace
Running: /usr/jdk64/jdk1.8.0_112/bin/java -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar:/tmp/ea59a668bcca11e7ae97fa163eb0f425.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.6.2.14-5/storm/bin -Dstorm.jar=/tmp/ea59a668bcca11e7ae97fa163eb0f425.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} storm.starter.WordCountTopology WordCountid1aaca8ef_date022917
948 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -6085484404615721045:-8414510412923341525
1095 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2001ms (NOT MAX)
3099 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2002ms (NOT MAX)
5103 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2005ms (NOT MAX)
7110 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2013ms (NOT MAX)
9125 [main] WARN o.a.s.u.StormBoundedExponentialBackoffRetry - WILL SLEEP FOR 2016ms (NOT MAX)
11144 [main] WARN o.a.s.u.NimbusClient - Ignoring exception while trying to get leader nimbus info from mst2-an05.field.hortonworks.com. will retry with a different seed host.
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:108) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:128) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:84) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:58) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.getListOfKeysFromBlobStore(StormSubmitter.java:598) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.validateConfs(StormSubmitter.java:564) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:210) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:390) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:162) [storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at storm.starter.WordCountTopology.main(WordCountTopology.java:77) [ea59a668bcca11e7ae97fa163eb0f425.jar:?]
Caused by: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Caused by: org.apache.storm.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_112]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_112]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_112]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_112]
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.0.2.6.2.14-5.jar:1.1.0.2.6.2.14-5]
... 11 more
Exception in thread "main" org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [mst2-an05.field.hortonworks.com]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:112)
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:58)
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268)
at org.apache.storm.StormSubmitter.getListOfKeysFromBlobStore(StormSubmitter.java:598)
at org.apache.storm.StormSubmitter.validateConfs(StormSubmitter.java:564)
at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:210)
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:390)
at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:162)
at storm.starter.WordCountTopology.main(WordCountTopology.java:77)
Command failed after 1 tries
Tried stopping all Storm services and restarting. Both daemons now fail with:
Error opening zip file or JAR manifest missing : /usr/hdp/current/storm-supervisor/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar
lrwxrwxrwx. 1 storm storm 14 Oct 29 16:44 /usr/hdp/current/storm-nimbus/logs -> /var/log/storm
[root@mst2-an05 ~]# cd /var/log/storm
[root@mst2-an05 storm]# ll
total 2012
-rw-r--r--. 1 storm hadoop 0 Oct 29 16:59 access-drpc.log
-rw-r--r--. 1 storm hadoop 0 Oct 29 16:59 access-logviewer.log
-rw-r--r--. 1 storm hadoop 0 Oct 29 16:59 access-ui.log
-rw-r--r--. 1 storm hadoop 0 Oct 29 16:59 access-web-drpc.log
-rw-r--r--. 1 storm hadoop 0 Oct 29 16:59 access-web-logviewer.log
-rw-r--r--. 1 storm hadoop 93398 Oct 29 19:58 access-web-ui.log
-rw-r--r--. 1 storm hadoop 2505 Oct 29 19:51 drpc.log
-rw-r--r--. 1 storm hadoop 1836 Oct 29 19:51 drpc.out
-rw-r--r--. 1 storm hadoop 2280 Oct 29 19:51 logviewer.log
-rw-r--r--. 1 storm hadoop 1881 Oct 29 19:51 logviewer.out
-rw-r--r--. 1 storm hadoop 2300 Oct 29 19:52 nimbus.out
-rw-r--r--. 1 storm hadoop 2506 Oct 29 19:51 supervisor.out
-rw-r--r--. 1 storm hadoop 1925972 Oct 29 19:52 ui.log
-rw-r--r--. 1 storm hadoop 1854 Oct 29 19:51 ui.out
[root@mst2-an05 storm]# cat nimbus.out
Running: /usr/jdk64/jdk1.8.0_112/bin/java -server -Ddaemon.name=nimbus -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ojdbc6.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ranger-plugin-classloader-0.7.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ranger-storm-plugin-shim-0.7.0.2.6.2.14-5.jar:/usr/hdp/current/storm-nimbus/conf -Xmx1024m -javaagent:/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8649,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/conf/jmxetric-conf.xml,process=Nimbus_JVM -Dlogfile.name=nimbus.log -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector 
-Dlog4j.configurationFile=/usr/hdp/2.6.2.14-5/storm/log4j2/cluster.xml org.apache.storm.daemon.nimbus
Error opening zip file or JAR manifest missing : /usr/hdp/current/storm-nimbus/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar
Error occurred during initialization of VM
agent library failed to init: instrument
[root@mst2-an05 storm]# cat supervisor.out
Running: /usr/jdk64/jdk1.8.0_112/bin/java -server -Ddaemon.name=supervisor -Dstorm.options= -Dstorm.home=/usr/hdp/2.6.2.14-5/storm -Dstorm.log.dir=/var/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.6.2.14-5/storm/lib/asm-5.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/clojure-1.7.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/disruptor-3.3.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/kryo-3.0.3.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-api-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-core-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.6.2.14-5/storm/lib/log4j-slf4j-impl-2.8.2.jar:/usr/hdp/2.6.2.14-5/storm/lib/minlog-1.3.0.jar:/usr/hdp/2.6.2.14-5/storm/lib/objenesis-2.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/reflectasm-1.10.1.jar:/usr/hdp/2.6.2.14-5/storm/lib/ring-cors-0.1.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.6.2.14-5/storm/lib/slf4j-api-1.7.21.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-core-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/storm-rename-hack-1.1.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/lib/zookeeper.jar:/usr/hdp/2.6.2.14-5/storm/lib/ambari-metrics-storm-sink.jar:/usr/hdp/2.6.2.14-5/storm/extlib/atlas-plugin-classloader-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib/storm-bridge-shim-0.8.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ojdbc6.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ranger-plugin-classloader-0.7.0.2.6.2.14-5.jar:/usr/hdp/2.6.2.14-5/storm/extlib-daemon/ranger-storm-plugin-shim-0.7.0.2.6.2.14-5.jar:/usr/hdp/current/storm-supervisor/conf -Xmx256m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=56431 
-javaagent:/usr/hdp/current/storm-supervisor/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar=host=localhost,port=8650,wireformat31x=true,mode=multicast,config=/usr/hdp/current/storm-supervisor/contrib/storm-jmxetric/conf/jmxetric-conf.xml,process=Supervisor_JVM -Dlogfile.name=supervisor.log -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/usr/hdp/2.6.2.14-5/storm/log4j2/cluster.xml org.apache.storm.daemon.supervisor.Supervisor
Error opening zip file or JAR manifest missing : /usr/hdp/current/storm-supervisor/contrib/storm-jmxetric/lib/jmxetric-1.0.4.jar
Error occurred during initialization of VM
agent library failed to init: instrument
Labels: Apache Storm
09-27-2017
01:31 PM
1 Kudo
A Machine Learning model learns from data. As new incremental data arrives, the model needs to be upgraded. A Machine Learning model factory ensures that while a model is deployed in production, continuous learning is also happening on the incremental new data ingested into the production environment. As the deployed model's performance decays, a newly trained, serialized model must be deployed in its place. An A/B test between the deployed model and the newly trained model scores both, comparing the performance of the deployed model against the incrementally trained one.
To build a Machine Learning model factory, we first have to establish a robust road to production. The foundational framework consists of three environments: DEV, TEST and PROD.
1 - DEV: A development environment where Data Scientists have their own data puddle to perform data exploration, profile the data, engineer machine learning features, build the model, then train and test it on a limited subset before committing the code to git for transport to the next stages. To scale and tune the model's learning, we also establish a DEV Validation environment, where the model is trained on as much historical data as possible and tuned.
2 - TEST: A pre-production environment where we run the machine learning models through integration tests and ready the move to production along two branches:
2a - model deployment: the trained, serialized Machine Learning model is deployed in the production environment.
2b - continuous training: the Machine Learning model undergoes continuous training on incremental data.
3 - PROD: The production environment, where live data is ingested. A deployment server hosts the serialized trained model, which exposes a REST API to deliver predictions on live data queries.
The ML model code runs in production, ingesting incremental live data and being continuously trained.
The performance of both the deployed model and the continuously trained model is measured. If the deployed model shows decay in prediction performance, it is swapped with a newer serialized version of the continuously trained model.
Model performance can be tracked by closing the loop with user feedback and counting True Positives, False Positives, True Negatives and False Negatives. This choreography of training and deploying machine learning models in production is the heart of the ML model factory. The road to production depicts the journey of building Machine Learning models across the DEV/TEST/PROD environments.
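The decay check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article; the confusion counts and the decay margin are assumptions chosen for the example.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts gathered via user feedback."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def should_swap(deployed_counts, challenger_counts, margin=0.02):
    """Swap the deployed model when the continuously trained
    challenger beats it by more than `margin` F1 points.
    Each counts tuple is (TP, FP, FN)."""
    deployed_f1 = f1_score(*deployed_counts)
    challenger_f1 = f1_score(*challenger_counts)
    return challenger_f1 > deployed_f1 + margin

# Hypothetical tallies from production feedback: the challenger
# (continuously trained model) clearly outperforms the deployed one.
print(should_swap((80, 20, 30), (90, 10, 15)))
```

In a real factory this decision would run on a schedule (or behind an A/B test) against live feedback, but the swap criterion itself stays this simple.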
... View more
- Find more articles tagged with:
- FAQ
- How-ToTutorial
- Learning
- machine-learning
- model
- production
- Sandbox & Learning
- Spark
- spark-mllib
06-28-2017
08:58 PM
3 Kudos
Setting Up a Data Science Platform on HDP using Anaconda

A Data Science Platform built using Anaconda needs to be able to:
- Launch PySpark jobs on the cluster
- Synchronize Python libraries from vetted public repositories
- Isolate environments with specific dependencies, so that production jobs can run an older version of a package while new versions of the package run simultaneously
- Launch notebooks and PySpark jobs using different kernels such as Python 2.7, Python 3.x, R, Scala

Framework of the Data Science Platform:
- Private repo server
- Edge nodes
- Dev, Test, Prod
- Ansible, Git, Jenkins

Building blocks of the Data Science Platform:
- Anaconda
- Ansible
- Git
- Jenkins
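A small sanity check that is handy on such a platform: confirming which interpreter, Python version and conda environment a job actually picked up. The helper name and the use of the CONDA_DEFAULT_ENV variable as the environment marker are assumptions for illustration:

```python
import os
import sys

def env_report():
    """Report the interpreter a job runs under, to verify that
    PYSPARK_PYTHON / the notebook kernel points at the intended conda env."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }

report = env_report()
print(report)
```

Printing this at the start of a PySpark job makes environment mix-ups (e.g. the driver on Python 3 and the executors on Python 2) much easier to diagnose.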
... View more
- Find more articles tagged with:
- anaconda
- data-science
- FAQ
- Hadoop Core
- hdp-2.3.4
- python
04-03-2017
07:09 PM
Thanks for the comment, Michael. I wrote these commands for HDP environments using the standard Python 2.7, where we cannot pip install snakebite (i.e. HDP clusters are behind the firewall in a secure zone with no pip downloads allowed).
... View more
03-31-2017
07:42 PM
2 Kudos
Interacting with Hadoop HDFS using Python codes This post will go through the following:
- Introducing the Python “subprocess” module
- Running HDFS commands with Python
- Examples of HDFS commands from Python

1-Introducing the Python “subprocess” module

The Python “subprocess” module allows us to:
- spawn new Unix processes
- connect to their input/output/error pipes
- obtain their return codes

To run a Unix command we need to create a subprocess that runs it. The recommended approach to invoking subprocesses is to use the convenience functions for all use cases they can handle; otherwise, the underlying Popen interface can be used directly.

2-Running HDFS commands with Python

We will create a Python function called run_cmd that lets us run any Unix or Linux command - or, in our case, hdfs dfs commands - as a pipe, capturing stdout and stderr, with the command passed in as a list of the arguments of the native Unix or HDFS command. It is passed as a Python list rather than a single string so that we don't have to parse or escape characters.

# import the python subprocess module
import subprocess
def run_cmd(args_list):
    """
    run linux commands
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err
3-Examples of HDFS commands from Python

Run Hadoop ls command in Python
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-ls', 'hdfs_file_path'])
lines = out.split('\n')
Run Hadoop get command in Python
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-get', 'hdfs_file_path', 'local_path'])
Run Hadoop put command in Python
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-put', 'local_file', 'hdfs_file_path'])
Run Hadoop copyFromLocal command in Python
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-copyFromLocal', 'local_file', 'hdfs_file_path'])
Run Hadoop copyToLocal command in Python
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-copyToLocal', 'hdfs_file_path', 'local_file'])
Run Hadoop remove file command in Python

Usage: hdfs dfs -rm -skipTrash /path/to/file/you/want/to/remove/permanently

(ret, out, err)= run_cmd(['hdfs', 'dfs', '-rm', 'hdfs_file_path'])
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-rm', '-skipTrash', 'hdfs_file_path'])
Run Hadoop remove directory command (rm -r) in Python - removes a directory and all of its content from HDFS.

Usage: hdfs dfs -rm -r <path>
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-rm', '-r', 'hdfs_file_path'])
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-rm', '-r', '-skipTrash', 'hdfs_file_path'])
Check if a file exists in HDFS
Usage: hadoop fs -test -[defsz] URI
Options:
-d: if the path is a directory, return 0.
-e: if the path exists, return 0.
-f: if the path is a file, return 0.
-s: if the path is not empty, return 0.
-z: if the file is zero length, return 0.
Example:
hadoop fs -test -e filename
hdfs_file_path = '/tmpo'
cmd = ['hdfs', 'dfs', '-test', '-e', hdfs_file_path]
ret, out, err = run_cmd(cmd)
print(ret, out, err)
if ret:
    print('file does not exist')
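Since run_cmd is just a thin wrapper over subprocess.Popen, the same helper can be exercised with any ordinary Unix command before pointing it at hdfs dfs. The echo call below is only a stand-in to illustrate the (return code, stdout, stderr) contract; the function is repeated so the snippet is self-contained:

```python
import subprocess

def run_cmd(args_list):
    """Run a command given as a list of arguments; return (rc, stdout, stderr)."""
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    return proc.returncode, s_output, s_err

# A zero return code means success, exactly as with `hdfs dfs -test -e`.
ret, out, err = run_cmd(['echo', 'hello'])
print(ret, out)
```

Testing the wrapper this way, off the cluster, separates subprocess-handling bugs from HDFS permission or path issues.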
These simple but very powerful lines of code allow us to interact with HDFS programmatically and can easily be scheduled as part of cron jobs.
... View more
- Find more articles tagged with:
- code
- coding
- Data Processing
- HDFS
- How-ToTutorial
- Linux
- python
01-04-2017
11:43 PM
@SBandaru If you are using Spark with HDP, then you have to do the following:

1- Add these entries in your $SPARK_HOME/conf/spark-defaults.conf:

spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)

2- Create a java-opts file in $SPARK_HOME/conf and add the installed HDP version to it, e.g. -Dhdp.version=2.2.0.0-2041 (your installed HDP version)

To find the HDP version, run this command on the cluster:

hdp-select status hadoop-client
... View more
12-31-2016
10:45 PM
1 Kudo
Installing and Exploring Spark 2.0 with Jupyter Notebook and Anaconda Python in your laptop
1-Objective
2-Installing Anaconda Python
3-Checking Python Install
4-Installing Spark
5-Checking Spark Install
6-Launching Jupyter Notebook with PySpark 2.0.2
7-Exploring PySpark 2.0.2
a.Spark Session
b.Read CSV
i.Spark 2.0 and Spark 1.6
ii.Pandas
c.Pandas DataFrames, Spark DataSets, DataFrames and RDDs
d.Machine Learning Pipeline
i.SciKit Learn
ii.Spark MLLib, ML
8-Conclusion
1-Objective
It is often useful to have Python with the Jupyter notebook installed on your laptop in order to quickly develop and test some code ideas or to explore some data. Adding Apache Spark to this mix also allows you to prototype ideas and exploratory data pipelines before hitting a Hadoop cluster or paying for Amazon Web Services.
We leverage the power of the Python ecosystem with libraries such as NumPy (a scientific computing library of high-level mathematical functions to operate on arrays and matrices), SciPy (which depends on NumPy and provides convenient and fast N-dimensional array manipulation), Pandas (a high-performance data structure and data analysis library to build complex data transformation flows), Scikit-Learn (a library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms) and NLTK (the Natural Language Toolkit for processing text data, with libraries for classification, tokenization, stemming, tagging, parsing and semantic reasoning, and wrappers for industrial-strength NLP libraries).
We also leverage the strengths of Spark including Spark-SQL, Spark-MLLib or ML.
2-Installing Anaconda Python
We install Continuum’s Anaconda distribution by downloading the install script from the Continuum website. https://www.continuum.io/downloads
The advantage of the Anaconda distribution is that a lot of the essential Python packages come bundled, so you do not have to struggle to keep all the dependencies synchronized.
We will use the following command to download the install script for Python 3.5.
HW12256:~ usr000$ wget http://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
If you wish to install Python 2.7, the following download is recommended.
HW12256:~ usr000$ wget http://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh
Accordingly, in the terminal, issue the following bash command to launch the install.
Python 3.5 version
HW12256:~ usr000$ bash Anaconda3-4.2.0-Linux-x86_64.sh
Python 2.7 version
HW12256:~ usr000$ bash Anaconda2-4.2.0-Linux-x86_64.sh
In the following steps, we are using Python 3.5 as the base environment.
3-Checking Python Install
In order to check the Python install, we issue the following commands in the terminal.
HW12256:~ usr000$ which python
/Users/usr000/anaconda/bin/python
HW12256:~ usr000$ echo $PATH
/Users/usr000/anaconda/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
HW12256:~ usr000$ python --version
Python 3.5.2 :: Anaconda 4.1.1 (x86_64)
HW12256:~ usr000$ python
Python 3.5.2 |Anaconda 4.1.1 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print("Python version: {}".format(sys.version))
Python version: 3.5.2 |Anaconda 4.1.1 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
>>> from datetime import datetime
>>> print('current date and time: {}'.format(datetime.now()))
current date and time: 2016-12-29 09:46:32.393985
>>> print('current date and time: {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
current date and time: 2016-12-29 09:51:33
>>> exit()
Anaconda Python includes a package manager called ‘conda’ which can list and update the existing libraries available in the current system.
HW12256:~ usr000$ conda info
Current conda install:
platform : osx-64
conda version : 4.2.12
conda is private : False
conda-env version : 4.2.12
conda-build version : 0+unknown
python version : 3.5.2.final.0
requests version : 2.10.0
root environment : /Users/usr000/anaconda (writable)
default environment : /Users/usr000/anaconda
envs directories : /Users/usr000/anaconda/envs
package cache : /Users/usr000/anaconda/pkgs
channel URLs : https://repo.continuum.io/pkgs/free/osx-64
https://repo.continuum.io/pkgs/free/noarch
https://repo.continuum.io/pkgs/pro/osx-64
https://repo.continuum.io/pkgs/pro/noarch
config file : None
offline mode : False
HW12256:~ usr000$ conda list
4-Installing Spark
To install Spark, we download the pre-built spark tarball spark-2.0.2-bin-hadoop2.7.tgz from http://spark.apache.org/downloads.html and move to your target Spark directory.
Untar the tarball in your chosen directory
HW12256:bin usr000$ tar -xvzf spark-2.0.2-bin-hadoop2.7.tgz
Create symlink to spark2 directory
HW12256:bin usr000$ ln -s ~/bin/sparks/spark-2.0.2-bin-hadoop2.7 ~/bin/spark2
5-Checking Spark Install
Check the directories created under Spark 2
HW12256:bin usr000$ ls -lru
total 16
drwxr-xr-x  5 usr000 staff 170 Dec 28 10:39 sparks
lrwxr-xr-x  1 usr000 staff  50 Dec 28 10:39 spark2 -> /Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7
lrwxr-xr-x  1 usr000 staff  51 May 23  2016 spark -> /Users/usr000/bin/sparks/spark-1.6.1-bin-hadoop2.6/
HW12256:bin usr000$ cd spark2
HW12256:spark2 usr000$ ls -lru
total 112
drwxr-xr-x@   3 usr000 staff   102 Jan  1  1970 yarn
drwxr-xr-x@  24 usr000 staff   816 Jan  1  1970 sbin
drwxr-xr-x@  10 usr000 staff   340 Dec 28 10:30 python
drwxr-xr-x@  38 usr000 staff  1292 Jan  1  1970 licenses
drwxr-xr-x@ 208 usr000 staff  7072 Dec 28 10:30 jars
drwxr-xr-x@   4 usr000 staff   136 Jan  1  1970 examples
drwxr-xr-x@   5 usr000 staff   170 Jan  1  1970 data
drwxr-xr-x@   9 usr000 staff   306 Dec 28 10:27 conf
drwxr-xr-x@  24 usr000 staff   816 Dec 28 10:30 bin
-rw-r--r--@   1 usr000 staff   120 Dec 28 10:25 RELEASE
-rw-r--r--@   1 usr000 staff  3828 Dec 28 10:25 README.md
drwxr-xr-x@   3 usr000 staff   102 Jan  1  1970 R
-rw-r--r--@   1 usr000 staff 24749 Dec 28 10:25 NOTICE
-rw-r--r--@   1 usr000 staff 17811 Dec 28 10:25 LICENSE
HW12256:spark2 usr000$
Running SparkPi example in local mode.
Scala command
# export SPARK_HOME
HW12256:spark2 usr000$ export SPARK_HOME=/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7
HW12256:spark2 usr000$ echo $SPARK_HOME
/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7
# Run Spark Pi example in Scala
HW12256:spark2 usr000$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/examples/jars/spark-examples*.jar 5

Python command

HW12256:spark2 usr000$ ./bin/spark-submit --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/src/main/python/pi.py 10

Scala example

HW12256:spark2 usr000$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/examples/jars/spark-examples*.jar 5
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/29 11:40:53 INFO SparkContext: Running Spark version 2.0.2
16/12/29 11:40:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
16/12/29 11:40:55 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.851288 s
Pi is roughly 3.1390942781885562
16/12/29 11:40:55 INFO SparkUI: Stopped Spark web UI at http://000.000.0.0:4040
...
16/12/29 11:40:55 INFO SparkContext: Successfully stopped SparkContext
16/12/29 11:40:55 INFO ShutdownHookManager: Shutdown hook called
16/12/29 11:40:55 INFO ShutdownHookManager: Deleting directory /private/var/folders/1r/8qylt4bj4h59b3h_1xq_nsw00000gp/T/spark-35b67f21-1d52-4dee-9c75-7e9d9c153ada
HW12256:spark2 usr000$
Python example
HW12256:spark2 usr000$ ./bin/spark-submit examples/src/main/python/pi.py 10
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/12/29 11:27:33 INFO SparkContext: Running Spark version 2.0.2
16/12/29 11:27:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
16/12/29 11:27:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/12/29 11:27:36 INFO DAGScheduler: Job 0 finished: reduce at /Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7/examples/src/main/python/pi.py:43, took 1.199257 s
Pi is roughly 3.138360
16/12/29 11:27:36 INFO SparkUI: Stopped Spark web UI at http://000.000.0.0:4040
16/12/29 11:27:36 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
...
16/12/29 11:27:36 INFO SparkContext: Successfully stopped SparkContext
16/12/29 11:27:37 INFO ShutdownHookManager: Shutdown hook called
16/12/29 11:27:37 INFO ShutdownHookManager: Deleting directory /private/var/folders/1r/8qylt4bj4h59b3h_1xq_nsw00000gp/T/spark-eb12faa9-b7ff-4556-9538-45ddcdc6797b
16/12/29 11:27:37 INFO ShutdownHookManager: Deleting directory /private/var/folders/1r/8qylt4bj4h59b3h_1xq_nsw00000gp/T/spark-eb12faa9-b7ff-4556-9538-45ddcdc6797b/pyspark-ba9947c5-dbea-4edc-9c4c-c2c316e6caba
Wordcount program using PySpark
HW12256:spark2 usr000$ ./bin/pyspark
Python 2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/12/29 12:25:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> import os
>>> print(os.getcwd())
/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7
>>> import re
>>> from operator import add
>>> wordcounts_in = sc.textFile('README.md').flatMap(lambda l: re.split('\W+', l.strip())).filter(lambda w: len(w)>0).map(lambda w: (w,1)).reduceByKey(add).map(lambda (a,b): (b,a)).sortByKey(ascending = False)
/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/shuffle.py:58: UserWarning: Please install psutil to have better support with spilling
>>> wordcounts_in.take(10)
[(23, u'the'), (18, u'Spark'), (14, u'to'), (13, u'run'), (11, u'for'), (11, u'apache'), (11, u'spark'), (11, u'and'), (11, u'org'), (8, u'a')]
>>> wordcounts_in = sc.textFile('README.md').flatMap(lambda l: re.split('\W+', l.strip())).filter(lambda w: len(w)>0).map(lambda w: (w,1)).reduceByKey(add).map(lambda (a,b): (b,a)).sortByKey(ascending = False).map(lambda (a,b): (b,a))
/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/shuffle.py:58: UserWarning: Please install psutil to have better support with spilling
>>> wordcounts_in.take(10)
[(u'the', 23), (u'Spark', 18), (u'to', 14), (u'run', 13), (u'for', 11), (u'apache', 11), (u'spark', 11), (u'and', 11), (u'org', 11), (u'a', 8)]
>>> exit()
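The same flatMap/filter/map/reduceByKey chain can be mirrored in plain Python, which is a handy way to sanity-check the Spark result on a small input. The sample text below is an assumption for illustration, not the real README.md:

```python
import re
from collections import Counter

def wordcount(lines):
    """Pure-Python analogue of the PySpark chain:
    flatMap(split) -> filter(non-empty) -> map((w,1)) -> reduceByKey(add) -> sort."""
    counts = Counter(
        w for l in lines for w in re.split(r'\W+', l.strip()) if len(w) > 0
    )
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

sample = ["Apache Spark", "Spark runs on Hadoop", "run Spark"]
top = wordcount(sample)
print(top[0])  # the most frequent word and its count
```

Running both versions on the same small file and comparing the outputs is a cheap way to validate the RDD pipeline logic before scaling it up.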
6-Launching Jupyter Notebook with PySpark
When launching Jupyter Notebook with Spark 1.6.*, we used to add the --packages com.databricks:spark-csv_2.11:1.4.0 parameter to the command, as the csv package was not natively part of Spark.
HW12256:~ usr000$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' PYSPARK_PYTHON=python3 /Users/usr000/bin/spark/bin/pyspark --packages com.databricks:spark-csv_2.11:1.4.0
In the case of Spark 2.0.*, we do not need the spark-csv --packages parameter, as csv support is part of the standard Spark 2.0 library.
HW12256:~ usr000$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS='notebook' PYSPARK_PYTHON=python3 /Users/usr000/bin/spark2/bin/pyspark
7-Exploring PySpark 2.0.2
We will explore the new features of Spark 2.0.2 using PySpark, contrasting where appropriate with previous versions of Spark and with pandas. In the case of the machine learning pipeline, we will contrast Spark MLlib and ML with Scikit-Learn.
a.Spark Session
Spark 2.0 introduces SparkSession. SparkSession is the single entry point for interacting with Spark functionality. It replaces and encapsulates the SQLContext, HiveContext and StreamingContext for more unified access to the DataFrame and Dataset APIs. The SQLContext, HiveContext and StreamingContext still exist under the hood in Spark 2.0 for continuity with legacy Spark code.
The Spark session has to be created explicitly when using the spark-submit command. An example of how to do that:
from pyspark.sql import SparkSession
# from pyspark.sql import SQLContext

spark = SparkSession \
    .builder \
    .appName("example-spark") \
    .config("spark.sql.crossJoin.enabled", "true") \
    .getOrCreate()
sc = spark.sparkContext
# sqlContext = SQLContext(sc)
When typing ‘pyspark’ at the terminal, PySpark automatically creates the Spark context sc.
A SparkSession is automatically generated and available as 'spark'.
Application name can be accessed using SparkContext.
spark.sparkContext.appName

# Configuration is accessible using RuntimeConfig:
from py4j.protocol import Py4JError
try:
    spark.conf.get("some.conf")
except Py4JError as e:
    pass
The following code outlines the available Spark context sc as well as the new Spark session, available under the name "spark", which includes the previous sqlContext, HiveContext and StreamingContext under one unified single entry point.
sqlContext, HiveContext, StreamingContext still exist to ensure continuity with legacy code in Spark.
HW12256:spark2 usr000$ ./bin/pyspark
Python 2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/12/29 20:41:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/
Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> sc
<pyspark.context.SparkContext object at 0x101e9c850>
>>> sc._conf.getAll()
[(u'spark.app.id', u'local-1483040488671'), (u'spark.sql.catalogImplementation', u'hive'), (u'spark.rdd.compress', u'True'), (u'spark.serializer.objectStreamReset', u'100'), (u'spark.master', u'local[*]'), (u'spark.executor.id', u'driver'), (u'spark.submit.deployMode', u'client'), (u'hive.metastore.warehouse.dir', u'file:/Users/usr000/bin/sparks/spark-2.0.2-bin-hadoop2.7/spark-warehouse'), (u'spark.driver.port', u'57764'), (u'spark.app.name', u'PySparkShell'), (u'spark.driver.host', u'000.000.0.0')]
>>> spark
<pyspark.sql.session.SparkSession object at 0x102df9b50>
>>> spark.sparkContext
<pyspark.context.SparkContext object at 0x101e9c850>
>>> spark.sparkContext.appName
u'PySparkShell'
>>> from pyspark.sql.functions import *
>>> spark.range(1, 7, 2).collect()
16/12/29 20:58:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/12/29 20:58:32 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
[Row(id=1), Row(id=3), Row(id=5)]
b.Read CSV
We describe how to easily access csv files from Spark and from pandas and load them into dataframes for data exploration, manipulation and mining.
i.Spark 2.0 & Spark 1.6
We can create a spark dataframe directly from reading the csv file.
In order to be compatible with the previous format, we include a conditional switch in the format statement.

## Spark 2.0 and Spark 1.6 compatible read csv
formatPackage = "csv" if sc.version > '1.6' else "com.databricks.spark.csv"
df = sqlContext.read.format(formatPackage).options(header='true', delimiter='|').load("s00_dat/dataframe_sample.csv")
df.printSchema()
ii.Pandas
We can create the iris pandas dataframe from the existing dataset from sklearn.
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
c.Dataframes
i.Pandas DataFrames
Pandas dataframes, in conjunction with visualization libraries such as matplotlib and seaborn, give us some nice insights into the data.
ii.Spark DataSets, Spark DataFrames and Spark RDDs
Spark DataFrames and Spark RDDs are the fundamental data structures that allow us to manipulate and interact with the various Spark libraries.
Spark Datasets are more relevant for Scala developers and give the ability to create typed Spark DataFrames.
d.Machine Learning
i.SciKit Learn
We demonstrate a random forest machine learning pipeline using scikit learn in the ipython notebook.
ii.Spark MLLib, Spark ML
We demonstrate a random forest machine learning pipeline using Spark MLlib and Spark ML
8-Conclusion
Spark and Jupyter Notebook with the Anaconda Python distribution provide a very powerful development environment on your laptop.
It allows quick exploration of data mining, machine learning and visualization in a flexible, easy-to-use environment.
We have described the installation of Jupyter Notebook and Spark, a few data processing pipelines, and a machine learning classification using Random Forest.
... View more
- Find more articles tagged with:
- anaconda
- Data Science & Advanced Analytics
- data-science
- FAQ
- Installation
- machine-learning
- spark-notebook
09-18-2016
07:52 PM
1 Kudo
Hi Mike, follow these steps:

1- In the CLI where Spark is installed, first export the Hadoop conf:

export HADOOP_CONF_DIR=/etc/hadoop/conf

(you may want to put it in your spark conf file: export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf})

2- Launch spark-shell and read the file:

val input = sc.textFile("hdfs:///....insert/your/hdfs/file/path...")
input.count() // prints the nr of lines read
... View more
09-05-2016
04:16 PM
The easiest way: just launch "spark-shell" at the command line. This will print the active version running on your cluster:

[root@xxxxxxx ~]# spark-shell
16/09/05 17:15:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/
Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
... View more
09-05-2016
04:07 PM
Hi Surya,

Add the SPARK-SFTP library via --packages or --jars in your spark-submit command. I am not sure Spark 1.4.1 would be able to handle it; look at upgrading Spark to 1.6.1.

Check:
https://github.com/springml/spark-sftp
https://spark-packages.org/package/springml/spark-sftp

Include this package in your Spark applications using spark-shell, pyspark, or spark-submit:

> $SPARK_HOME/bin/spark-shell --packages com.springml:spark-sftp_2.10:1.0.1

sbt - in your sbt build file, add:

libraryDependencies += "com.springml" % "spark-sftp_2.10" % "1.0.1"

Maven - in your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>com.springml</groupId>
    <artifactId>spark-sftp_2.10</artifactId>
    <version>1.0.1</version>
  </dependency>
</dependencies>

Releases: 1bf5b3 (2016-05-27, Apache-2.0, Scala 2.10); 7d5b02 (2016-01-11, Apache-2.0, Scala 2.10)
... View more
09-05-2016
03:50 PM
1 Kudo
# Check Commands
# --------------
# Spark Scala
# -----------
# Optionally export Spark Home
export SPARK_HOME=/usr/hdp/current/spark-client
# Spark submit example in local mode
spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark submit example in client mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark submit example in cluster mode
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 $SPARK_HOME/lib/spark-examples*.jar 10
# Spark shell with yarn client
spark-shell --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1
# Pyspark
# -------
# Optionally export Hadoop conf and PySpark Python
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PYSPARK_PYTHON=/path/to/bin/python
# PySpark submit example in local mode
spark-submit --verbose /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark submit example in client mode
spark-submit --verbose --master yarn-client /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark submit example in cluster mode
spark-submit --verbose --master yarn-cluster /usr/hdp/2.3.0.0-2557/spark/examples/src/main/python/pi.py 100
# PySpark shell with yarn client
pyspark --master yarn-client
@jigar.patel
... View more
08-30-2016
08:15 PM
2 Kudos
Resolution done for Spark 2.0.0

Resolution for the Spark Submit issue: add a java-opts file in /usr/hdp/current/spark2-client/conf/

[root@sandbox conf]# cat java-opts
-Dhdp.version=2.5.0.0-817

Spark Submit working example:

[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
...
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
...
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
...
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)
16/08/29 17:45:37 INFO yarn.Client:
 client token: N/A
 diagnostics: N/A
 ApplicationMaster host: 10.0.2.15
 ApplicationMaster RPC port: 0
 queue: default
 start time: 1472492701409
 final status: SUCCEEDED
 tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
 user: root
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b
[root@sandbox spark2-client]#

Resolution for the Spark Shell issue (lzo-codec): add the following 2 lines in your spark-defaults.conf

spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64

Spark Shell working example:

[root@sandbox spark2-client]# ./bin/spark-shell --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/29 17:47:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:47:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472397144295_0007).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.getConf.getAll.foreach(println) (spark.eventLog.enabled,true) (spark.yarn.scheduler.heartbeat.interval-ms,5000) (hive.metastore.warehouse.dir,file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse) (spark.repl.class.outputDir,/tmp/spark-fa16d4d3-8ec8-4b0e-a1da-5a2dffe39d08/repl-5dd28f29-ae03-4965-a535-18a95173b173) (spark.yarn.am.extraJavaOptions,-Dhdp.version=2.5.0.0-817) (spark.yarn.containerLauncherMaxThreads,25) (spark.driver.extraJavaOptions,-Dhdp.version=2.5.0.0-817) (spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64) (spark.driver.appUIAddress,http://10.0.2.15:4041) (spark.driver.host,10.0.2.15) (spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0007) (spark.yarn.preserve.staging.files,false) (spark.home,/usr/hdp/current/spark2-client) (spark.app.name,Spark shell) (spark.repl.class.uri,spark://10.0.2.15:37426/classes) (spark.ui.port,4041) (spark.yarn.max.executor.failures,3) (spark.submit.deployMode,client) (spark.yarn.executor.memoryOverhead,200) (spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) (spark.driver.extraClassPath,/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar) (spark.executor.memory,2g) (spark.yarn.driver.memoryOverhead,200) (spark.hadoop.yarn.timeline-service.enabled,false) (spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native) (spark.app.id,application_1472397144295_0007) (spark.executor.id,driver) (spark.yarn.queue,default) (spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com) (spark.eventLog.dir,hdfs:///spark-history) (spark.master,yarn) (spark.driver.port,37426) (spark.yarn.submit.file.replication,3) (spark.sql.catalogImplementation,hive) (spark.driver.memory,2g) (spark.jars,) (spark.executor.cores,1) scala> val file = 
sc.textFile("/tmp/data") file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24 scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _) counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26 scala> counts.take(10) res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.se rver.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA. layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apac he.log4j.PatternLayout,1)) scala>
... View more
08-30-2016
01:01 PM
1 Kudo
Resolution for Spark Submit issue: add java-opts file in /usr/hdp/current/spark2-client/conf/
[root@sandbox conf]# cat java-opts
-Dhdp.version=2.5.0.0-817
Spark Submit working example:
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/29 17:44:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/29 17:44:58 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/29 17:44:58 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/29 17:44:58 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/29 17:44:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/29 17:44:58 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/29 17:44:58 INFO yarn.Client: Setting up container launch context for our AM
16/08/29 17:44:58 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/29 17:44:58 INFO yarn.Client: Preparing resources for our AM container
16/08/29 17:44:58 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:45:00 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_libs__3503948162159958877.zip -> hdfs://sandbox.hortonw
orks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_libs__3503948162159958877.zip
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/
user/root/.sparkStaging/application_1472397144295_0006/spark-examples_2.11-2.0.0.jar
16/08/29 17:45:01 INFO yarn.Client: Uploading resource file:/tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b/__spark_conf__4613069544481307021.zip -> hdfs://sandbox.hortonw
orks.com:8020/user/root/.sparkStaging/application_1472397144295_0006/__spark_conf__.zip
16/08/29 17:45:01 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls to: root
16/08/29 17:45:01 INFO spark.SecurityManager: Changing view acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/29 17:45:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permiss
ions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/29 17:45:01 INFO yarn.Client: Submitting application application_1472397144295_0006 to ResourceManager
16/08/29 17:45:01 INFO impl.YarnClientImpl: Submitted application application_1472397144295_0006
16/08/29 17:45:02 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:02 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:03 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:04 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:05 INFO yarn.Client: Application report for application_1472397144295_0006 (state: ACCEPTED)
16/08/29 17:45:06 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:06 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:07 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:08 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:09 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:10 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:11 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:12 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:13 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:14 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:15 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:16 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:17 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:18 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:19 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:20 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:21 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:22 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:23 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:24 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:25 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:26 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:27 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:28 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:29 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:30 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:31 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:32 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:33 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:34 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:35 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:36 INFO yarn.Client: Application report for application_1472397144295_0006 (state: RUNNING)
16/08/29 17:45:37 INFO yarn.Client: Application report for application_1472397144295_0006 (state: FINISHED)
16/08/29 17:45:37 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472492701409
final status: SUCCEEDED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0006/
user: root
16/08/29 17:45:37 INFO util.ShutdownHookManager: Shutdown hook called
16/08/29 17:45:37 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-38890bfc-d672-4c7d-bef9-d646c420836b
[root@sandbox spark2-client]#
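The java-opts fix above is just a one-line file drop; here is a minimal scripted sketch. It writes to a scratch directory so it can run anywhere — on the sandbox you would point conf_dir at the real /usr/hdp/current/spark2-client/conf instead.

```shell
# Sketch of the java-opts fix. conf_dir is a stand-in scratch directory;
# on the HDP 2.5 sandbox use /usr/hdp/current/spark2-client/conf instead.
conf_dir=$(mktemp -d)

# Write the hdp.version system property where spark-submit picks it up.
printf '%s\n' '-Dhdp.version=2.5.0.0-817' > "$conf_dir/java-opts"

# Verify the file contents, as in the post above.
cat "$conf_dir/java-opts"
```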
Resolution for Spark Shell issue (lzo-codec): add the following 2 lines in your spark-defaults.conf
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
Spark Shell working example:
[root@sandbox spark2-client]# ./bin/spark-shell --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/29 17:47:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/29 17:47:21 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472397144295_0007).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.getConf.getAll.foreach(println)
(spark.eventLog.enabled,true)
(spark.yarn.scheduler.heartbeat.interval-ms,5000)
(hive.metastore.warehouse.dir,file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse)
(spark.repl.class.outputDir,/tmp/spark-fa16d4d3-8ec8-4b0e-a1da-5a2dffe39d08/repl-5dd28f29-ae03-4965-a535-18a95173b173)
(spark.yarn.am.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.yarn.containerLauncherMaxThreads,25)
(spark.driver.extraJavaOptions,-Dhdp.version=2.5.0.0-817)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.driver.appUIAddress,http://10.0.2.15:4041)
(spark.driver.host,10.0.2.15)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://sandbox.hortonworks.com:8088/proxy/application_1472397144295_0007)
(spark.yarn.preserve.staging.files,false)
(spark.home,/usr/hdp/current/spark2-client)
(spark.app.name,Spark shell)
(spark.repl.class.uri,spark://10.0.2.15:37426/classes)
(spark.ui.port,4041)
(spark.yarn.max.executor.failures,3)
(spark.submit.deployMode,client)
(spark.yarn.executor.memoryOverhead,200)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.extraClassPath,/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar)
(spark.executor.memory,2g)
(spark.yarn.driver.memoryOverhead,200)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native)
(spark.app.id,application_1472397144295_0007)
(spark.executor.id,driver)
(spark.yarn.queue,default)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,sandbox.hortonworks.com)
(spark.eventLog.dir,hdfs:///spark-history)
(spark.master,yarn)
(spark.driver.port,37426)
(spark.yarn.submit.file.replication,3)
(spark.sql.catalogImplementation,hive)
(spark.driver.memory,2g)
(spark.jars,)
(spark.executor.cores,1)
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
scala> counts.take(10)
res1: Array[(String, Int)] = Array((hadoop.tasklog.noKeepSplits=4,1), (log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.se
rver.resourcemanager.appsummary.logger},1), (Unless,1), (this,4), (hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log,1), (under,4), (log4j.appender.RFA.
layout.ConversionPattern=%d{ISO8601},2), (log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout,1), (AppSummaryLogging,1), (log4j.appender.RMAUDIT.layout=org.apac
he.log4j.PatternLayout,1))
scala>
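The lzo-codec fix can likewise be scripted; a minimal sketch, again using a scratch directory as a stand-in for the real /usr/hdp/current/spark2-client/conf:

```shell
# Sketch of the spark-defaults.conf fix. conf_dir is a stand-in scratch
# directory; on the sandbox, append to the file under
# /usr/hdp/current/spark2-client/conf instead.
conf_dir=$(mktemp -d)

# Append the two lzo-related driver settings from the post above.
cat >> "$conf_dir/spark-defaults.conf" <<'EOF'
spark.driver.extraClassPath /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
EOF

# Verify both settings landed in the file.
cat "$conf_dir/spark-defaults.conf"
```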
08-30-2016
08:15 AM
1 Kudo
Sandbox HDP-2.5.0 Spark 2.0.0 - Spark Submit Yarn Cluster Mode -- Spark Shell LzoCodec not found
I have installed Spark 2.0.0 on the Sandbox HDP-2.5.0 in accordance with Paul Hargis's great post: https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.html Thanks Paul.
Spark-Submit in Yarn-Client mode works, as per the log here:
[root@sandbox ~]# cd /usr/hdp/current/spark2-client
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:38:42 INFO spark.SparkContext: Running Spark version 2.0.0
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:38:42 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:38:42 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:38:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(
); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:43 INFO util.Utils: Successfully started service 'sparkDriver' on port 36008.
16/08/28 14:38:43 INFO spark.SparkEnv: Registering MapOutputTracker
16/08/28 14:38:43 INFO spark.SparkEnv: Registering BlockManagerMaster
16/08/28 14:38:43 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b5149ef4-928d-455e-bf83-2159e12f88f7
16/08/28 14:38:43 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
16/08/28 14:38:43 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/08/28 14:38:43 INFO util.log: Logging initialized @2226ms
16/08/28 14:38:43 INFO server.Server: jetty-9.2.z-SNAPSHOT
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e1e5b02{/jobs,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ae918c9{/jobs/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d5a39b7{/jobs/job,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e83450d{/jobs/job/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c2a88f4{/stages,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c858adb{/stages/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@535f571c{/stages/stage,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@18501a07{/stages/stage/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32dcce09{/stages/pool,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e5acaf5{/stages/pool/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3ac2bace{/storage,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46764885{/storage/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7f9337e6{/storage/rdd,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a3b1e79{/storage/rdd/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f4da763{/environment,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@232864a3{/environment/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@30e71b5d{/executors,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14b58fc0{/executors/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bf090df{/executors/threadDump,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4eb72ecd{/executors/threadDump/json,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c61bd1a{/static,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@14c62558{/,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5cbdbf0f{/api,null,AVAILABLE}
16/08/28 14:38:43 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,AVAILABLE}
16/08/28 14:38:43 INFO server.ServerConnector: Started ServerConnector@51fcbb35{HTTP/1.1}{0.0.0.0:4041}
16/08/28 14:38:43 INFO server.Server: Started @2388ms
16/08/28 14:38:43 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
16/08/28 14:38:43 INFO spark.SparkContext: Added JAR file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar at spark://10.0.2.15:36008/jars/spark-examples_2.11
-2.0.0.jar with timestamp 1472395123767
16/08/28 14:38:44 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:38:44 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:38:44 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/28 14:38:44 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/28 14:38:44 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/28 14:38:44 INFO yarn.Client: Preparing resources for our AM container
16/08/28 14:38:44 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:38:45 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_libs__6748274495232790272.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0001/__spark_libs__6748274495232790272.zip
16/08/28 14:38:48 INFO yarn.Client: Uploading resource file:/tmp/spark-a10e8972-1076-4a61-a014-8419767250f0/__spark_conf__6530127439911581770.zip -> hdfs://sandbox.hortonworks.com:8
020/user/root/.sparkStaging/application_1472394965674_0001/__spark_conf__.zip
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:38:48 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:38:48 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:38:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:38:48 INFO yarn.Client: Submitting application application_1472394965674_0001 to ResourceManager
16/08/28 14:38:48 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0001
16/08/28 14:38:49 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:49 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472395128618
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
user: root
16/08/28 14:38:51 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:52 INFO yarn.Client: Application report for application_1472394965674_0001 (state: ACCEPTED)
16/08/28 14:38:52 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/08/28 14:38:52 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001), /proxy/application_1472394965674_0001
16/08/28 14:38:52 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
16/08/28 14:38:53 INFO yarn.Client: Application report for application_1472394965674_0001 (state: RUNNING)
16/08/28 14:38:53 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1472395128618
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0001/
user: root
16/08/28 14:38:53 INFO cluster.YarnClientSchedulerBackend: Application application_1472394965674_0001 has started running.
16/08/28 14:38:53 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35756.
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:53 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:35756 with 912.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:53 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 35756)
16/08/28 14:38:54 INFO scheduler.EventLoggingListener: Logging events to hdfs:///spark-history/application_1472394965674_0001
16/08/28 14:38:56 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36932) with ID 1
16/08/28 14:38:56 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41061 with 912.3 MB RAM, BlockManagerId(1, sandbox.hortonworks.com, 41061)
16/08/28 14:38:57 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.0.2.15:36936) with ID 2
16/08/28 14:38:57 INFO storage.BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:41746 with 912.3 MB RAM, BlockManagerId(2, sandbox.hortonworks.com, 4174
6)
16/08/28 14:38:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
16/08/28 14:38:57 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46a61277{/SQL,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b4b5885{/SQL/json,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2bcd7bea{/SQL/execution/json,null,AVAILABLE}
16/08/28 14:38:57 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59bde227{/static/sql,null,AVAILABLE}
16/08/28 14:38:57 INFO internal.SharedState: Warehouse path is 'file:/usr/hdp/2.5.0.0-817/spark2/spark-warehouse'.
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 912.3 MB)
16/08/28 14:38:57 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 912.3 MB)
16/08/28 14:38:57 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:35756 (size: 1169.0 B, free: 912.3 MB)
16/08/28 14:38:57 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/28 14:38:57 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
16/08/28 14:38:57 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
16/08/28 14:38:57 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, sandbox.hortonworks.com, partition 0, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:57 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, sandbox.hortonworks.com, partition 1, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 0 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:58 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 1 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on sandbox.hortonworks.com:41746 (size: 1169.0 B, free: 912.3 MB)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, sandbox.hortonworks.com, partition 2, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 2 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 3 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1084 ms on sandbox.hortonworks.com (1/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 1061 ms on sandbox.hortonworks.com (2/10)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 4 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 88 ms on sandbox.hortonworks.com (3/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, sandbox.hortonworks.com, partition 5, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 101 ms on sandbox.hortonworks.com (4/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, sandbox.hortonworks.com, partition 6, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 6 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, sandbox.hortonworks.com, partition 7, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 7 on executor id: 2 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 48 ms on sandbox.hortonworks.com (6/10)
16/08/28 14:38:59 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 8 on executor id: 1 hostname: sandbox.hortonworks.com.
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 48 ms on sandbox.hortonworks.com (7/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, sandbox.hortonworks.com, partition 9, PROCESS_LOCAL, 5411 bytes)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 40 ms on sandbox.hortonworks.com (8/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 38 ms on sandbox.hortonworks.com (9/10)
16/08/28 14:38:59 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 31 ms on sandbox.hortonworks.com (10/10)
16/08/28 14:38:59 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.293 s
16/08/28 14:38:59 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.605653 s
Pi is roughly 3.1418151418151417
16/08/28 14:38:59 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2d4aa15a{/stages/stage/kill,null,UNAVAILABLE}
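(Aside, for readers checking the "Pi is roughly 3.14..." result above: SparkPi estimates pi by Monte Carlo sampling, drawing random points in the unit square and multiplying the fraction that lands inside the quarter circle by 4. A minimal non-Spark sketch of the same computation in plain Python, not taken from the Spark sources:)

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # an estimate close to 3.14
```

More samples (or, in Spark, more partitions of samples) tighten the estimate, which is why the job above splits the work across 10 tasks.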
Spark-Submit in yarn-cluster mode fails, as per the log here:
[root@sandbox spark2-client]# ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10
16/08/28 14:41:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 14:41:08 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/28 14:41:08 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/08/28 14:41:09 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/08/28 14:41:09 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (7680 MB per container)
16/08/28 14:41:09 INFO yarn.Client: Will allocate AM container, with 2248 MB memory including 200 MB overhead
16/08/28 14:41:09 INFO yarn.Client: Setting up container launch context for our AM
16/08/28 14:41:09 INFO yarn.Client: Setting up the launch environment for our AM container
16/08/28 14:41:09 INFO yarn.Client: Preparing resources for our AM container
16/08/28 14:41:09 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:41:10 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_libs__4204158628332382181.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/__spark_libs__4204158628332382181.zip
16/08/28 14:41:11 INFO yarn.Client: Uploading resource file:/usr/hdp/2.5.0.0-817/spark2/examples/jars/spark-examples_2.11-2.0.0.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/spark-examples_2.11-2.0.0.jar
16/08/28 14:41:12 INFO yarn.Client: Uploading resource file:/tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc/__spark_conf__2789110900476377363.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002/__spark_conf__.zip
16/08/28 14:41:12 WARN yarn.Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls to: root
16/08/28 14:41:12 INFO spark.SecurityManager: Changing view acls groups to:
16/08/28 14:41:12 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/28 14:41:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
16/08/28 14:41:12 INFO yarn.Client: Submitting application application_1472394965674_0002 to ResourceManager
16/08/28 14:41:12 INFO impl.YarnClientImpl: Submitted application application_1472394965674_0002
16/08/28 14:41:13 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:13 INFO yarn.Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472395272580
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472394965674_0002/
user: root
16/08/28 14:41:14 INFO yarn.Client: Application report for application_1472394965674_0002 (state: ACCEPTED)
16/08/28 14:41:15 INFO yarn.Client: Application report for application_1472394965674_0002 (state: FAILED)
16/08/28 14:41:15 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1472394965674_0002 failed 2 times due to AM Container for appattempt_1472394965674_0002_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e17_1472394965674_0002_02_000001
Exit code: 1
Exception message: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh: line 25: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /hadoop/yarn/local/usercache/root/appcache/application_1472394965674_0002/container_e17_1472394965674_0002_02_000001/launch_container.sh: line 25: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.run(Shell.java:820)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1099)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
start time: 1472395272580
final status: FAILED
tracking URL: http://sandbox.hortonworks.com:8088/cluster/app/application_1472394965674_0002
16/08/28 14:41:15 INFO yarn.Client: Deleting staging directory hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472394965674_0002
Exception in thread "main" org.apache.spark.SparkException: Application application_1472394965674_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/28 14:41:15 INFO util.ShutdownHookManager: Shutdown hook called
16/08/28 14:41:15 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e72e7961-7ec9-4282-806d-9d95e2d7f0fc
[root@sandbox spark2-client]#
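Update: the "bad substitution" comes from the unresolved ${hdp.version} placeholder in the container launch classpath. A workaround I have seen suggested for Spark 2 on HDP, offered here as a sketch I have not yet verified myself, is to pin the HDP version explicitly in spark2-client/conf/spark-defaults.conf. The value 2.5.0.0-817 is taken from this sandbox's own paths and would differ on other installs:

```
spark.driver.extraJavaOptions    -Dhdp.version=2.5.0.0-817
spark.yarn.am.extraJavaOptions   -Dhdp.version=2.5.0.0-817
```

After editing the conf, the same spark-submit command can be re-run to check whether the AM container now launches.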
Any help to resolve this would be appreciated. In spark-shell mode, I am encountering a LzoCodec not found error, as per the log here:
[root@sandbox spark2-client]# ./bin/spark-shell --master yarn
Setting default log level to "WARN".
16/08/28 14:44:42 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/08/28 14:44:54 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.0.2.15:4041
Spark context available as 'sc' (master = yarn, app id = application_1472394965674_0003).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file = sc.textFile("/tmp/data")
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:24
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:186)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:246)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:328)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:327)
... 48 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 83 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 85 more
scala>
Any help to resolve this would be appreciated. Thanks. Amit
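Update: from the stack trace, Spark 2's classpath does not carry the hadoop-lzo codec that core-site.xml's io.compression.codecs declares. A workaround I have seen suggested, which I have not yet verified on my side, is to point Spark at the HDP LZO jar and native libraries in spark2-client/conf/spark-defaults.conf. The jar path below is the one from this sandbox's own logs; the native directory is my assumption for a stock HDP layout:

```
spark.driver.extraClassPath      /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.executor.extraClassPath    /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
spark.driver.extraLibraryPath    /usr/hdp/2.5.0.0-817/hadoop/lib/native
spark.executor.extraLibraryPath  /usr/hdp/2.5.0.0-817/hadoop/lib/native
```

Passing the same jar with --jars on the spark-shell command line should be an equivalent one-off test.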
... View more
Labels:
08-29-2016
02:07 PM
Zeppelin + PySpark (1.6.* or 2.0.0) - I want to know how I can add Python libraries such as NumPy/Pandas/SKLearn. Additional question: if I install Anaconda Python and its repo, how do I need to configure the Zeppelin interpreters so that PySpark works well with the Anaconda Python repo?
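For reference, here is what I am considering, as a sketch rather than a tested recipe. Assumptions: Anaconda installed under /opt/anaconda2 (adjust to the real prefix), and the libraries installed into that Python on every node that runs executors when the interpreter uses YARN. The idea is to point PySpark at the Anaconda interpreter in zeppelin-env.sh:

```
# zeppelin-env.sh - /opt/anaconda2 is an assumed install prefix, use your own
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
```

Alternatively, the interpreter property zeppelin.pyspark.python can be set to the same path in Zeppelin's interpreter settings, followed by an interpreter restart.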
... View more
Labels:
-
Apache Zeppelin
08-28-2016
09:29 PM
2 Kudos
Sandbox HDP-2.5.0 TP, Spark 1.6.2 - I am encountering the errors "ERROR GPLNativeCodeLoader: Could not load native gpl library" and "ERROR LzoCodec: Cannot load native-lzo without native-hadoop" while running a simple word count in spark-shell:
[root@sandbox ~]# cd $SPARK_HOME
[root@sandbox spark-client]# ./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m --jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
The following code is submitted at the Spark CLI:
val file = sc.textFile("/tmp/data")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.saveAsTextFile("/tmp/wordcount")
This yields the following errors:
ERROR GPLNativeCodeLoader: Could not load native gpl library
ERROR LzoCodec: Cannot load native-lzo without native-hadoop
The same errors appear with or without adding the --jars parameter as here under: --jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
Full Log:
[root@sandbox ~]# cd $SPARK_HOME
[root@sandbox spark-client]# ./bin/spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m --jars /usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar
16/08/27 16:28:23 INFO SecurityManager: Changing view acls to: root
16/08/27 16:28:23 INFO SecurityManager: Changing modify acls to: root
16/08/27 16:28:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/27 16:28:23 INFO HttpServer: Starting HTTP Server
16/08/27 16:28:23 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/27 16:28:23 INFO AbstractConnector: Started SocketConnector@0.0.0.0:43011
16/08/27 16:28:23 INFO Utils: Successfully started service 'HTTP class server' on port 43011.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/
Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
16/08/27 16:28:26 INFO SparkContext: Running Spark version 1.6.2
16/08/27 16:28:26 INFO SecurityManager: Changing view acls to: root
16/08/27 16:28:26 INFO SecurityManager: Changing modify acls to: root
16/08/27 16:28:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/27 16:28:26 INFO Utils: Successfully started service 'sparkDriver' on port 45506.
16/08/27 16:28:27 INFO Slf4jLogger: Slf4jLogger started 16/08/27 16:28:27 INFO Remoting: Starting remoting 16/08/27 16:28:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:44 829] 16/08/27 16:28:27 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44829. 16/08/27 16:28:27 INFO SparkEnv: Registering MapOutputTracker 16/08/27 16:28:27 INFO SparkEnv: Registering BlockManagerMaster 16/08/27 16:28:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0776b175-5dd7-49b9-adf7-f2cbd85a1e1b 16/08/27 16:28:27 INFO MemoryStore: MemoryStore started with capacity 143.6 MB 16/08/27 16:28:27 INFO SparkEnv: Registering OutputCommitCoordinator 16/08/27 16:28:27 INFO Server: jetty-8.y.z-SNAPSHOT 16/08/27 16:28:27 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040 16/08/27 16:28:27 INFO Utils: Successfully started service 'SparkUI' on port 4040. 16/08/27 16:28:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040 16/08/27 16:28:27 INFO HttpFileServer: HTTP File server directory is /tmp/spark-61ecb98e-989c-4396-9b30-032c4d5a2b90/httpd -857ce699-7db0-428c-9af5-1dca4ec5330d 16/08/27 16:28:27 INFO HttpServer: Starting HTTP Server 16/08/27 16:28:27 INFO Server: jetty-8.y.z-SNAPSHOT 16/08/27 16:28:27 INFO AbstractConnector: Started SocketConnector@0.0.0.0:37515 16/08/27 16:28:27 INFO Utils: Successfully started service 'HTTP file server' on port 37515. 16/08/27 16:28:27 INFO SparkContext: Added JAR file:/usr/hdp/2.5.0.0-817/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-817.jar at ht tp://10.0.2.15:37515/jars/hadoop-lzo-0.6.0.2.5.0.0-817.jar with timestamp 1472315307772 spark.yarn.driver.memoryOverhead is set but does not apply in client mode. 
16/08/27 16:28:28 INFO TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/ 16/08/27 16:28:28 INFO RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050 16/08/27 16:28:28 INFO Client: Requesting a new application from cluster with 1 NodeManagers 16/08/27 16:28:28 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2250 MB per container) 16/08/27 16:28:28 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 16/08/27 16:28:28 INFO Client: Setting up container launch context for our AM 16/08/27 16:28:28 INFO Client: Setting up the launch environment for our AM container 16/08/27 16:28:28 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs:/ /sandbox.hortonworks.com:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar 16/08/27 16:28:28 INFO Client: Preparing resources for our AM container 16/08/27 16:28:28 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs:/ /sandbox.hortonworks.com:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar 16/08/27 16:28:28 INFO Client: Source and destination file systems are the same. 
Not copying hdfs://sandbox.hortonworks.co m:8020/hdp/apps/2.5.0.0-817/spark/spark-hdp-assembly.jar 16/08/27 16:28:29 INFO Client: Uploading resource file:/tmp/spark-61ecb98e-989c-4396-9b30-032c4d5a2b90/__spark_conf__50848 04354575467223.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1472312154461_0006/__spark_c onf__5084804354575467223.zip 16/08/27 16:28:29 INFO SecurityManager: Changing view acls to: root 16/08/27 16:28:29 INFO SecurityManager: Changing modify acls to: root 16/08/27 16:28:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permis sions: Set(root); users with modify permissions: Set(root) 16/08/27 16:28:29 INFO Client: Submitting application 6 to ResourceManager 16/08/27 16:28:29 INFO YarnClientImpl: Submitted application application_1472312154461_0006 16/08/27 16:28:29 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1472312154461_000 6 and attemptId None 16/08/27 16:28:30 INFO Client: Application report for application_1472312154461_0006 (state: ACCEPTED) 16/08/27 16:28:30 INFO Client: client token: N/A diagnostics: AM container is launched, waiting for AM container to Register with RM ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1472315309252 final status: UNDEFINED tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472312154461_0006/ user: root 16/08/27 16:28:31 INFO Client: Application report for application_1472312154461_0006 (state: ACCEPTED) 16/08/27 16:28:32 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(nul l) 16/08/27 16:28:32 INFO YarnClientSchedulerBackend: Add WebUI Filter. 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpF ilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/applicatio n_1472312154461_0006), /proxy/application_1472312154461_0006 16/08/27 16:28:32 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 16/08/27 16:28:32 INFO Client: Application report for application_1472312154461_0006 (state: RUNNING) 16/08/27 16:28:32 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: 10.0.2.15 ApplicationMaster RPC port: 0 queue: default start time: 1472315309252 final status: UNDEFINED tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1472312154461_0006/ user: root 16/08/27 16:28:32 INFO YarnClientSchedulerBackend: Application application_1472312154461_0006 has started running. 16/08/27 16:28:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on p ort 34124. 16/08/27 16:28:32 INFO NettyBlockTransferService: Server created on 34124 16/08/27 16:28:32 INFO BlockManagerMaster: Trying to register BlockManager 16/08/27 16:28:32 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:34124 with 143.6 MB RAM, BlockManag erId(driver, 10.0.2.15, 34124) 16/08/27 16:28:32 INFO BlockManagerMaster: Registered BlockManager 16/08/27 16:28:32 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1472312154461_0006 16/08/27 16:28:36 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (sandbox.hortonworks.com: 39728) with ID 1 16/08/27 16:28:36 INFO BlockManagerMasterEndpoint: Registering block manager sandbox.hortonworks.com:38362 with 143.6 MB R AM, BlockManagerId(1, sandbox.hortonworks.com, 38362) 16/08/27 16:28:57 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxReg isteredResourcesWaitingTime: 30000(ms) 16/08/27 16:28:57 INFO SparkILoop: Created 
spark context.. Spark context available as sc. 16/08/27 16:28:58 INFO HiveContext: Initializing execution hive, version 1.2.1 16/08/27 16:28:58 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.5.0.0-817 16/08/27 16:28:58 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.5.0.0-8 17 16/08/27 16:28:58 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.Objec tStore 16/08/27 16:28:58 INFO ObjectStore: ObjectStore, initialize called 16/08/27 16:28:58 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 16/08/27 16:28:58 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 16/08/27 16:28:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 16/08/27 16:28:59 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 16/08/27 16:29:00 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,Stor ageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 16/08/27 16:29:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" s o does not have its own datastore table. 16/08/27 16:29:02 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:02 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" s o does not have its own datastore table. 
16/08/27 16:29:02 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 16/08/27 16:29:02 INFO ObjectStore: Initialized ObjectStore 16/08/27 16:29:02 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 16/08/27 16:29:02 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 16/08/27 16:29:03 INFO HiveMetaStore: Added admin role in metastore 16/08/27 16:29:03 INFO HiveMetaStore: Added public role in metastore 16/08/27 16:29:03 INFO HiveMetaStore: No user is added in admin role, since config is empty 16/08/27 16:29:03 INFO HiveMetaStore: 0: get_all_databases 16/08/27 16:29:03 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_all_databases 16/08/27 16:29:03 INFO HiveMetaStore: 0: get_functions: db=default pat=* 16/08/27 16:29:03 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=* 16/08/27 16:29:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-o nly" so does not have its own datastore table. 16/08/27 16:29:03 INFO SessionState: Created local directory: /tmp/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec_resources 16/08/27 16:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec 16/08/27 16:29:03 INFO SessionState: Created local directory: /tmp/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec 16/08/27 16:29:03 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebb0a60-b229-4dad-94a3-e2386ba7b4ec/_tmp_spac e.db 16/08/27 16:29:03 INFO HiveContext: default warehouse location is /user/hive/warehouse 16/08/27 16:29:03 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 
16/08/27 16:29:03 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.5.0.0-817
16/08/27 16:29:03 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.5.0.0-817
16/08/27 16:29:04 INFO metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
16/08/27 16:29:04 INFO metastore: Connected to metastore.
16/08/27 16:29:04 INFO SessionState: Created local directory: /tmp/83a1e2d3-8c24-4f12-9841-fab259a77514_resources
16/08/27 16:29:04 INFO SessionState: Created HDFS directory: /tmp/hive/root/83a1e2d3-8c24-4f12-9841-fab259a77514
16/08/27 16:29:04 INFO SessionState: Created local directory: /tmp/root/83a1e2d3-8c24-4f12-9841-fab259a77514
16/08/27 16:29:04 INFO SessionState: Created HDFS directory: /tmp/hive/root/83a1e2d3-8c24-4f12-9841-fab259a77514/_tmp_space.db
16/08/27 16:29:04 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> val file = sc.textFile("/tmp/data")
16/08/27 16:29:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 234.8 KB, free 234.8 KB)
16/08/27 16:29:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.1 KB, free 262.9 KB)
16/08/27 16:29:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:34124 (size: 28.1 KB, free: 143.6 MB)
16/08/27 16:29:20 INFO SparkContext: Created broadcast 0 from textFile at <console>:27
file: org.apache.spark.rdd.RDD[String] = /tmp/data MapPartitionsRDD[1] at textFile at <console>:27
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
16/08/27 16:29:35 ERROR GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1889)
at java.lang.Runtime.loadLibrary0(Runtime.java:849)
at java.lang.System.loadLibrary(System.java:1088)
at
com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32) at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:278) at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2147) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2112) at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132) at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179) at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:189) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at 
org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242) at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65) at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.PairRDDFunctions$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:331) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:323) at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:330) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:29) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:34) at $line19.$read$iwC$iwC$iwC$iwC$iwC$iwC.<init>(<console>:36) at $line19.$read$iwC$iwC$iwC$iwC$iwC.<init>(<console>:38) at $line19.$read$iwC$iwC$iwC$iwC.<init>(<console>:40) at $line19.$read$iwC$iwC$iwC.<init>(<console>:42) at $line19.$read$iwC$iwC.<init>(<console>:44) at $line19.$read$iwC.<init>(<console>:46) at $line19.$read.<init>(<console>:48) at $line19.$read$.<init>(<console>:52) at $line19.$read$.<clinit>(<console>) at $line19.$eval$.<init>(<console>:7) at $line19.$eval$.<clinit>(<console>) at $line19.$eval.$print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.s cala:997) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:94 5) at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:94 5) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 16/08/27 16:29:35 ERROR LzoCodec: Cannot load native-lzo without native-hadoop 16/08/27 16:29:35 INFO FileInputFormat: Total input paths to process : 1 counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29 scala> Please help to fix this issue.
... View more
08-24-2016
06:05 AM
How do we set SPARK_MAJOR_VERSION? In which conf file? Are there any other related conf files to maintain?
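For context, on HDP 2.5 this is an environment variable read by the spark-shell/spark-submit wrapper scripts at launch time rather than a setting in a conf file; a minimal shell sketch (persisting it in ~/.bashrc is just the usual convention, not a requirement):

```shell
# Choose Spark 2 for this shell session; the HDP launcher scripts read
# SPARK_MAJOR_VERSION ("1" or "2") to pick which Spark install to run.
export SPARK_MAJOR_VERSION=2
echo "SPARK_MAJOR_VERSION=${SPARK_MAJOR_VERSION}"
# prints: SPARK_MAJOR_VERSION=2

# To make it the default for a user, append the export line to that user's ~/.bashrc.
```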
... View more
08-24-2016
06:02 AM
I have downloaded Sandbox HDP 2.5. I would like to activate Spark 2.0.0, but it activates Spark 1.6.2 by default. Ambari Server 'start' completed successfully.
[root@sandbox ~]# spark-shell
SPARK_MAJOR_VERSION is not set, choosing Spark automatically
16/08/23 21:29:14 INFO SecurityManager: Changing view acls to: root
16/08/23 21:29:14 INFO SecurityManager: Changing modify acls to: root
16/08/23 21:29:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/23 21:29:14 INFO HttpServer: Starting HTTP Server
16/08/23 21:29:14 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/23 21:29:14 INFO AbstractConnector: Started SocketConnector@0.0.0.0:35616
16/08/23 21:29:14 INFO Utils: Successfully started service 'HTTP class server' on port 35616.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
16/08/23 21:29:18 INFO SparkContext: Running Spark version 1.6.2
16/08/23 21:29:18 INFO SecurityManager: Changing view acls to: root
16/08/23 21:29:18 INFO SecurityManager: Changing modify acls to: root
16/08/23 21:29:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/23 21:29:18 INFO Utils: Successfully started service 'sparkDriver' on port 46658.
16/08/23 21:29:18 INFO Slf4jLogger: Slf4jLogger started
... View more
Labels:
- Apache Spark
08-24-2016
05:55 AM
By increasing the width of the Chrome (browser) window (I am accessing Zeppelin at 127.0.0.1:9995/#/), the Zeppelin button bar with the clone button appeared.
... View more
08-23-2016
09:56 PM
Hi Tim, I launched sandbox HDP 2.5. When I go to 127.0.0.1:9995, Zeppelin is an "empty", "static" page with no top bars and no pre-existing Zeppelin notebooks... Was that the case when you first launched Zeppelin on HDP 2.5? How do you set Spark 2.0? screenshot-2016-08-23-213704.jpg
... View more
08-23-2016
09:45 PM
Thank you zblanco. I was able to log into ssh online through 127.0.0.1:4200. I reset the Ambari password as per the tutorial. I can access the Ambari console as admin with all the various dashboards and components. I cannot ssh via terminal, as I am getting a conflict on the RSA key from my other HDP 2.4 sandbox. Is it possible to add the HDP 2.5 key and be able to use both sandboxes through ssh? $ ssh root@127.0.0.1 -p 2222
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:/kRUlZqqoBGnsyfJkjU2jScS/vP1/VEpk5ejg8bnlRI.
Please contact your system administrator.
Add correct host key in /Users/xxxxx/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/xxxxx/.ssh/known_hosts:5
RSA host key for [127.0.0.1]:2222 has changed and you have requested strict checking.
Host key verification failed. I am able to call Spark on the terminal (spark-shell and pyspark), but I am getting Spark 1.6.2. How do I select Spark 2.0.0 as my default? [root@sandbox ~]# spark-shell
SPARK_MAJOR_VERSION is not set, choosing Spark automatically
16/08/23 21:29:14 INFO SecurityManager: Changing view acls to: root
16/08/23 21:29:14 INFO SecurityManager: Changing modify acls to: root
16/08/23 21:29:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/08/23 21:29:14 INFO HttpServer: Starting HTTP Server
16/08/23 21:29:14 INFO Server: jetty-8.y.z-SNAPSHOT
16/08/23 21:29:14 INFO AbstractConnector: Started SocketConnector@0.0.0.0:35616
16/08/23 21:29:14 INFO Utils: Successfully started service 'HTTP class server' on port 35616.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_101)
Type in expressions to have them evaluated.
Type :help for more information.
16/08/23 21:29:18 INFO SparkContext: Running Spark version 1.6.2
16/08/23 21:29:18 INFO SecurityManager: Changing view acls to: root
16/08/23 21:29:18 INFO SecurityManager: Changing modify acls to: root
I am still unable to get a login UI for Zeppelin. In Ambari, Zeppelin is showing green, up and running... Any idea on how to get Zeppelin to work would be appreciated. Thanks.
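On the RSA-key conflict mentioned earlier in this post: yes, both sandboxes can be used over ssh. The usual fix is to delete the stale known_hosts entry recorded for the forwarded port and reconnect, at which point ssh offers to store the HDP 2.5 sandbox's new key. A minimal sketch (the `-f` path below is the default known_hosts location on macOS/Linux, matching the path in the warning above):

```shell
# Drop the stale host key recorded for the forwarded sandbox port; the next
# `ssh root@127.0.0.1 -p 2222` will prompt to accept and store the new key.
ssh-keygen -R "[127.0.0.1]:2222" -f ~/.ssh/known_hosts
```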
... View more
08-23-2016
08:57 PM
1 Kudo
I have downloaded Sandbox HDP 2.5 TP. It is starting. When I go to the Ambari page, I just get the "Ambari Views"; I am not getting the Ambari console with the Ambari dashboard and the dashboards of all the components (HDFS, YARN, Hive, Spark...). I went to port 9995 to open Zeppelin and I am just getting a "static" Zeppelin page with no Zeppelin notebooks and interpreters... Can you help fix it? screenshot-2016-08-23-213546.jpg screenshot-2016-08-23-213704.jpg screenshot-2016-08-23-213827.jpg
... View more
Labels:
- Apache Zeppelin
08-09-2016
07:35 AM
Hi Pierre, We would need to look at the code. Can you do a persist just before stage 63 and, before stage 65, check the Spark UI Storage tab and Executors tab for data skew? If there is data skew, you will need to add a salt to your key. You could also look at creating a DataFrame from the RDD (rdd.toDF()) and applying a UDF on it; DataFrames manage memory more efficiently. Best, Amit
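To illustrate the salting idea, here is a minimal Spark/Scala sketch (not from Pierre's code; `rdd` is assumed to be a pair RDD of (key, count) and `numSalts` is a made-up parameter), showing a two-stage reduceByKey that spreads a hot key across sub-keys:

```scala
// Spread each key over `numSalts` sub-keys, aggregate, then strip the salt
// and aggregate again: two small shuffles instead of one skewed shuffle.
val numSalts = 10
val salted = rdd.map { case (k, v) =>
  ((k, scala.util.Random.nextInt(numSalts)), v)   // attach a random salt to the key
}
val partial = salted.reduceByKey(_ + _)           // first pass: the hot key is spread out
val result = partial
  .map { case ((k, _), v) => (k, v) }             // drop the salt
  .reduceByKey(_ + _)                             // second pass: combine small per-salt partials
```

This works for associative and commutative reductions like sums; the first pass dilutes the skewed key over `numSalts` sub-keys so no single task receives all of its records.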
... View more
08-08-2016
09:13 AM
Spark 2.0.0 will be available with HDP-2.5 https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.html.
... View more
08-08-2016
09:03 AM
Hi Pierre, How is Object defined and serialized? If fields of your object refer to the RDD, it copies the full RDD and shuffles it. Would you be able to do a persist/cache before the broadcast join and get the Spark UI DAG and Storage pages? Cheers, Amit
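As a hedged sketch of the suggestion above (Spark 1.6-era Scala; `small`, `big`, and the join key `"id"` are hypothetical names, not from Pierre's code):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// Persist and materialize the small side first, so the Spark UI Storage tab
// shows its real in-memory size before deciding it is safe to broadcast.
val smallCached: DataFrame = small.persist()
smallCached.count()  // forces the cache so the Storage tab is populated

// Hint a broadcast join: the cached small side is shipped to every executor
// instead of shuffling the big side across the cluster.
val joined = big.join(broadcast(smallCached), Seq("id"))
```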
... View more