Member since: 06-17-2018
Posts: 6
Kudos Received: 0
Solutions: 0
10-04-2018
06:58 AM
Thanks @Geoffrey Shelton Okot for the reply! But these are already installed. I am using the HDP 2.5 Hortonworks sandbox.
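As a quick sanity check for anyone in the same situation (assuming the hadoop client is on the PATH inside the sandbox), you can ask Hadoop itself whether the native snappy library actually loaded; snappy should report true with a library path:

  # lists each native library (hadoop, zlib, snappy, lz4, ...) and whether it loaded
  hadoop checknative -a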
10-04-2018
06:58 AM
I get the following error when I create a SparkContext in standalone mode from a Scala class:

val sparkConfig = new SparkConf()
  .setAppName("test")
  .setMaster("local")
  .set("hive.metastore.uris", "thrift://sandbox.hortonworks.com:9083")

val spark = SparkSession.builder()
  .config(sparkConfig)
  .enableHiveSupport()
  .getOrCreate()

val model = PipelineModel.load("snappy model path from hdfs")
18/10/03 17:59:28 ERROR SnappyCompressor: failed to load SnappyCompressor
java.lang.NoSuchFieldError: clazz
at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method)
at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:71)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:195)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:251)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/10/03 17:59:28 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at
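In case it is useful to others, this class of native-loading failure is often a library-path problem rather than a missing package; one common approach is to point both the driver and the executors at the cluster's native Hadoop libraries when submitting. The path below is the usual HDP location, but treat it as an assumption to verify on the sandbox:

  # sketch only: /usr/hdp/current/hadoop-client/lib/native is an assumed path,
  # and the class/jar names are placeholders
  spark-submit \
    --driver-library-path /usr/hdp/current/hadoop-client/lib/native \
    --conf spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native \
    --class your.main.Class your-app.jar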
Labels:
- Apache Spark
- Apache Sqoop
09-15-2018
08:53 PM
Well, my bad. It turned out to be a connection issue, as per the following log from the executor:

WARN DFSClient: Failed to connect to sandbox.hortonworks.com/<<IP>>:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused

Here is what I did: on HDP 2.5, logged in as root, I modified start_scripts/start_sandbox.sh to forward port 50010, then ran the following commands:

docker commit sandbox sandbox
docker stop sandbox
docker rm sandbox
init 6    # to restart

Now the Spark master on Dev (A) can get the block from Dev (B), which is my HDP 2.5 machine.
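For completeness, the change in start_scripts/start_sandbox.sh boils down to adding one extra port mapping to the docker run line; the other flags below are illustrative placeholders, not the script's exact contents:

  # illustrative sketch: only the added -p 50010:50010 (DataNode transfer port) matters here
  docker run -d --name sandbox \
    -p 50070:50070 -p 10000:10000 \
    -p 50010:50010 \
    sandbox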
06-17-2018
06:27 PM
I am trying to run a Spark job with Hive support enabled. It can run the command "show databases" successfully, but when it tries to read a Hive table (which has its data stored as txt on HDFS), it throws org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 3.0 (TID 6, 192.168.8.134, executor 0): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: <<block ID>>=<<path>>
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:984)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)

Here are the details of my dev environment: Dev box (A) is CentOS running in VMware, with Eclipse and jars added from Spark 2.2.1 with Hadoop 2.7 support. Dev box (A) runs the Spark master and slave, configured against the Thrift server on Dev box (B). Dev box (B) runs HDP 2.5 Hortonworks.

So why does the app running on Dev box (A) throw the missing-block exception when it queries the Hive table, even though the file is present in HDFS? Please note I have already executed the following commands to check for bad blocks:

sudo -u hdfs hdfs dfsadmin -report
sudo -u hdfs hdfs fsck -list-corruptfileblocks

Thanks for any help!
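For anyone triaging the same symptom: a clean fsck does not rule out the client simply being unable to reach the DataNode, so it is worth probing block locations and the DataNode transfer port directly from the machine running the Spark driver (the table path below is a placeholder):

  # show which DataNodes hold the blocks backing the table's files
  sudo -u hdfs hdfs fsck /apps/hive/warehouse/mytable -files -blocks -locations
  # verify the DataNode data-transfer port is reachable from this box
  nc -vz sandbox.hortonworks.com 50010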
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
06-17-2018
09:42 AM
Any update on this issue? I'm facing the same problem.
09-21-2017
11:21 PM
I have Python 2.6 and pip on Sandbox 2.5, and I am trying to install awscli (pip install awscli), but I am getting the attached error. If awscli uses PyYAML 3.12, and PyYAML 3.12 is not supported on Python 2.6, then what could be the solution? How can I install awscli on the Hortonworks Sandbox 2.5? Appreciate your help!!
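One possible workaround, assuming the failure really is PyYAML 3.12 refusing to install on Python 2.6, is to pre-install an older PyYAML that still supported 2.6 and only then install awscli; the version bound below is an assumption to check against the attached error:

  # sketch: pin PyYAML below 3.12, which (as I understand it) dropped Python 2.6 support
  pip install 'PyYAML<3.12'
  pip install awscli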