Member since: 08-13-2015
Posts: 9
Kudos Received: 0
Solutions: 0
09-22-2015 05:54 AM
I did not modify classpath.txt, since I don't want to touch files generated by Cloudera. Without the userClassPathFirst switch, it works correctly. Unfortunately, I need the switch to replace a few other libraries that are already in classpath.txt.
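For context, this is roughly the submit command I need to end up with: the newer libraries shipped via --jars and forced ahead of the cluster classpath. The jar name below is only a placeholder:

spark-submit \
  --conf "spark.driver.userClassPathFirst=true" \
  --conf "spark.executor.userClassPathFirst=true" \
  --jars hdfs:///libs/my-newer-library.jar \
  --class Main --master yarn-cluster test2.jar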
09-22-2015 05:38 AM
I've just tried a simple application with nothing but a Main class in the jar file. This is the code:

import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.spark._
import org.apache.spark.rdd.NewHadoopRDD

object Main {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf())
    val conf = HBaseConfiguration.create()
    conf.set(HConstants.ZOOKEEPER_QUORUM, "my.server")
    // conf.set("hbase.zookeeper.property.clientPort", "2181")
    val explicitNamespace: String = "BankData"
    val qualifiedTableName: String = explicitNamespace + ':' + "bank"
    conf.set(TableInputFormat.INPUT_TABLE, qualifiedTableName)
    conf.set(TableOutputFormat.OUTPUT_TABLE, qualifiedTableName)
    conf.set(TableInputFormat.INPUT_TABLE, "tmp") // overrides the qualified name above for this test
    val hBaseRDD = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])
  }
}

I run this with:

spark-submit --conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true" --class Main --master yarn-cluster test2.jar

and I get the following exception:

15/09/22 14:34:28 ERROR yarn.ApplicationMaster: User class threw exception: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:157)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
at scala.Option.map(Option.scala:145)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:199)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:101)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:77)
at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:878)
at Main$.main(Main.scala:29)
at Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)

I'm running CDH 5.4.3.
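One thing I still want to try is taking Snappy out of the picture for Spark's internal compression, since the failure is in the broadcast codec. spark.io.compression.codec is a standard Spark setting; lz4 here is just my guess at an acceptable alternative:

# Same submit as above, but with Spark's internal compression switched
# from the default snappy codec to lz4, so the clashing snappy-java
# native binding should never be loaded.
spark-submit \
  --conf "spark.driver.userClassPathFirst=true" \
  --conf "spark.executor.userClassPathFirst=true" \
  --conf "spark.io.compression.codec=lz4" \
  --class Main --master yarn-cluster test2.jar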
09-22-2015 04:23 AM
We have already tried setting the two "userClassPathFirst" switches, but unfortunately we ended up with the same strange exception:

ERROR yarn.ApplicationMaster: User class threw exception: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:157)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
at scala.Option.map(Option.scala:145)
at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:199)
at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:101)
at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:77)
at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:878)
at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:516)
...

What I don't understand is why SPARK_DIST_CLASSPATH is being set at all. A vanilla Spark installation has everything in the single assembly jar, and any additional dependency must be specified explicitly, correct? Would it be possible to completely replace classpath.txt with user-provided dependencies? Thanks!
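One idea I'd like to sanity-check: since the generated spark-env.sh only sets SPARK_CONF_DIR when it is empty, I could in principle point spark-submit at a private copy of the conf dir with a trimmed classpath.txt. A rough sketch; the grep pattern is only a placeholder:

# Private conf dir with a filtered classpath.txt; the copied spark-env.sh
# reads classpath.txt from its own directory ($SELF), so the trimmed copy
# should be picked up instead of the generated one.
mkdir -p ~/my-spark-conf
cp -r /etc/spark/conf/. ~/my-spark-conf/
grep -v 'library-to-replace' /etc/spark/conf/classpath.txt > ~/my-spark-conf/classpath.txt
export SPARK_CONF_DIR=~/my-spark-conf
spark-submit --class Main --master yarn-cluster test2.jar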
09-22-2015 12:36 AM
Hello, I would like to use newer versions of some of the libraries listed in /etc/spark/conf/classpath.txt. What is the recommended way to do that? I add other libraries using spark-submit's --jars (I have the jars on HDFS), but this does not work for newer versions of libraries that are already in classpath.txt. Alternatively, is there a way to disable construction of classpath.txt and rely solely on the libraries provided to spark-submit (except possibly Spark and Hadoop)? I'm running Spark on YARN (cluster mode). Thank you!
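For reference, this is the shape of the submit command that works fine for additional, non-conflicting libraries; the HDFS paths are placeholders:

spark-submit \
  --jars hdfs:///user/me/libs/extra-lib-1.jar,hdfs:///user/me/libs/extra-lib-2.jar \
  --class Main --master yarn-cluster myapp.jar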
Labels:
- Apache Hadoop
- Apache Spark
- Apache YARN
- HDFS
08-19-2015 12:41 AM
... or at least I would file an issue, but I could not find any issue tracker or submission form on the Cloudera sites.
08-19-2015 12:36 AM
We are using CDH 5.4.3, and what you described seems to work only when creating the cluster (though I have to verify this). Adding the Spark Gateway role to another node afterwards does not seem to work. Only the original Spark Gateway works, and only until the role is taken away from it; after that, I found no way to restore the configuration and get back at least one functional node with the Spark/YARN configuration. Anyway, this looks like a bug in CM, so I guess I should file an issue.
08-13-2015 02:14 AM
Hello, I'm trying to understand how spark-submit is configured on Cloudera clusters. I'm using a parcel-based installation and the Spark on YARN service. On one machine (with the Spark Gateway role), where spark-submit works, I see the following configuration in /etc/spark/conf/spark-env.sh:

#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##

SELF="$(cd $(dirname $BASH_SOURCE) && pwd)"
if [ -z "$SPARK_CONF_DIR" ]; then
  export SPARK_CONF_DIR="$SELF"
fi

export SPARK_HOME=/opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.4.3-1.cdh5.4.3.p0.6/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-''}

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

SPARK_EXTRA_LIB_PATH=""
if [ -n "$SPARK_EXTRA_LIB_PATH" ]; then
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SPARK_EXTRA_LIB_PATH
fi
export LD_LIBRARY_PATH

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}

# This is needed to support old CDH versions that use a forked version
# of compute-classpath.sh.
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib

# Set distribution classpath. This is only used in CDH 5.3 and later.
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
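Incidentally, to see the classpath that spark-submit actually ends up with on this working node, I dump the final launch command; SPARK_PRINT_LAUNCH_COMMAND is a stock Spark launcher switch, if I understand it correctly:

# Prints the full java command, including the classpath assembled from
# SPARK_DIST_CLASSPATH, before Spark starts.
SPARK_PRINT_LAUNCH_COMMAND=1 spark-submit --version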
I also noticed that some nodes have a spark-submit that just throws a "java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream" exception. Here's the content of spark-env.sh on those nodes (reproduced as-is, including the empty for loop):

###
### === IMPORTANT ===
### Change the following to specify a real cluster's Master host
###
export STANDALONE_SPARK_MASTER_HOST=`hostname`
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST

### Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_DIR=/var/run/spark/work
export SPARK_LOG_DIR=/var/log/spark
export SPARK_PID_DIR='/var/run/spark/'

if [ -n "$HADOOP_HOME" ]; then
  export LD_LIBRARY_PATH=:/usr/lib/hadoop/lib/native
fi

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

if [[ -d $SPARK_HOME/python ]]
then
  for i in
  do
    SPARK_DIST_CLASSPATH=${SPARK_DIST_CLASSPATH}:$i
  done
fi

SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$SPARK_LIBRARY_PATH/spark-assembly.jar"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-hdfs/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-hdfs/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-mapreduce/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-mapreduce/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-yarn/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hadoop-yarn/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/hive/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/flume-ng/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/paquet/lib/*"
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/lib/avro/lib/*"
However, I'm not able to configure another node to have a functional spark-submit. Adding a "Spark Gateway" role to another node does not work: the command is not added to the machine, nor is the configuration fixed on the broken ones. I also tried removing the Spark service and adding it back; after that, all nodes have the broken configuration. Thanks for any help!
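For reference, here is the quick check I use to tell the two variants apart on a node; the working file carries the "Generated by Cloudera Manager" banner shown above:

# The CM-generated spark-env.sh starts with a Cloudera Manager banner;
# the packaged default (the broken variant on my nodes) does not.
if grep -q 'Generated by Cloudera Manager' /etc/spark/conf/spark-env.sh; then
  echo "CM-generated config: spark-submit should work"
else
  echo "packaged default config: spark-submit likely broken for YARN"
fi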
Labels:
- Apache Spark