Member since: 10-07-2015
Posts: 107
Kudos Received: 73
Solutions: 23
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2540 | 02-23-2017 04:57 PM |
 | 1993 | 12-08-2016 09:55 AM |
 | 8894 | 11-24-2016 07:24 PM |
 | 3967 | 11-24-2016 02:17 PM |
 | 9344 | 11-24-2016 09:50 AM |
08-04-2016
05:00 PM
1 Kudo
Assume you have an ORC table "test" in Hive whose schema matches the CSV file "test.csv". With SparkSQL:

sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", ",")
  .load("/tmp/test.csv")
  .write.insertInto("test")
08-04-2016
04:40 PM
Does the Ambari server see all virtual machines on the other machine, i.e. are they in the same network, and can the Ambari server machine resolve the hostnames of the other machine? If so, can root on the Ambari server machine log into the virtual machines on the other machine without a password? These are a few things that need to happen during registration.
08-04-2016
04:31 PM
1 Kudo
Assume you have a file "/tmp/test.csv" like

Col1|Col2|Col3|Col4
12|34|"56|78"|9A
"AB"|"CD"|EF|"GH:"|:"IJ"

If I load it with Spark:

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", ":")
  .load("/tmp/test.csv")
df.show()

+----+----+-----+-------+
|Col1|Col2| Col3|   Col4|
+----+----+-----+-------+
|  12|  34|56|78|     9A|
|  AB|  CD|   EF|GH"|"IJ|
+----+----+-----+-------+

So the example contains delimiters inside quotes as well as escaped quotes. I use ":" to escape quotes; you can use many other characters (but don't use e.g. "#"). Is this something you want to achieve?
07-19-2016
07:14 AM
Example from the Spark doc page (http://spark.apache.org/docs/latest/submitting-applications.html):

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

--executor-memory is the parameter you want to adapt.
07-18-2016
09:27 AM
2 Kudos
Have you tried to avoid folders with empty files? As an idea, instead of using

<DStream>.saveAsTextFiles("/tmp/results/ts", "json")

(which creates folders with empty files if nothing gets streamed from the source), I tried

<DStream>.foreachRDD(rdd => {
  try {
    val f = rdd.first() // fails for empty RDDs
    rdd.saveAsTextFile(s"/tmp/results/ts-${System.currentTimeMillis}.json")
  } catch {
    case e: Exception => println("empty rdd")
  }
})

It seems to work for me: no folders with empty files any more.
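If you are on Spark 1.3 or later, a slightly cleaner sketch of the same idea uses RDD.isEmpty instead of driving the control flow through the exception from first() (dstream here stands for your DStream):

dstream.foreachRDD { rdd =>
  // isEmpty only has to look for a single element, so the check is cheap
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"/tmp/results/ts-${System.currentTimeMillis}.json")
  }
}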
07-15-2016
11:46 AM
This might help: https://community.hortonworks.com/questions/30288/oozie-spark-action-on-hdp-24-nosuchmethoderror-org.html
07-15-2016
11:44 AM
It looks like you are executing the job as user hadoop; however, Spark wants to access staging data under /user/yarn (which can only be accessed by yarn). How did you start the job, and with which user? I am surprised that Spark uses /user/yarn as the staging dir for user hadoop. Is there any staging dir configuration in your system (SPARK_YARN_STAGING_DIR)?
07-14-2016
07:21 AM
1 Kudo
I don't know where the TFS bit comes from; maybe some dependency problem.

For including all dependencies in the workflow I would recommend going for a fat jar (assembly). In Scala with sbt you can see the idea here: Creating fat jars with sbt. The same works with Maven's "maven-assembly-plugin". You should be able to call your code as

spark-submit --master yarn-cluster \
  --num-executors 2 --driver-memory 1g --executor-memory 2g --executor-cores 2 \
  --class com.SparkSqlExample \
  /home/hadoop/SparkParquetExample-0.0.1-SNAPSHOT-with-dependencies.jar

If this works, the jar with dependencies should be the one to use in the Oozie Spark action.
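For the sbt route, a minimal sketch of what the build needs (plugin and Spark versions here are illustrative): mark Spark itself as "provided" so only your own dependencies end up in the fat jar, then build with "sbt assembly".

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided", // not bundled; the cluster provides it
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided"
)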
07-13-2016
04:20 PM
I installed it manually; it was quite straightforward. However, you need Maven 3.3, else some npm stuff will fail. I just did

mvn clean package -DskipTests

I then copied conf/zeppelin-env.sh.template to conf/zeppelin-env.sh and added

export JAVA_HOME=/usr/jdk64/jdk1.8.0_60/
export SPARK_HOME=/usr/hdp/current/spark-client
export HADOOP_HOME=/usr/hdp/current/hadoop-client

and copied zeppelin-site.xml.template to zeppelin-site.xml and changed the port to 9995. Plus, in Zeppelin I changed the "master" property of the Spark interpreter to yarn-client. Seems to work for me on an HDP 2.4.2 cluster.
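For reference, the port change in zeppelin-site.xml boils down to this property (value as used above; the rest of the file comes from the template):

<property>
  <name>zeppelin.server.port</name>
  <value>9995</value>
</property>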