Member since: 02-17-2017
Posts: 71
Kudos Received: 17
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4499 | 03-02-2017 04:19 PM |
|  | 32395 | 02-20-2017 10:44 PM |
|  | 19065 | 01-10-2017 06:51 PM |
03-13-2017
05:35 PM
This is just a suggestion, but have you tried running Hive on Tez? It's a much faster and more efficient execution engine. Try this before you execute your code:
set hive.execution.engine=tez;
03-09-2017
05:26 PM
Hi @Evan Willett The official Spark documentation says this: "The only reason Kryo is not the default is because of the custom registration requirement, but we recommend trying it in any network-intensive application. Since Spark 2.0.0, we internally use Kryo serializer when shuffling RDDs with simple types, arrays of simple types, or string type." Link: http://spark.apache.org/docs/latest/tuning.html#data-serialization
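If you want to try it, here is a minimal sketch of enabling Kryo when building the conf, assuming a standalone Spark 1.x-style application; MyClass1 and MyClass2 are hypothetical classes standing in for your own types, not anything from this thread.
import org.apache.spark.SparkConf

// Hypothetical application classes, used only to illustrate registration
case class MyClass1(a: Int)
case class MyClass2(b: String)

val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// Registering classes is the "custom registration requirement" the docs mention;
// it keeps Kryo from writing the full class name with every serialized object.
conf.registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))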
03-09-2017
04:28 PM
Wow. ORC got me from 3 TB (PigStorage) down to 60 GB. This is insane. I didn't notice any performance improvement, though, but I am happy with the savings in storage. Thanks! 🙂
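For reference, here is one way to produce ORC from Spark. This is only a sketch under assumptions, not the actual conversion above (which was presumably done in Pig/Hive); the paths and column names are illustrative, and it assumes a Hive-enabled sqlContext, which is the default in an HDP spark-shell.
import sqlContext.implicits._

val raw = sc.textFile("/data/input_text") // e.g. data previously stored via PigStorage
val df = raw.map(_.split("\t")).map(a => (a(0), a(1))).toDF("col1", "col2")

// ORC is a compressed, columnar format, which is where the storage savings come from
df.write.format("orc").save("/data/output_orc")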
03-08-2017
06:11 PM
What is the error you are getting while trying to use it, then? This is what I used in Spark 1.6.1:
import org.apache.spark.sql.functions.broadcast
val joined_df = df1.join(broadcast(df2), "key")
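In case it helps, here is a self-contained sketch of the same broadcast join, assuming a Spark 1.6 spark-shell where sc and sqlContext already exist; the sample data and column names are made up for illustration.
import org.apache.spark.sql.functions.broadcast
import sqlContext.implicits._

val df1 = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("key", "value1")
val df2 = sc.parallelize(Seq((1, "x"), (2, "y"))).toDF("key", "value2")

// broadcast(df2) hints Spark to ship the small table to every executor instead of shuffling df1
val joined_df = df1.join(broadcast(df2), "key")
joined_df.show()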
03-08-2017
05:42 PM
2 Kudos
Hi @X Long The official documentation does include it: http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables Here is one tutorial using Spark 2: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-broadcast.html
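A minimal sketch of a broadcast variable, assuming an existing SparkContext named sc (as in spark-shell); the lookup map and sample values are illustrative.
val lookup = Map("a" -> 1, "b" -> 2)
val bLookup = sc.broadcast(lookup) // shipped once to each executor, read-only there

val data = sc.parallelize(Seq("a", "b", "a", "c"))
val mapped = data.map(x => (x, bLookup.value.getOrElse(x, 0)))
mapped.collect().foreach(println)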
03-07-2017
06:25 PM
Oh! That worked. Thanks a lot!
03-07-2017
04:51 PM
I am trying to run some Spark Streaming examples online, but even before I start, I'm getting this error:
Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
I tried the setting below, but it doesn't help:
conf.set("spark.driver.allowMultipleContexts","true");
Sample code I was trying to run in HDP 2.5:
import org.apache.spark._
import org.apache.spark.streaming._
val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))
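A common workaround, not quoted from this thread: in spark-shell a SparkContext named sc already exists, so build the StreamingContext from it instead of from a new SparkConf.
import org.apache.spark.streaming._

// Reuse the shell's existing SparkContext (sc) rather than creating a second one
val ssc = new StreamingContext(sc, Seconds(1))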
Labels:
- Apache Spark
03-06-2017
09:49 PM
@vamsi valiveti The result of the code you wrote gives a schema like this:
((a1),(a1of1)),(a2),(a3)
Your projection wouldn't work on a schema like this, because Pig still considers the first two fields, "((a1),(a1of1))", as one. You need to use FLATTEN in this case to split it into two separate columns. That's exactly what my code is doing. I tested your data using my code and it works perfectly.
03-06-2017
05:46 PM
Try this:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
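A quick way to check that the HiveContext works (an illustrative follow-up, not part of the original answer):
sqlContext.sql("SHOW TABLES").show()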
03-03-2017
04:34 PM
I need some advice on getting myself equipped with a Kafka and Spark Streaming skill set. Tutorials with best practices are welcome! Thanks.
Labels:
- Apache Flume
- Apache Kafka
- Apache Spark