Member since
01-22-2014
62
Posts
0
Kudos Received
0
Solutions
09-05-2018
03:33 AM
Thanks for this clarification. I had the same query regarding memory issues while loading data, and here you cleared up my doubt about loading files from HDFS. I have a similar question, but where the source is a local server or cloud storage and the data size is larger than the driver memory (let's say the file is 1 GB and the driver memory is 250 MB). If I run the command val file_rdd = sc.textFile("/path or local or S3"), will Spark load the data, or will it throw an exception as you mentioned above? Also, is there a way to print the driver's available memory in the terminal? Many Thanks, Siddharth Saraf
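On the last point, one way to inspect the driver's heap is to query the JVM runtime from spark-shell (or any Scala REPL). This is a general JVM sketch, not a Spark-specific API; the numbers depend on your JVM flags (e.g. --driver-memory):

```scala
// Sketch: ask the JVM running the driver about its heap limits.
// maxMemory is the upper bound the heap can grow to; freeMemory is heap
// currently unused. Values vary with JVM configuration.
val runtime = Runtime.getRuntime
val maxMb = runtime.maxMemory / (1024 * 1024)
val freeMb = runtime.freeMemory / (1024 * 1024)
println(s"Driver max heap: $maxMb MB, free now: $freeMb MB")
```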
02-13-2017
08:10 PM
Thank you, you are right: once I created the kadmin user on each Linux machine, I could submit the task successfully!
12-18-2014
12:54 PM
1 Kudo
It might be easier to just install the packages yourself. See Path B documentation here: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/installation_installation.html
11-19-2014
01:53 AM
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
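For reference, the two YARN settings that most often cap what Spark can request are sketched below. The property names are the standard YARN ones; the values are illustrative only and should be tuned to your hardware:

```xml
<!-- yarn-site.xml: illustrative values, adjust for your cluster -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- largest single container YARN will grant -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value> <!-- total memory each NodeManager offers -->
</property>
```

If Spark's requested executor memory (plus overhead) exceeds the maximum allocation, the request cannot be satisfied.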
09-15-2014
05:19 AM
Your signature is just a little bit off. The result of a join is not a triple, but a tuple whose second element is a tuple. You have: (_, (_, _),(_,_,device)) but I think you need: (_, ((_, _),(_,_,device)))
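To illustrate with plain Scala (no Spark needed, and the field values here are made up): a join of RDD[(K, V)] with RDD[(K, W)] yields elements of type (K, (V, W)), so the value side is one nested tuple and the pattern needs the extra parentheses:

```scala
// Element shape produced by joining RDD[(K, (String, Int))]
// with RDD[(K, (String, Int, String))]. Values are hypothetical.
val joined: (Int, ((String, Int), (String, Int, String))) =
  (42, (("user", 7), ("2014-09-15", 3, "phone")))

// The corrected pattern: key, then a pair of the two joined values.
val device = joined match {
  case (_, ((_, _), (_, _, d))) => d
}
println(device)
```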
09-12-2014
06:49 AM
It will make a difference insofar as the driver program will run either out on the cluster (yarn-cluster) or locally (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports. But it affects where the driver is and that affects what machine's ports need to be open. For example, if your ports are all open within your cluster, I expect that yarn-cluster works directly.
09-12-2014
06:23 AM
I believe it was added in 1.1, yes. I don't have a streaming app driver handy, so maybe double-check -- you will see an obvious Streaming tab if it's there. Without guaranteeing anything, I think the next CDH will have 1.1, and at any time you can run your own Spark jobs with any version under YARN.
09-10-2014
08:14 AM
I think you imported just about everything except the one thing you need to get implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

import org.apache.spark.SparkContext._

In the shell this is imported by default.
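As a plain-Scala sketch of the mechanism (hypothetical names, not Spark's actual classes): an import can bring an implicit conversion into scope, and that conversion is what makes extra methods like join() appear on a type:

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for PairRDDFunctions: extra operations on pairs.
class PairOps(xs: Seq[(Int, String)]) {
  def joinKeys(ys: Seq[(Int, String)]): Seq[(Int, (String, String))] =
    for ((k, v) <- xs; (k2, w) <- ys if k == k2) yield (k, (v, w))
}

object Implicits {
  // Analogue of what importing SparkContext._ brings into scope.
  implicit def toPairOps(xs: Seq[(Int, String)]): PairOps = new PairOps(xs)
}

import Implicits._
val a = Seq((1, "left"), (2, "lonely"))
val b = Seq((1, "right"))
val joined = a.joinKeys(b) // compiles only because toPairOps is in scope
println(joined)
```

Without the import of Implicits._, the call to joinKeys would not compile, which mirrors the "join is not a member" errors seen without SparkContext._.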
08-13-2014
11:05 PM
Thanks for the solution. I will try the available options and give feedback.
08-05-2014
03:42 AM
2 Kudos
Why? In a kerberized environment, you need to integrate with Kerberos to access resources. The Spark project hasn't implemented anything like that itself. YARN works with Kerberos, so Spark can work with Kerberos by leveraging YARN. Maybe part of the answer is: why would it be necessary, if it already works through YARN?