Member since: 05-17-2016
Posts: 190
Kudos Received: 46
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1364 | 09-07-2017 06:24 PM |
| | 1763 | 02-24-2017 06:33 AM |
| | 2527 | 02-10-2017 09:18 PM |
| | 7025 | 01-11-2017 08:55 PM |
| | 4602 | 12-15-2016 06:16 PM |
06-08-2016
01:53 PM
Thanks @clukasik. Is there any performance difference in choosing client deploy mode over cluster mode? If I use the default client deploy mode, I get control over where my driver program runs. However, I wanted to be sure that it does not cause any performance issues.
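For reference, the deploy mode is fixed at submit time; a minimal sketch of the two invocations (the jar name is made up for illustration):

spark-submit --master yarn --deploy-mode client my-app.jar   # driver runs on the submitting machine
spark-submit --master yarn --deploy-mode cluster my-app.jar  # driver runs on a node inside the cluster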
06-08-2016
01:36 PM
@Rajkumar Singh: Yes, but here the file resides on the machine where we trigger spark-submit. So I was looking for a way to read it in the driver without actually having to move it to all the workers or even to HDFS.
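A minimal sketch of that idea, assuming client deploy mode so the driver runs on the machine that holds the file (the path and variable names are illustrative, not from the thread):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.spark.broadcast.Broadcast;

// Driver-side only: plain Java I/O, no HDFS involved.
List<String> local = Files.readAllLines(Paths.get("/local/path/config.txt"));
// Expose the contents to the executors, e.g. as a broadcast variable.
Broadcast<List<String>> bc = jsc.broadcast(local);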
06-08-2016
01:33 PM
Thanks for the suggestion @Jitendra Yadav.
But since the file is small (~500 KB), I was wondering whether we really need to load it into HDFS. I was looking for some "hack".
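One such "hack", sketched below with a made-up file name: SparkContext.addFile ships a driver-local file to every executor, and SparkFiles.get resolves its local path inside a task, so nothing has to be copied to HDFS:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import org.apache.spark.SparkFiles;

jsc.addFile("/local/path/lookup.txt"); // distributed from the driver machine to the executors
// later, inside a task running on any executor:
List<String> lines = Files.readAllLines(Paths.get(SparkFiles.get("lookup.txt")));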
06-08-2016
01:26 PM
Hi, one of our Spark applications depends on a local file for some of its business logic. We can read the file by referring to it as file:///. But for this to work, a copy of the file needs to exist on every worker, or every worker needs access to a common shared drive such as an NFS mount. Is there any other way of achieving this?
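For reference, a minimal sketch of the file:/// approach described above (the path is made up for illustration):

// Works only if this exact path exists on every worker node:
JavaRDD<String> lines = jsc.textFile("file:///data/lookup.txt");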
Labels:
- Apache Spark
06-06-2016
02:28 PM
Thanks @clukasik. Got it!!
06-06-2016
02:16 PM
Thanks @clukasik. That solves the problem. I was going in an unwanted circle to address this.
++ On the second part of the question: does it make any sense to parallelize a list before actually storing it to a file, as in the last 2 lines of my code?
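For context on that second part, a minimal sketch of the trade-off, reusing the names from the code in the question below (output file name is illustrative): once the list has been collected, it already sits in driver memory, so plain Java I/O can write it out directly; parallelize only pays off if further distributed work follows:

import java.nio.file.Files;
import java.nio.file.Paths;

// The list is already on the driver, so a local write is enough:
Files.write(Paths.get("out.txt"), resultantList);
// jsc.parallelize(resultantList) only adds value if more distributed
// transformations are applied to the resulting RDD afterwards.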
06-06-2016
01:21 PM
Hi All, I need a recommendation on the best approach for solving the problem below. I have included the code snippet that I have so far. I read an HDFS file using a custom input format and in turn get a PairRDD. Now I am interested in operating on the values one at a time, and I am not bothered about the keys. Is a Java List a scalable data structure to hold the values? Please have a look at the code below and suggest alternatives. Also, does the parallelize at the end of the code give any benefit?

JavaPairRDD<LongWritable, BytesWritable> fixedFileRdd = getItSomeHow();
List<String> zeroValue = new ArrayList<String>();
// seqOp: folds each (key, value) pair of a partition into the running list
Function2<List<String>, Tuple2<LongWritable, BytesWritable>, List<String>> seqOp = new Function2<List<String>, Tuple2<LongWritable, BytesWritable>, List<String>>() {
  public List<String> call(List<String> valueList, Tuple2<LongWritable, BytesWritable> eachKeyValue) throws Exception {
    valueList.add(doWhatever(new String(eachKeyValue._2.copyBytes())));
    return valueList;
  }
  private String doWhatever(String string) {
    // will be an external utility method call, this is for representational purposes only
    return System.currentTimeMillis() + "-" + string;
  }
};
// combOp: merges the per-partition lists into one
Function2<List<String>, List<String>, List<String>> combOp = new Function2<List<String>, List<String>, List<String>>() {
  public List<String> call(List<String> listOne, List<String> listTwo) throws Exception {
    listOne.addAll(listTwo);
    return listOne;
  }
};
List<String> resultantList = fixedFileRdd.aggregate(zeroValue, seqOp, combOp);
JavaRDD<String> resultantRdd = jsc.parallelize(resultantList);
resultantRdd.saveAsTextFile("out-dir");
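For comparison, a minimal sketch that keeps the whole pipeline distributed, reusing the post's own names (doWhatever would need to be reachable and serializable from the executors): mapping over the values and saving directly avoids collecting everything into a driver-side List, so the final parallelize becomes unnecessary:

JavaRDD<String> resultRdd = fixedFileRdd.values()
  .map(new Function<BytesWritable, String>() {
    public String call(BytesWritable bytes) throws Exception {
      return doWhatever(new String(bytes.copyBytes()));
    }
  });
resultRdd.saveAsTextFile("out-dir"); // each partition writes its own part file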
Labels:
- Apache Spark
06-01-2016
05:19 AM
Got it. I added a first() action to force the job to trigger. And yes, the reason you mentioned, that Spark evaluates lazily, was what stopped me.
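A minimal illustration of that behavior (the RDD name is invented):

JavaRDD<String> mapped = inputRdd.map(s -> s.toUpperCase()); // a transformation: nothing executes yet
mapped.first(); // an action: this finally triggers the job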