Member since: 07-05-2016
Posts: 10
Kudos Received: 1
Solutions: 0
06-12-2018 02:13 PM
Why is the Spark shuffle stage so slow for a 1.6 MB shuffle write and 2.4 MB input? Also, why is the shuffle write happening on only one executor? I am running a 3-node cluster with 8 cores each. Please see my code and Spark UI pictures below.

Code:

JavaPairRDD<String, String> javaPairRDD = c.mapToPair(new PairFunction<String, String, String>() {
    @Override
    public Tuple2<String, String> call(String arg0) throws Exception {
        try {
            if (org.apache.commons.lang.StringUtils.isEmpty(arg0)) {
                return new Tuple2<String, String>("", "");
            }
            // key each record by its target Elasticsearch index name
            return new Tuple2<String, String>(getESIndexName(arg0), arg0);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("******* exception in getESIndexName");
        }
        return new Tuple2<String, String>("", "");
    }
});

java.util.Map<String, Iterable<String>> map1 = javaPairRDD.groupByKey().collectAsMap();
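For context, here is a minimal sketch of one variation worth trying, assuming the single-executor shuffle write comes from getESIndexName returning only a handful of distinct keys, and assuming Java 8 lambdas for brevity; the partition count of 24 (3 nodes x 8 cores) is purely illustrative, not a tuned value:

// Hypothetical variation on the job above: give groupByKey an explicit partition
// count so the shuffled groups are spread across more tasks (and executors).
JavaPairRDD<String, Iterable<String>> grouped = javaPairRDD.groupByKey(24);

// collectAsMap() ships every grouped value to the driver; keeping the work on the
// executors, e.g. with foreachPartition, avoids funnelling everything through one JVM.
grouped.foreachPartition(it -> {
    while (it.hasNext()) {
        Tuple2<String, Iterable<String>> group = it.next();
        // process group._1() (the index name) and group._2() (its records) here
    }
});

Note that partitioning is by key, so if nearly every record maps to the same index name, one task will still hold most of the data no matter how many partitions are requested.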
Labels:
- Apache Spark
09-21-2016 08:24 PM
Thanks David, I'm planning to create just one template and be done, as I see no real advantage to 50 templates.
09-21-2016 07:54 PM
1 Kudo
Hi there, I have a best-practice question. I have been developing a few flows on our test NiFi box. I have about 50 data sources, and hence I have created 50 process groups so that I can enable/disable each one independently of the others at any time. Is this a good idea? When we are production ready, I would like to create templates and export them to the production box. Please advise.
Thanks
Pradeep
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
07-05-2016 05:22 PM
Thanks Neeraj, I saw that link before and increased memory as it suggests, but I still see errors like the ones above. It also doesn't address data loss, reprocessing the data, etc.
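For anyone else reading, the memory increase that kind of advice usually points at is along these lines; the keys are standard Spark-on-YARN settings, but the values are placeholders, not a recommendation for this workload:

// Illustrative sizes only: executor heap plus the off-heap overhead that YARN
// reserves per executor (the overhead value is in MB on Spark 1.x).
SparkConf conf = new SparkConf()
    .setAppName("streaming-app")
    .set("spark.executor.memory", "4g")
    .set("spark.yarn.executor.memoryOverhead", "1024");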
07-05-2016 04:54 PM
Hi there, I see these errors in my Spark Streaming application and don't see any further logs. I have enabled write-ahead logs and checkpointing. When I hit the error below, I lose an executor, another one comes right back up, and the streaming application does not stop. So what happened to the data? I hope I did not lose it; I am assuming it was reprocessed somehow. Please advise.

16/06/30 16:38:23 ERROR cluster.YarnScheduler: Lost executor 9 on sxn19.dcc.localdomain: remote Rpc client disassociated

Thanks
Pradeep
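As a point of reference (not the poster's actual code), here is a minimal sketch of how the write-ahead log and checkpointing mentioned above are typically wired up in a Java Spark Streaming app, assuming a receiver-based source; the app name, checkpoint path, and batch interval are placeholders. With the receiver WAL on reliable storage such as HDFS, blocks that were already received can be re-read and recomputed when an executor is lost, which is why the job keeps running:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class WalCheckpointSketch {
    public static void main(String[] args) throws Exception {
        // Persist received blocks to the write-ahead log before acknowledging them.
        SparkConf conf = new SparkConf()
            .setAppName("streaming-wal-sketch")
            .set("spark.streaming.receiver.writeAheadLog.enable", "true");

        // Placeholder checkpoint directory on reliable storage.
        final String checkpointDir = "hdfs:///tmp/streaming-checkpoint";

        // getOrCreate rebuilds the context from the checkpoint after a restart
        // instead of creating a fresh one, so streaming metadata is not lost.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpointDir, () -> {
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));
            ssc.checkpoint(checkpointDir);
            // define the receiver-based DStream pipeline here
            return ssc;
        });

        jssc.start();
        jssc.awaitTermination();
    }
}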
Labels:
- Apache Spark