Can someone convert the given dataset to the required dataset using key-value pairs in Spark?

New Contributor

Given dataset:

Row-Key-001, K1, 10, A2, 20, K3, 30, B4, 42, K5, 19, C20, 20
Row-Key-002, X1, 20, Y6, 10, Z15, 35, X16, 42
Row-Key-003, L4, 30, M10, 5, N12, 38, O14, 41, P13, 8

Required dataset:

Row-Key-001, K1
Row-Key-001, A2
Row-Key-001, K3
Row-Key-001, B4
Row-Key-001, K5
Row-Key-001, C20
Row-Key-002, X1
Row-Key-002, Y6
Row-Key-002, Z15
Row-Key-002, X16
Row-Key-003, L4
Row-Key-003, M10
Row-Key-003, N12
Row-Key-003, O14
Row-Key-003, P13
Re: Can someone convert the given dataset to the required dataset using key-value pairs in Spark?

Super Collaborator

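You can use flatMap to emit one "rowKey,name" string for each column name on a line, followed by mapToPair to turn each of those strings into a key-value pair.

See below: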

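import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// fileRDD is assumed to be a JavaRDD<String> holding one input line per element.
JavaRDD<String> flatMapRdd = fileRDD.flatMap(new FlatMapFunction<String, String>() {
    // Spark 1.x signature. In Spark 2.x and later, call() must return
    // Iterator<String> instead (return dataList.iterator()).
    public Iterable<String> call(String line) throws Exception {
        // Create the list inside call() so entries from earlier lines are not carried over.
        List<String> dataList = new ArrayList<String>();
        String[] splits = line.split(",");
        String key = splits[0].trim();
        // After the row key, fields alternate name, value; keep only the names.
        for (int i = 1; i < splits.length; i += 2) {
            dataList.add(key + "," + splits[i].trim());
        }
        return dataList;
    }
});

JavaPairRDD<String, String> kvRdd = flatMapRdd.mapToPair(new PairFunction<String, String, String>() {
    public Tuple2<String, String> call(String kv) throws Exception {
        // Split once into row key and column name.
        String[] parts = kv.split(",", 2);
        return new Tuple2<String, String>(parts[0], parts[1]);
    }
});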
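If you want to check the per-line logic without a Spark cluster, here is a minimal plain-Java sketch of what the flatMap step does to a single line. The class name RowKeyPairs and the method expand are made up for illustration and are not part of any Spark API. It assumes every line has the layout rowKey, name1, value1, name2, value2, ... with comma separators, as in the sample data.

```java
import java.util.ArrayList;
import java.util.List;

class RowKeyPairs {
    // Hypothetical helper: expands one input line into "rowKey, name" strings.
    // Assumes the layout rowKey, name1, value1, name2, value2, ...
    static List<String> expand(String line) {
        List<String> out = new ArrayList<>();
        String[] fields = line.split(",");
        String key = fields[0].trim();
        // Odd positions hold the names; even positions (after the key) hold the values.
        for (int i = 1; i < fields.length; i += 2) {
            out.add(key + ", " + fields[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        for (String pair : expand("Row-Key-002, X1, 20, Y6, 10, Z15, 35, X16, 42 ")) {
            System.out.println(pair);
        }
    }
}
```

For that sample line this prints the four Row-Key-002 rows from the required dataset. The same loop body is what would go inside the call() method passed to flatMap.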



Thanks
-ak-
