
Spark Scala - Map a Wrapped Array as a Normal List

Hi experts,

After some Scala programming, I'm getting this output:

[40146844020121125,WrappedArray(1726)]
[40148356620121118,WrappedArray(7205)]
[40148813920120703,WrappedArray(3504, 1703)]
[40148991920121112,WrappedArray(5616)]
[40150340320130324,WrappedArray(9909)]
[40150796920120926,WrappedArray(3509)]
[40151143320130423,WrappedArray(9909)]
[40153957220120426,WrappedArray(9909)]
[40154761720120504,WrappedArray(9909)]
[40154969620130124,WrappedArray(9909, 9909)]

But I want to extract this:

40146844020121125,1726
40148356620121118,7205
40148813920120703,3504,1703
40148991920121112,5616
40150340320130324,9909
40150796920120926,3509
40151143320130423,9909
40153957220120426,9909
40154761720120504,9909
40154969620130124,9909,9909
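
In plain Scala terms, what I want for each row is just mkString(",") over the collected values (the names below are only for illustration):

```scala
// Plain-Scala illustration of the desired flattening:
// a collected sequence of category codes joined into one CSV string.
val categories = Seq(3504, 1703)
val flat = categories.mkString(",")
// flat: "3504,1703"
```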

I'm trying to analyze which products are frequently purchased together, and my Scala code is:

import org.apache.spark.sql.functions.collect_list

val data = sc.textFile("FILE")

case class Transactions(Transaction_ID: String, Dept: String, Category: String, Company: String, Brand: String, Product_Size: String, Product_Measure: String, Purchase_Quantity: String, Purchase_Amount: String)

// Parse one CSV line into a Transactions record.
def csvToMyClass(line: String): Transactions = {
  val split = line.split(',')
  Transactions(split(0), split(1), split(2), split(3), split(4), split(5), split(6), split(7), split(8))
}

val df = data.map(csvToMyClass).toDF("Transaction_ID", "Dept", "Category", "Company", "Brand", "Product_Size", "Product_Measure", "Purchase_Quantity", "Purchase_Amount")
df.show

// Collect all categories bought in the same transaction.
val df2 = df.groupBy("Transaction_ID").agg(collect_list($"Category"))
df2.show

How can I map the WrappedArray column of the DataFrame to a plain comma-separated list?
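
One idea I've been looking at (not sure it's the right approach) is to flatten the list at aggregation time with concat_ws from org.apache.spark.sql.functions, assuming it accepts the array column produced by collect_list:

```scala
import org.apache.spark.sql.functions.{collect_list, concat_ws}

// Sketch: join the collected categories into a single
// comma-separated string per transaction, instead of
// keeping them as a WrappedArray.
val flattened = df
  .groupBy("Transaction_ID")
  .agg(concat_ws(",", collect_list($"Category")).as("Categories"))

flattened.show(false)
```

The column alias "Categories" is just a name I made up for the example.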

Many thanks!!!