How to merge two columns of a data frame into one while keeping the heading the same?

I have a JSON data file on which I need to perform MapReduce jobs.

The content of the file is something like this:

{"key1":"a","key2":"b","key3":"c"}

{"key1":"a","key2":"d","key3":"w"}

{"key1":"h","key2":"b","key3":"t"}

{"key1":"i","key2":"g","key3":"e"}

Yes, values can repeat across keys.

So I need to build a matrix that lists every key1 value together with all of the key2 and key3 values associated with it.

For example, in the first two rows the key1 value a appears with b and d for key2, and with c and w for key3.

So I wrote the following code using Spark SQL:

import spark.implicits._
import org.apache.spark.sql.functions._

// read the JSON-lines file and keep one row per distinct (key1, key2, key3) triple
val rawData = spark.read.json(path to file)
val matrix = rawData.select($"key1", $"key2", $"key3").distinct().cache()

// collect the distinct key2 values and key3 values for each key1
val group1 = matrix.groupBy($"key1").agg(collect_set($"key2").as("key2"))
val group2 = matrix.groupBy($"key1").agg(collect_set($"key3").as("key3"))

// count the distinct rows per key1
val countOnKey1 = matrix.groupBy($"key1").count().cache()

// join the aggregates back together and sort by the count
val outPut1 = group1.join(group2, Seq("key1"), joinType = "outer")
val opfinal = outPut1.join(countOnKey1, Seq("key1"), joinType = "outer").orderBy(desc("count")).cache()
opfinal.coalesce(1).write.json(path to output file)

The above code gave me the desired output, like this:

{"key1": "val", "key2":["val1","val2"], "key3": ["val1","val2"], "count":2}

{"key1": "val", "key2":["val1"], "key3": ["val1], "count":1}

What I want to know is: is there any way to combine key2 and key3 under one key, like this?

{"key1": "val", "newKey": [{"key2":"val1", "key3":"val1"}, {"key2":"val2","key3": "val2"} ], "count":2}

That is, they are currently two separate JSON arrays, and I intend to merge them into one single JSON array.
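One idea I have been toying with (just a sketch, I have not verified it on the real data): group once and collect each (key2, key3) pair as a struct, since an array of structs should serialize to a JSON array of objects like the one above, and it would preserve which key2 value occurred with which key3 value in the same record, which two independent collect_set arrays cannot. This reuses the matrix DataFrame and imports from my code above; combined and newKey are names I made up.

val combined = matrix
  .groupBy($"key1")
  .agg(
    collect_set(struct($"key2", $"key3")).as("newKey"), // one struct per distinct (key2, key3) pair
    count("*").as("count")                              // rows per key1 in the distinct matrix
  )
  .orderBy(desc("count"))

combined.coalesce(1).write.json(path to output file)

Since matrix is already distinct, this count should match the one my joins produce, and size($"newKey") would give the same number.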

Any suggestions would be really helpful, and if there is a better approach to this whole problem, please share that too.