
JSON Generic mapping using Spark/Flume?

I am looking for a solution that lets me transform many external JSON formats into my internal generic JSON and store it. I have worked with the Dozer mapper for mapping Java beans when I was working in Java EE. Are there any solutions that would let me do something similar for JSON, or something close?

Since this will run in Hadoop against large data sets, I also want to consider the performance overhead that this kind of transformation/mapping solution will add.

Some more details on where I want to go:
External JSON Type 1

{
   "preview":false,
   "result":{
      "user_id":"1000000216",
      "service_name":"Sports Unlimited",
      "service_id":"74",
      "period_start":"2017-02-15 19:30:00",
      "period_end":"2017-02-15 20:00:00"
   }
}

External JSON Type 2

{
   "User":{
      "user_id":"1000000216",
      "name":"test"
   },
   "Service":{
      "service_name":"Sports Unlimited",
      "service_id":"74",
      "service_start":"2017-02-15 19:30:00",
      "service_end":"2017-02-15 20:00:00"
   }
}

I want to map all of these types to, let's say, an internal type of the following format:

-- Generic Common Internal JSON:
{
    "user_id":"1000000216",
    "service_name":"Sports Unlimited",
    "service_id":"74",
    "start":"2017-02-15 19:30:00",
    "end":"2017-02-15 20:00:00  "
}
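
To make the intent concrete, here is a minimal sketch of the per-type mapping I could write by hand today, assuming Spark SQL's from_json (available since Spark 2.1); the input path and schema are placeholders for illustration, not my real job:

// Parse External Type 1 and project it onto the internal field names.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("json-mapping").getOrCreate()
import spark.implicits._

// External Type 1: payload fields nested under "result"
val type1Schema = new StructType()
  .add("preview", BooleanType)
  .add("result", new StructType()
    .add("user_id", StringType)
    .add("service_name", StringType)
    .add("service_id", StringType)
    .add("period_start", StringType)
    .add("period_end", StringType))

val internal = spark.read.textFile("/data/external/type1")   // placeholder path
  .select(from_json($"value", type1Schema).as("j"))
  .select(
    $"j.result.user_id".as("user_id"),
    $"j.result.service_name".as("service_name"),
    $"j.result.service_id".as("service_id"),
    $"j.result.period_start".as("start"),
    $"j.result.period_end".as("end"))

This works, but it hard-codes one projection per external format.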

My current data pipeline is built with a Flume topology for ingestion and Spark for processing these JSONs, with Hive and so on downstream. I want to build a unified transformation layer that can take care of these complex mappings and make the downstream processes independent of the external types. Any suggestions would be appreciated.
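
For the unified layer itself, what I am imagining is something declarative in the spirit of Dozer: per external type, a config that maps each internal field name to a source column path, applied generically. A rough sketch of that idea (the type names and field paths are my assumptions based on the examples above, not an existing API):

import org.apache.spark.sql.{DataFrame, functions => F}

// One entry per external type: internal field name -> source column path.
val fieldMappings: Map[String, Map[String, String]] = Map(
  "type1" -> Map(
    "user_id"      -> "result.user_id",
    "service_name" -> "result.service_name",
    "service_id"   -> "result.service_id",
    "start"        -> "result.period_start",
    "end"          -> "result.period_end"),
  "type2" -> Map(
    "user_id"      -> "User.user_id",
    "service_name" -> "Service.service_name",
    "service_id"   -> "Service.service_id",
    "start"        -> "Service.service_start",
    "end"          -> "Service.service_end"))

// Project an already-parsed external DataFrame onto the internal schema.
def toInternal(parsed: DataFrame, sourceType: String): DataFrame = {
  val cols = fieldMappings(sourceType).toSeq.map {
    case (internalName, sourcePath) => F.col(sourcePath).as(internalName)
  }
  parsed.select(cols: _*)
}

With something like this, onboarding a new external type would mostly mean adding an entry to the mapping config instead of touching the downstream Spark jobs.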
