
JSON Generic mapping using Spark/Flume?

New Contributor

I am looking for a solution that lets me transform many external JSON formats into my internal generic JSON and store it. I worked with the Dozer mapper for mapping Java beans back when I was working in Java EE. Are there any solutions that would let me do something similar for JSON, or something close?

Since this will run on Hadoop against large data sets, I also want to consider the performance overhead that this kind of transformation/mapping solution will add. Some more details on where I want to go.

External JSON Type 1

{
   "preview":false,
   "result":{
      "user_id":"1000000216",
      "service_name":"Sports Unlimited",
      "service_id":"74",
      "period_start":"2017-02-15 19:30:00",
      "period_end":"2017-02-15 20:00:00"
   }
}

External JSON Type 2

{
   "User":{
      "user_id":"1000000216",
      "name":"test"
   },
   "Service":{
      "service_name":"Sports Unlimited",
      "service_id":"74",
      "service_start":"2017-02-15 19:30:00",
      "service_end":"2017-02-15 20:00:00"
   }
}

I want to map all of these types to, let's say, an internal type of the following format:

-- Generic Common Internal JSON:
{
    "user_id":"1000000216",
    "service_name":"Sports Unlimited",
    "service_id":"74",
    "start":"2017-02-15 19:30:00",
    "end":"2017-02-15 20:00:00"
}

My current data pipeline uses a Flume topology for ingestion and Spark for processing these JSONs, along with Hive and so on. I want to build a unified transformation layer that takes care of these complex mappings and makes the downstream processes independent of the external types. Any suggestions would be appreciated.
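One way to sketch such a unified mapping layer is to make each external type a declarative mapping from source field paths to internal field names, so downstream code only ever sees the internal schema. This is a minimal illustration in plain Python (the type names "type1"/"type2" and the helper names are mine, not from any library):

```python
import json

# Hypothetical per-type mappings: internal field -> key path in the external JSON.
MAPPINGS = {
    "type1": {
        "user_id":      ["result", "user_id"],
        "service_name": ["result", "service_name"],
        "service_id":   ["result", "service_id"],
        "start":        ["result", "period_start"],
        "end":          ["result", "period_end"],
    },
    "type2": {
        "user_id":      ["User", "user_id"],
        "service_name": ["Service", "service_name"],
        "service_id":   ["Service", "service_id"],
        "start":        ["Service", "service_start"],
        "end":          ["Service", "service_end"],
    },
}

def extract(doc, path):
    """Walk a nested dict along the given list of keys."""
    for key in path:
        doc = doc[key]
    return doc

def to_internal(raw_json, external_type):
    """Transform one external JSON string into the generic internal dict."""
    doc = json.loads(raw_json)
    mapping = MAPPINGS[external_type]
    return {field: extract(doc, path) for field, path in mapping.items()}
```

In a Spark job this could run inside a map() over the ingested records; since the mapping tables are just data, adding a new external type means adding an entry rather than touching downstream code.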

3 REPLIES

Contributor

This would be a good candidate for Apache NiFi. During ingestion, you could parse your data in the JSON formats above using the EvaluateJsonPath processor:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPa...

You could then write it out in your standardized format using the AttributesToJSON processor:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.AttributesToJS...

To get started, this tutorial shows at least the first processor in action when parsing the Twitter JSON structure:

https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.h...
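The two-step idea behind those processors (extract named attributes via JSON path expressions, then serialize the attributes as the new document) can also be sketched outside NiFi. Below is a rough Python illustration for the Type 1 payload; the `$.result.*` expressions mirror JsonPath syntax, but the tiny resolver here only handles simple dot paths and is not NiFi's implementation:

```python
import json

# Attribute definitions mirroring what EvaluateJsonPath would extract.
ATTRIBUTES = {
    "user_id":      "$.result.user_id",
    "service_name": "$.result.service_name",
    "service_id":   "$.result.service_id",
    "start":        "$.result.period_start",
    "end":          "$.result.period_end",
}

def resolve(doc, expr):
    """Resolve a simple '$.a.b' dot path against a parsed JSON document."""
    node = doc
    for key in expr.lstrip("$.").split("."):
        node = node[key]
    return node

def transform(raw_json):
    """Extraction step followed by an AttributesToJSON-style re-serialization."""
    doc = json.loads(raw_json)
    attrs = {name: resolve(doc, expr) for name, expr in ATTRIBUTES.items()}
    return json.dumps(attrs)
```

Each external type would get its own attribute table, which is essentially what a per-source EvaluateJsonPath configuration gives you in a NiFi flow.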

New Contributor

Thanks zhoussen, I will try this out and get back to you with my experience. By the way, are you familiar with any performance degradation in the solution you have suggested?

Contributor

Apache NiFi would essentially replace Flume for both ingestion and transformation. Performance-wise, it is scalable and shouldn't have any issues accommodating high-throughput requirements. What velocity are you looking at?