Created 03-30-2017 08:13 PM
I am looking for a solution that lets me transform many external JSON formats into my internal generic JSON and store it. I worked with the Dozer mapper for mapping Java beans when I was working in Java EE. Are there any solutions that would let me do something similar for JSON, or something close?
Since this will run on Hadoop against large data sets, I also want to understand the performance overhead that this kind of transformation/mapping solution will add. Some more details on where I want to go:
External JSON Type 1
{
  "preview": false,
  "result": {
    "user_id": "1000000216",
    "service_name": "Sports Unlimited",
    "service_id": "74",
    "period_start": "2017-02-15 19:30:00",
    "period_end": "2017-02-15 20:00:00"
  }
}
External JSON Type 2
{
  "User": {
    "user_id": "1000000216",
    "name": "test"
  },
  "Service": {
    "service_name": "Sports Unlimited",
    "service_id": "74",
    "service_start": "2017-02-15 19:30:00",
    "service_end": "2017-02-15 20:00:00"
  }
}
I want to map all of these types to, let's say, an internal type of the following format:
-- Generic Common Internal JSON:
{
  "user_id": "1000000216",
  "service_name": "Sports Unlimited",
  "service_id": "74",
  "start": "2017-02-15 19:30:00",
  "end": "2017-02-15 20:00:00"
}
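Independent of the ingestion tooling, the mapping itself can be sketched as one normalizing function per external type, each producing the generic internal shape above. This is only an illustration; the function names are made up, and the field names come from the examples in this post:

```python
import json

def from_type1(doc):
    """Map External JSON Type 1 (fields nested under "result")."""
    r = doc["result"]
    return {
        "user_id": r["user_id"],
        "service_name": r["service_name"],
        "service_id": r["service_id"],
        "start": r["period_start"],
        "end": r["period_end"],
    }

def from_type2(doc):
    """Map External JSON Type 2 (fields split across "User" and "Service")."""
    user, svc = doc["User"], doc["Service"]
    return {
        "user_id": user["user_id"],
        "service_name": svc["service_name"],
        "service_id": svc["service_id"],
        "start": svc["service_start"],
        "end": svc["service_end"],
    }

# Example: normalize a Type 1 document into the internal format.
type1 = json.loads(
    '{"preview": false, "result": {"user_id": "1000000216", '
    '"service_name": "Sports Unlimited", "service_id": "74", '
    '"period_start": "2017-02-15 19:30:00", '
    '"period_end": "2017-02-15 20:00:00"}}'
)
internal = from_type1(type1)
print(internal["start"])  # 2017-02-15 19:30:00
```

Downstream processes then only ever see the internal dict, regardless of which external type produced it.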
My current data pipeline is built with a Flume topology for ingestion and Spark for processing these JSONs, plus Hive, etc. I want to build a unified transformation layer that takes care of these complex mappings and makes the downstream processes independent of the external types. Any suggestions would be appreciated.
Created 04-03-2017 08:07 PM
This would be a good candidate for Apache NiFi. During ingestion, you could parse your data in the JSON formats above using the EvaluateJsonPath processor:
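EvaluateJsonPath takes user-defined properties that map an attribute name to a JsonPath expression; for Type 1 above, the properties might look something like this (the attribute names on the left are your choice, chosen here to match the internal format):

```
user_id      = $.result.user_id
service_name = $.result.service_name
service_id   = $.result.service_id
start        = $.result.period_start
end          = $.result.period_end
```

A second EvaluateJsonPath (or a routed branch) would do the same for Type 2 with paths like $.User.user_id and $.Service.service_start.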
You could then write it out in your standardized format using the AttributesToJSON processor:
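AttributesToJSON can then reassemble those flowfile attributes into the internal JSON; assuming the attribute names extracted above, its configuration would be roughly:

```
Attributes List: user_id,service_name,service_id,start,end
Destination:     flowfile-content
```

With Destination set to flowfile-content, the resulting JSON replaces the flowfile body, so everything downstream sees only the generic internal format.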
To get started, this tutorial shows at least the first processor in action when parsing the Twitter JSON structure:
Created 04-03-2017 11:09 PM
Thanks zhoussen, I will try it out and get back to you with my experience. In the meantime, are you familiar with the performance degradation, if any, of the solution you have suggested?
Created 04-07-2017 03:00 PM
Apache NiFi would essentially replace Flume for both the ingestion and the transformation. Performance-wise, it is scalable and shouldn't have any issues accommodating high-throughput requirements. What velocity are you looking at?