The solution depends on whether you want to stream or process in batch, and on what the use case for the data is.
For a batch solution I would use Hive tables for both the source data and the target.
For streaming, I would use NiFi for ingestion and Hive for the target.
Regardless of streaming or batch, I would use the same process:
First I would define a target data structure that represents the desired output.
I would also define a mapping table that holds the mapping from each source structure to the target.
Within the mapping I would also define what tasks need to be performed on each attribute, e.g. removing unwanted strings.
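As a rough illustration of what such a mapping table could look like, here is a minimal sketch in plain Python. All column names and task names here are hypothetical examples, not anything from a real system:

```python
# Hypothetical mapping table: one entry per source attribute, recording
# which target attribute it feeds and which cleanup task to apply.
MAPPING = [
    {"source_col": "cust_nm", "target_col": "customer_name", "task": "strip_chars"},
    {"source_col": "ord_amt", "target_col": "order_amount",  "task": "to_float"},
    {"source_col": "ord_dt",  "target_col": "order_date",    "task": "none"},
]

def target_schema(mapping):
    """Derive the target structure (an ordered column list) from the mapping."""
    return [row["target_col"] for row in mapping]

print(target_schema(MAPPING))  # → ['customer_name', 'order_amount', 'order_date']
```

In practice this mapping would live in its own Hive table rather than in code, so new source structures can be onboarded by adding rows instead of changing the job.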
I would use Spark to execute the transformations and NiFi to orchestrate the process.
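The transformation step driven by such a mapping can be sketched as follows. This is plain Python for brevity; in a real pipeline each task would be applied as a Spark DataFrame operation, and the column and task names are illustrative assumptions:

```python
# Self-contained sketch of a mapping-driven transform: look up each source
# attribute, apply its cleanup task, and rename it to the target attribute.
TASKS = {
    # Keep only alphanumerics and spaces, then trim.
    "strip_chars": lambda v: "".join(c for c in v if c.isalnum() or c == " ").strip(),
    "to_float":    lambda v: float(v),
    "none":        lambda v: v,
}

MAPPING = [
    {"source_col": "cust_nm", "target_col": "customer_name", "task": "strip_chars"},
    {"source_col": "ord_amt", "target_col": "order_amount",  "task": "to_float"},
]

def transform(record, mapping, tasks):
    """Apply the per-attribute task and rename source columns to target columns."""
    return {m["target_col"]: tasks[m["task"]](record[m["source_col"]])
            for m in mapping}

row = {"cust_nm": "  Acme# Corp ", "ord_amt": "19.99"}
print(transform(row, MAPPING, TASKS))
# → {'customer_name': 'Acme Corp', 'order_amount': 19.99}
```

Because the logic is driven entirely by the mapping table, the same Spark job can serve every source structure, and NiFi only has to schedule and monitor it.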