How to prepare data in a structured format?


Cloudera Employee

The solution depends on whether you want to stream the data or process it in batch, and on what the data will be used for.

For a batch solution I would use Hive tables for the source data and Hive for the target.

For streaming I would use NiFi for the source and Hive for the target.

Regardless of streaming or batch, I would use the same process:

First, I would define a target data structure that represents the desired output.

I would also define a mapping table that holds the mappings from each source structure to the target.

Within the mapping I would also define which tasks need to be performed on each attribute, e.g. removing unwanted strings.

I would use Spark to execute the transformation and NiFi to control the process.
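
To make the idea concrete, here is a minimal sketch of a target structure, a mapping table and per-attribute tasks. It is written in plain Python rather than Spark so that it runs on its own; in Spark the same mapping would more likely drive column expressions on a DataFrame. All names here (the sources "crm" and "billing", the field names, the task names) are hypothetical and only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical target structure: the fields every output record must have.
TARGET_FIELDS: List[str] = ["customer_id", "full_name", "amount"]

# Hypothetical cleaning tasks that a mapping entry can refer to by name.
TASKS: Dict[str, Callable[[Any], Any]] = {
    "strip": lambda v: v.strip(),
    "remove_currency": lambda v: v.replace("$", "").replace(",", ""),
    "to_int": int,
    "to_float": float,
}

# Mapping table: source name -> target field -> (source attribute, tasks).
# Tasks are applied in the order listed.
MAPPINGS: Dict[str, Dict[str, Tuple[str, List[str]]]] = {
    "crm": {
        "customer_id": ("id", ["to_int"]),
        "full_name": ("name", ["strip"]),
        "amount": ("total", ["remove_currency", "to_float"]),
    },
    "billing": {
        "customer_id": ("cust_no", ["strip", "to_int"]),
        "full_name": ("customer", ["strip"]),
        "amount": ("amt", ["to_float"]),
    },
}


def transform(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one source record into the target structure."""
    mapping = MAPPINGS[source]
    out: Dict[str, Any] = {}
    for field in TARGET_FIELDS:
        source_attr, task_names = mapping[field]
        value = record[source_attr]
        for name in task_names:
            value = TASKS[name](value)
        out[field] = value
    return out


crm_row = {"id": "42", "name": "  Ada Lovelace ", "total": "$1,250.50"}
billing_row = {"cust_no": " 7 ", "customer": "Bob", "amt": "3.5"}
print(transform("crm", crm_row))
print(transform("billing", billing_row))
```

Because the mappings are data rather than code, adding a new source should only need a new entry in the mapping table, not a new transformation job.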