We have a standard analytics flow, and I am exploring whether I can use NiFi instead to make it easier. The current layers are as follows:
1. RDBMS -> base layer (HDFS)
2. Base layer (HDFS) -> intermediate layer (run Hive queries and create another table)
3. Intermediate layer -> R model layer (run R scripts on the intermediate tables)
4. R model layer -> reporting layer (again, some Hive queries are run on this)
I want to know whether I can use NiFi to execute the above flow, and even schedule it.
NiFi can do all of that, either on a schedule or running in real time.
If you search under Articles, you will find examples of most of these steps.
Are they standalone R scripts? If so, you can run them via ExecuteProcess.
If they are SparkR scripts, you can run them via Spark execution or via Kafka.
From RStudio you can also run Hive queries.
NiFi can read RDBMS tables and write to HDFS.
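For the standalone-script case, ExecuteProcess essentially launches an operating system command. A minimal Python sketch of the equivalent call is below, mainly to show what the command line might look like. The script path and the table-name argument are hypothetical, and it assumes `Rscript` is on the PATH of the machine running the command.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_rscript_command(script_path: str, args: Sequence[str] = ()) -> List[str]:
    """Build the argv list for running a standalone R script with Rscript."""
    # --vanilla skips saved workspaces and user profiles, so runs are repeatable.
    return ["Rscript", "--vanilla", script_path, *args]


def run_rscript(script_path: str, args: Sequence[str] = (), timeout_s: int = 3600) -> str:
    """Run the script and return its stdout; raises if R exits with a non-zero code."""
    result = subprocess.run(
        build_rscript_command(script_path, args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical script and argument: score the intermediate Hive table.
    cmd = build_rscript_command("/opt/models/score.R", ["intermediate_db.features"])
    print(shlex.join(cmd))
```

In ExecuteProcess, the first element (`Rscript`) would go in the Command property and the remaining elements in Command Arguments.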
NiFi can absolutely do this, but you may want to consider skipping the HDFS layer and going directly from the RDBMS step to a Hive-managed ORC table:
RDBMS --(via data in NiFi flow file)--> ORC table --> R model layer --> model output Hive tables?
The reporting layer should probably be independent and access the Hive layer on its own.
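To make the ordering of that shortened flow explicit, here is a rough Python sketch that models the NiFi-managed steps as a dependency graph and derives a run order. The step names are illustrative labels rather than NiFi processor names, and the reporting layer is left out on purpose because it would read the model output tables on its own. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key depends on the steps in its set (labels are illustrative only).
proposed_flow: Dict[str, Set[str]] = {
    "orc_table": {"rdbms"},
    "r_model_layer": {"orc_table"},
    "model_output_tables": {"r_model_layer"},
}


def run_order(flow: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(flow).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(proposed_flow)))
```

Whatever schedules the flow has to respect this order: the R model step should only start once the ORC table load for that run has finished.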