I am trying to insert data into HBase using Flink/Spark, but I'm confused about whether I need to copy the files from the local filesystem (or an FTP server) to HDFS first. What would be the proper flow if the file size ranges from 5 MB to 31 MB?
*Files are dumped at 5-minute intervals.
The data size is quite small. You have two options:
1. Create a managed Hive table and write the data from Flink/Spark to Hive directly.
2. Save the data on HDFS and map an external Hive table over it.
Option 1 will work fine for you.
I need to connect my application to a web app and need the data in milliseconds. Also, each file is 5-31 MB, and the files arrive every 5 minutes.
Can you provide more details about the requirement? Do you need to push the data into Hive or pull from it?
This looks more like an HBase use case to me. I can give you more insight if the requirement is more specific.