Should I copy data to HDFS before inserting it into HBase?

Contributor

I am trying to insert data into HBase using Flink/Spark, but I am unsure whether I need to copy the files from the local filesystem and FTP to HDFS first. What would be the proper flow if file sizes range from 5 MB to 31 MB?

*Files are dumped at 5-minute intervals.

4 REPLIES

Re: Should I copy data to HDFS before inserting it into HBase?

The data size is quite small, so you have two options.

1. Create a managed Hive table and write the data from Flink/Spark to Hive directly.
2. Save the data on HDFS and map an external Hive table to it.

Option 1 will work fine for you.
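
To put "quite small" in numbers, here is a rough back-of-the-envelope calculation. It assumes exactly one file arrives per 5-minute interval, which the thread suggests but does not state outright; the function and variable names are only illustrative.

```python
def sustained_rate_mb_per_s(file_size_mb: float, interval_s: float) -> float:
    """Average ingest rate if one file of the given size arrives per interval."""
    return file_size_mb / interval_s


# Assumption: one file every 5 minutes, sized between 5 MB and 31 MB.
interval_s = 5 * 60
files_per_day = (24 * 60) // 5

low_rate = sustained_rate_mb_per_s(5, interval_s)
high_rate = sustained_rate_mb_per_s(31, interval_s)
daily_mb_high = 31 * files_per_day

if __name__ == "__main__":
    print(f"sustained rate: {low_rate:.3f} to {high_rate:.3f} MB/s")
    print(f"worst case per day: {daily_mb_high} MB across {files_per_day} files")
```

Even at the upper bound this is on the order of a tenth of a megabyte per second, so raw throughput is unlikely to be the deciding factor here.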

Re: Should I copy data to HDFS before inserting it into HBase?

Contributor

I need to connect my application to a web app and need the data available within milliseconds. Also, each file is 5-31 MB, and the files arrive every 5 minutes.

Re: Should I copy data to HDFS before inserting it into HBase?

Hey Eon.

Can you provide more details about the requirement? Do you need to push the data into Hive or pull from it?
This looks more like an HBase use case to me. I can give you more insight if the requirement is more specific.

Re: Should I copy data to HDFS before inserting it into HBase?

Contributor

I am trying to put data from a file into HBase.
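
For illustration, a minimal sketch of that file-to-HBase flow might look like the following. Everything here is an assumption, since the thread does not describe the file format or schema: the input is taken to be a well-formed CSV with a header row, one column (`id` in the example) serves as the row key, all other columns go into a single column family `d`, and `sink` is a hypothetical stand-in for whatever actually writes to HBase (for example, a function that wraps a real client's batch put). It reads the file directly and does not settle whether staging in HDFS first is the better flow.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# One row's worth of cells: (row key, {b"family:qualifier": value}).
Mutation = Tuple[bytes, Dict[bytes, bytes]]


def rows_to_mutations(
    lines: Iterable[str], key_field: str, family: str = "d"
) -> Iterator[Mutation]:
    """Turn CSV lines (with a header row) into row-key/cell pairs.

    Assumes well-formed input; `key_field` and `family` are hypothetical names.
    """
    for record in csv.DictReader(lines):
        row_key = record[key_field].encode("utf-8")
        cells = {
            f"{family}:{name}".encode("utf-8"): (value or "").encode("utf-8")
            for name, value in record.items()
            if name != key_field
        }
        yield row_key, cells


def write_in_batches(
    mutations: Iterable[Mutation],
    sink: Callable[[List[Mutation]], None],
    batch_size: int = 1000,
) -> int:
    """Send mutations to `sink` in fixed-size batches; return the row count."""
    batch: List[Mutation] = []
    written = 0
    for mutation in mutations:
        batch.append(mutation)
        if len(batch) >= batch_size:
            sink(batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        sink(batch)
        written += len(batch)
    return written


if __name__ == "__main__":
    # Stand-in for a real file and a real HBase writer.
    sample = io.StringIO("id,temp,status\nr1,21.5,ok\nr2,19.0,ok\nr3,22.1,warn\n")
    received: List[List[Mutation]] = []
    total = write_in_batches(
        rows_to_mutations(sample, key_field="id"), received.append, batch_size=2
    )
    print(f"wrote {total} rows in {len(received)} batches")
```

Batching keeps the number of round trips to the region servers low, which matters more than raw volume for files of this size. In a Flink or Spark job the same parse-then-batch shape would typically live inside the job's sink or a per-partition write step.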