Should I copy data to hdfs before inserting it into hbase?

Contributor

I am trying to insert data into HBase using Flink/Spark, but I am not sure whether I first need to copy the files from the local filesystem and FTP to HDFS. What would be the proper flow if file sizes range from 5 MB to 31 MB?

*Files are dumped at 5-minute intervals.

Re: Should I copy data to hdfs before inserting it into hbase?

The data size is quite small. You have two options.

1. Create a managed Hive table and write the data from Flink/Spark to Hive directly.
2. Save the data on HDFS and map an external Hive table to it.

Option 1 will work fine for you.

Re: Should I copy data to hdfs before inserting it into hbase?

Contributor

I need to connect my application to a web app and need the data within milliseconds. Also, one file is 5-31 MB, and the files arrive every 5 minutes.

Re: Should I copy data to hdfs before inserting it into hbase?

Hey Eon.

Can you provide more details about the requirement? Do you need to push the data into Hive or pull from it?
This looks more like an HBase use case to me. I can provide more insight if the requirement is more specific.

Re: Should I copy data to hdfs before inserting it into hbase?

Contributor

I am trying to put data from a file into HBase.
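
To make the "file to HBase" flow concrete, here is a minimal sketch of one possible approach: parse each arriving file into row-key/column pairs and hand them to a writer in batches, without staging the file on HDFS first. Everything here is an assumption for illustration, not something stated in this thread: the files are taken to be CSV with a header, `rows_from_csv` and `write_in_batches` are hypothetical helper names, the column family `d` is made up, and `sink` is a stand-in for whatever HBase client call actually performs the puts.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# One HBase-style row: (row key, {b"family:qualifier": value})
Row = Tuple[bytes, Dict[bytes, bytes]]


def rows_from_csv(lines: Iterable[str], key_field: str, family: str = "d") -> Iterator[Row]:
    """Turn CSV lines (with a header) into row-key/column pairs.

    Assumption: the file is CSV and `key_field` names the column used as row key.
    """
    reader = csv.DictReader(lines)
    for record in reader:
        row_key = record[key_field].encode("utf-8")
        columns = {
            f"{family}:{name}".encode("utf-8"): (value or "").encode("utf-8")
            for name, value in record.items()
            # Skip the key column and any overflow fields (stored under None).
            if name is not None and name != key_field
        }
        yield row_key, columns


def write_in_batches(
    rows: Iterable[Row],
    sink: Callable[[List[Row]], None],
    batch_size: int = 1000,
) -> int:
    """Send rows to `sink` in groups of at most `batch_size`; return rows sent.

    `sink` is a hypothetical placeholder for the real HBase write.
    """
    batch: List[Row] = []
    total = 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            sink(batch)
            total += len(batch)
            batch = []  # start a fresh list so the sink keeps its own copy
    if batch:
        sink(batch)  # flush the final partial batch
        total += len(batch)
    return total


if __name__ == "__main__":
    sample = ["id,temp,site", "r1,20,A", "r2,21,B", "r3,22,C"]
    sent = write_in_batches(rows_from_csv(sample, "id"), print, batch_size=2)
    print("rows sent:", sent)
```

In a real job the `sink` would wrap the client's batched put (for example, a Spark `foreachPartition` body or a Flink sink function could play that role); the batching itself is only there to avoid one round trip per row.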
