Support Questions
Find answers, ask questions, and share your expertise

Single Hive table pointing to multiple storage- S3 and HDFS

Rising Star

Let's assume we have data in a Hive table for the past 60 days. How can we automatically move data older than a given period (30 days) to S3, keeping only the latest 30 days of data in HDFS? How do we write a Hive query that reads the entire 60 days of data? How can a single Hive table point to multiple storage backends, S3 and HDFS?

Also, is it possible to configure S3 as archival storage?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/configuring_arch...

1 ACCEPTED SOLUTION

See this doc on setting Falcon retention (and cloud replication/export/archival) policies.

I don't think a single table can use multiple storage locations. However, you can use a view that is the union of one table defined on local HDFS and a second table defined on S3.
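A minimal HiveQL sketch of the view approach; the table names, schema, locations, and bucket name here are hypothetical:

```sql
-- Recent data (last 30 days), stored on HDFS:
CREATE EXTERNAL TABLE events_hdfs (id BIGINT, payload STRING, event_date STRING)
LOCATION 'hdfs:///warehouse/events_hdfs';

-- Archived data (older than 30 days), stored on S3:
CREATE EXTERNAL TABLE events_s3 (id BIGINT, payload STRING, event_date STRING)
LOCATION 's3a://my-archive-bucket/events';

-- A single view exposing the full 60 days to queries:
CREATE VIEW events AS
SELECT * FROM events_hdfs
UNION ALL
SELECT * FROM events_s3;
```

Queries against the `events` view then read both stores transparently; the archival job only has to move files and keep the two underlying tables' contents disjoint.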


2 REPLIES


Expert Contributor

To use both S3 and HDFS for a single Hive table, you can create an external partitioned table and point individual partitions at different locations.

Look for the process that starts at "An interesting benefit of this flexibility is that we can archive old data on inexpensive storage" in this link:

Programming Hive (the Hive definitive guide)
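The partition-location approach can be sketched in HiveQL as follows; the table name, schema, dates, and bucket are hypothetical:

```sql
-- One external table, partitioned by date:
CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (event_date STRING);

-- A recent partition lives on HDFS:
ALTER TABLE events ADD PARTITION (event_date = '2016-05-01')
LOCATION 'hdfs:///warehouse/events/event_date=2016-05-01';

-- An aged partition is repointed at S3 after its files are copied there,
-- so the single table now spans both stores:
ALTER TABLE events PARTITION (event_date = '2016-04-01')
SET LOCATION 's3a://my-archive-bucket/events/event_date=2016-04-01';
```

Because Hive records a location per partition, no view or second table is needed; a query over the full date range simply reads each partition from wherever it lives.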

To automate this process, you could use cron; Falcon should also be able to do it.
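A sketch of what such a daily cron job might run, assuming the hypothetical partitioned `events` table above; all paths, the bucket name, and the script name are placeholders, and the commands need a configured Hadoop/Hive client to actually run:

```sh
#!/bin/sh
# archive_events.sh -- move the Hive partition that has just aged past
# 30 days from HDFS to S3, then repoint the partition's location.
CUTOFF=$(date -d '30 days ago' +%Y-%m-%d)

# 1. Copy the aged partition's files from HDFS to S3.
hadoop distcp "hdfs:///warehouse/events/event_date=${CUTOFF}" \
              "s3a://my-archive-bucket/events/event_date=${CUTOFF}"

# 2. Repoint the Hive partition at S3, then remove the HDFS copy.
hive -e "ALTER TABLE events PARTITION (event_date='${CUTOFF}')
         SET LOCATION 's3a://my-archive-bucket/events/event_date=${CUTOFF}';"
hdfs dfs -rm -r -skipTrash "/warehouse/events/event_date=${CUTOFF}"
```

A crontab entry such as `0 2 * * * /usr/local/bin/archive_events.sh` would then run the archival once a night.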