Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Single Hive table pointing to multiple storage locations: S3 and HDFS

Expert Contributor

Let's assume we have 60 days of data in a Hive table. How can we automatically move data older than a given period (30 days) to S3 and keep only the latest 30 days of data in HDFS? How do we write a Hive query that reads all 60 days of data? How can a single Hive table point to multiple storage systems, S3 and HDFS?

Also, is it possible to configure S3 as archival storage?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/configuring_arch...

Accepted solution

See this doc on setting Falcon retention (and cloud replication/export/archival) policies.

I don't think a single table can use multiple storage setups. However, you can define a view over the UNION ALL of one table whose data lives on local HDFS and a second table whose data lives on S3.
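A rough sketch of the view approach follows, written in Python only to generate the HiveQL so the statements can be checked. The table names (events_recent, events_archive, events_all), the columns and the paths are hypothetical; both tables are assumed to share the same schema, and the s3a:// scheme assumes the S3A connector is configured on the cluster. The UNION ALL is wrapped in a subquery because older Hive releases only accepted UNION ALL inside a subquery.

```python
# Sketch: generate HiveQL for two external tables (one on HDFS, one on S3)
# and a view that unions them. All names, columns and paths are hypothetical.
COLUMNS = "event_id STRING, payload STRING, dt STRING"


def union_view_statements(hdfs_location: str, s3_location: str) -> list[str]:
    """Return the three HiveQL statements, in execution order."""
    return [
        # Recent data: external table whose files live on HDFS.
        f"CREATE EXTERNAL TABLE events_recent ({COLUMNS}) "
        f"LOCATION '{hdfs_location}';",
        # Older data: same schema, files live on S3.
        f"CREATE EXTERNAL TABLE events_archive ({COLUMNS}) "
        f"LOCATION '{s3_location}';",
        # Queries against events_all read from both tables.
        "CREATE VIEW events_all AS SELECT * FROM ("
        "SELECT * FROM events_recent "
        "UNION ALL "
        "SELECT * FROM events_archive) u;",
    ]


if __name__ == "__main__":
    for stmt in union_view_statements(
        "hdfs:///data/events_recent", "s3a://my-archive-bucket/events_archive"
    ):
        print(stmt)
```

With this layout, reading all 60 days is just a SELECT against events_all (with a dt filter if needed). Moving a day's rows from events_recent to events_archive is still up to you.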

Replies

Super Collaborator

To use both S3 and HDFS for your Hive table, you could use an external table whose partitions point to different locations.

See the procedure that begins with "An interesting benefit of this flexibility is that we can archive old data on inexpensive storage" in this reference:

Hive def guide
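The partition approach could look roughly like the following, again using Python only to generate the HiveQL. It assumes a hypothetical external table named events, partitioned by a string column dt with one partition per day (created with something like CREATE EXTERNAL TABLE events (...) PARTITIONED BY (dt STRING)). The paths and the 30-day threshold are placeholders. Each partition is registered with its own LOCATION: on HDFS if it is less than 30 days old, on S3 otherwise.

```python
from datetime import date, timedelta

# Hypothetical names and paths, for illustration only.
TABLE = "events"
HDFS_ROOT = "hdfs:///data/events"
S3_ROOT = "s3a://my-archive-bucket/events"
HOT_DAYS = 30  # keep the most recent 30 days on HDFS


def partition_location(dt: date, today: date) -> str:
    """Storage location for one daily partition: HDFS for the last
    HOT_DAYS days, S3 for anything older."""
    root = HDFS_ROOT if (today - dt).days < HOT_DAYS else S3_ROOT
    return f"{root}/dt={dt.isoformat()}"


def add_partition_statements(today: date, days: int) -> list[str]:
    """One ALTER TABLE ... ADD PARTITION statement per day, newest first."""
    stmts = []
    for offset in range(days):
        dt = today - timedelta(days=offset)
        stmts.append(
            f"ALTER TABLE {TABLE} ADD IF NOT EXISTS PARTITION "
            f"(dt='{dt.isoformat()}') "
            f"LOCATION '{partition_location(dt, today)}';"
        )
    return stmts


if __name__ == "__main__":
    for stmt in add_partition_statements(date.today(), 60):
        print(stmt)
```

Because Hive resolves the location per partition, an ordinary query such as SELECT count(*) FROM events reads the HDFS and S3 partitions in one pass, with no view involved.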

To automate this process, you could use cron, though Falcon should also work.
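A daily job for the cron route might look like this. It is only a sketch: the table name events, the paths and the 30-day window are hypothetical, and the script prints the commands rather than running them. The order matters: copy the partition with hadoop distcp, repoint it with ALTER TABLE ... SET LOCATION (which only changes metadata and does not move any files), and delete the HDFS copy last.

```python
from datetime import date, timedelta

# Hypothetical names and paths; adjust to your cluster.
TABLE = "events"
HDFS_ROOT = "hdfs:///data/events"
S3_ROOT = "s3a://my-archive-bucket/events"
HOT_DAYS = 30


def archive_commands(today: date) -> list[str]:
    """Commands that move the partition which has just aged out of the
    HOT_DAYS window from HDFS to S3. Meant to run once a day."""
    dt = (today - timedelta(days=HOT_DAYS)).isoformat()
    src = f"{HDFS_ROOT}/dt={dt}"
    dst = f"{S3_ROOT}/dt={dt}"
    return [
        # 1. Copy the partition directory to S3.
        f"hadoop distcp {src} {dst}",
        # 2. Point the Hive partition at the S3 copy (metadata only).
        f"hive -e \"ALTER TABLE {TABLE} PARTITION (dt='{dt}') "
        f"SET LOCATION '{dst}'\"",
        # 3. Remove the HDFS copy only after steps 1 and 2 succeed.
        f"hdfs dfs -rm -r -skipTrash {src}",
    ]


if __name__ == "__main__":
    for cmd in archive_commands(date.today()):
        print(cmd)
```

It could be scheduled with a crontab entry along the lines of 0 2 * * * python3 /opt/scripts/archive_partitions.py | sh -e (the script path is hypothetical), where sh -e stops at the first failing command so the HDFS copy is not deleted if the copy or the ALTER TABLE fails. Since it moves only the one partition that aged out that day, a missed run leaves that day on HDFS; a sturdier version would look up every partition older than the cutoff that still points at HDFS.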