Single Hive table pointing to multiple storage locations: S3 and HDFS

Expert Contributor

Let's assume we have 60 days of data in a Hive table. How can we automatically move data older than a given period (30 days) to S3 and keep only the latest 30 days in HDFS? How do we write a Hive query that reads the entire 60 days of data? How can a single Hive table point to multiple storage locations, S3 and HDFS?

Also, is it possible to configure S3 as archival storage?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/configuring_arch...

1 ACCEPTED SOLUTION

See this doc on setting Falcon retention (and cloud replication/export/archival) policies.

I don't think a single table can use multiple storage setups. However, you can use a View to be the union of one table defined on local HDFS and a second table defined on S3.
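
A minimal sketch of the view approach, assuming two external tables with the same schema. The table names, columns, paths, bucket, and the `s3a://` scheme are all placeholders, not taken from the question; use whichever S3 connector your cluster is configured for.

```sql
-- Hypothetical schema: recent data on HDFS, archived data on S3.
CREATE EXTERNAL TABLE events_hdfs (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
LOCATION 'hdfs:///data/events';

CREATE EXTERNAL TABLE events_s3 (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
LOCATION 's3a://my-bucket/archive/events';

-- The view exposes both tables as one logical table.
-- UNION ALL is wrapped in a subquery so it also works on older Hive versions.
CREATE VIEW events_all AS
SELECT u.id, u.payload, u.dt
FROM (
  SELECT id, payload, dt FROM events_hdfs
  UNION ALL
  SELECT id, payload, dt FROM events_s3
) u;

-- Reads across both storage systems.
SELECT dt, COUNT(*) FROM events_all GROUP BY dt;
```

Queries against `events_all` then cover the full 60 days, while the retention job only has to move partitions from one table to the other.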

2 REPLIES

Super Collaborator

To be able to use both S3 and HDFS for your Hive table, you could use an external table with partitions pointing to different locations.

Look for the process that starts at "An interesting benefit of this flexibility is that we can archive old data on inexpensive storage" in this link:

Hive def guide
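
Roughly, the partition-based approach could look like the sketch below. The table name, columns, dates, paths, bucket, and the `s3a://` scheme are placeholders I made up for illustration; the data copy itself happens outside Hive (for example with `hadoop distcp`).

```sql
-- Hypothetical external table partitioned by day, initially all on HDFS.
CREATE EXTERNAL TABLE events (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
LOCATION 'hdfs:///data/events';

-- A recent partition stays on HDFS.
ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt = '2016-01-31')
LOCATION 'hdfs:///data/events/dt=2016-01-31';

-- For a partition older than 30 days:
--   1. copy its files to S3 outside Hive (e.g. hadoop distcp),
--   2. repoint the partition at the S3 copy,
--   3. delete the old HDFS directory once the copy is verified.
ALTER TABLE events PARTITION (dt = '2015-12-15')
SET LOCATION 's3a://my-bucket/archive/events/dt=2015-12-15';

-- One query now reads partitions from both HDFS and S3.
SELECT dt, COUNT(*) FROM events
WHERE dt >= '2015-12-01'
GROUP BY dt;
```

Because the table is external, repointing a partition only changes metadata, so the same table keeps serving all 60 days.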

To automate this process, you could use cron, though I expect Falcon would also work.