Archives of Support Questions (Read Only)

nbalaji-elangov · ‎12-03-2015

Let's assume we have data in a Hive table for the past 60 days. How can we automatically move data older than a given period (30 days) to S3 and keep only the latest 30 days of data in HDFS? How do we write a Hive query that reads the entire 60 days of data? How can a single Hive table point to multiple storage locations, S3 and HDFS?

Also, is it possible to configure S3 as archival storage?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/configuring_arch...

rgelhausen · ‎12-03-2015

See this doc on setting Falcon retention (and cloud replication/export/archival) policies.

I don't think a single table can use multiple storage setups. However, you can define a view as the union of one table on local HDFS and a second table on S3.
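A minimal sketch of the view approach, assuming two external tables with identical schemas. The table names, columns, paths, and bucket name are hypothetical, and the s3a:// scheme assumes the S3A connector is configured on the cluster (older setups may use s3n:// instead).

```sql
-- Recent data kept on HDFS (hypothetical table, columns, and path).
CREATE EXTERNAL TABLE events_hdfs (
  event_id   STRING,
  payload    STRING,
  event_date STRING
)
STORED AS ORC
LOCATION 'hdfs:///data/events_recent';

-- Older data archived on S3 (hypothetical bucket name).
CREATE EXTERNAL TABLE events_s3 (
  event_id   STRING,
  payload    STRING,
  event_date STRING
)
STORED AS ORC
LOCATION 's3a://example-archive-bucket/events_old';

-- The view exposes both tables as one. The union is wrapped in a
-- subquery so it also works on Hive versions without top-level UNION ALL.
CREATE VIEW events_all AS
SELECT u.event_id, u.payload, u.event_date
FROM (
  SELECT event_id, payload, event_date FROM events_hdfs
  UNION ALL
  SELECT event_id, payload, event_date FROM events_s3
) u;

-- Reads the full 60 days across both storage systems.
SELECT count(*) FROM events_all;
```

Queries against events_all read from both locations, so moving data from one table to the other does not require changing the queries.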

sluangsay · ‎12-03-2015

To be able to use both S3 and HDFS for your Hive table, you could use an external table with partitions pointing to different locations.

Look for the process that starts at "An interesting benefit of this flexibility is that we can archive old data on inexpensive storage" in this link:

Hive def guide
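The partition-per-location idea could look roughly like the sketch below. The table name, columns, dates, paths, and bucket name are all hypothetical, and the s3a:// scheme assumes the S3A connector is configured. It also assumes the partition's files have already been copied to S3 (for example with hadoop distcp) before the location is changed.

```sql
-- One external table, partitioned by day (hypothetical schema and path).
CREATE EXTERNAL TABLE events (
  event_id STRING,
  payload  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
LOCATION 'hdfs:///data/events';

-- A recent partition whose data lives on HDFS.
ALTER TABLE events ADD IF NOT EXISTS
  PARTITION (event_date = '2015-12-01')
  LOCATION 'hdfs:///data/events/event_date=2015-12-01';

-- An older partition: once its files are copied to S3, repoint the
-- partition there. The HDFS copy can be deleted afterwards.
ALTER TABLE events
  PARTITION (event_date = '2015-10-15')
  SET LOCATION 's3a://example-archive-bucket/events/event_date=2015-10-15';

-- A single query spans partitions on both storage systems.
SELECT event_date, count(*)
FROM events
WHERE event_date >= '2015-10-04'
GROUP BY event_date;
```

Because the table is external, changing a partition's location only updates metadata. Copying the files and removing the old HDFS copy are separate steps that a scheduled job would need to perform.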

To automate this process, you could use cron, though I expect Falcon could handle it as well.

Cloudera Community

Single Hive table pointing to multiple storage- S3 and HDFS