Creating Hive external table on specific files within folder
- Labels: Apache Hive
Created ‎04-25-2017 02:18 PM
I have some data being dropped into our HDFS file system on a daily basis, into a single folder per day which contains multiple CSV files, as below:
/data/yyyy/mm/dd/file1.csv
/data/yyyy/mm/dd/file2.csv
Now I want to create a Hive external table on all the file1.csv files across all the folders under /data, but it doesn't seem to be currently possible to use a regex in the Hive external table command.
My next thought would be to copy the files into separate structures so Hive can parse these files individually, such as:
/data/file1/yyyy/mm/dd/file1.csv
/data/file2/yyyy/mm/dd/file2.csv
But I am not sure what the best way of doing this would be. Whatever I choose would initially need to bulk copy the existing data between these folder structures, and then be able to be scheduled to copy files over on a daily basis as new folders are created.
Any help would be greatly appreciated, please let me know if any of the above is unclear.
Created ‎04-25-2017 05:01 PM
I am not sure about your use case. If you just want to include file1 in the Hive table, you have to copy those files into separate folders. An alternative might be to include all the data in the Hive table, and let Hive control what data can be selected/seen, etc.
Created ‎04-26-2017 10:47 AM
Thanks for the response Frank. I guess my question really was how to easily move these files into the correct folder structure without it being a manual process of using "hdfs dfs" commands.
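One way to avoid typing those commands by hand is to generate them. Below is a minimal Python sketch, assuming the /data/yyyy/mm/dd/<name>.csv layout described in the question. The function names `target_path` and `copy_commands` are made up for illustration, and the script only builds the "hdfs dfs" command strings; it does not run them.

```python
import posixpath
import re
import shlex

# Assumed source layout from the question: /data/yyyy/mm/dd/<name>.csv
SOURCE_PATTERN = re.compile(r"^/data/(\d{4})/(\d{2})/(\d{2})/([^/]+)\.csv$")


def target_path(source):
    """Map /data/yyyy/mm/dd/file1.csv to /data/file1/yyyy/mm/dd/file1.csv.

    Returns None for any path that does not match the assumed layout.
    """
    match = SOURCE_PATTERN.match(source)
    if match is None:
        return None
    year, month, day, name = match.groups()
    return posixpath.join("/data", name, year, month, day, name + ".csv")


def copy_commands(source_paths):
    """Build (but do not execute) the hdfs dfs commands for each source file."""
    commands = []
    for source in source_paths:
        target = target_path(source)
        if target is None:
            continue  # not a dated CSV, so leave it alone
        target_dir = posixpath.dirname(target)
        commands.append("hdfs dfs -mkdir -p " + shlex.quote(target_dir))
        # -f overwrites an existing target so a rerun does not fail
        commands.append(
            "hdfs dfs -cp -f " + shlex.quote(source) + " " + shlex.quote(target)
        )
    return commands


for command in copy_commands(["/data/2017/04/25/file1.csv"]):
    print(command)
```

The same script could cover both cases: feed it every path under /data for the initial bulk copy (for example, parsed from "hdfs dfs -ls -R /data"), then feed it only the new day's folder from whatever scheduler is available. Paths already under /data/file1/... do not match the pattern, so they are skipped on later runs.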
Including all the data in the Hive table and then letting Hive control what can be selected/seen is an interesting concept; that might be a way of doing what we are after without having to adapt the underlying structure of the data in HDFS. We can then create views on top of this single Hive table to split the data, and then insert into Hive internal tables if needed.
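As a rough sketch of the view idea: Hive exposes an INPUT__FILE__NAME virtual column holding the path of the file each row was read from, which can be used to filter by source file. The Python below only builds the DDL text. The table and view names are hypothetical, and it assumes a single external table already covers everything under /data (reading the nested yyyy/mm/dd folders may need recursive-input settings enabled in your Hive setup).

```python
import re

# Conservative checks so only plain names end up in the generated SQL
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
FILE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def view_ddl(view_name, table_name, file_name):
    """Return HiveQL for a view exposing only rows read from file_name."""
    for identifier in (view_name, table_name):
        if not IDENTIFIER.match(identifier):
            raise ValueError("unexpected identifier: " + identifier)
    if not FILE_NAME.match(file_name):
        raise ValueError("unexpected file name: " + file_name)
    # INPUT__FILE__NAME is the full path of the source file, so match on
    # the trailing "/<file_name>" component
    return (
        "CREATE VIEW IF NOT EXISTS {view} AS\n"
        "SELECT * FROM {table}\n"
        "WHERE INPUT__FILE__NAME LIKE '%/{file}'"
    ).format(view=view_name, table=table_name, file=file_name)


# all_data and file1_view are placeholder names
print(view_ddl("file1_view", "all_data", "file1.csv"))
```

This assumes file1.csv and file2.csv share a schema the single table can describe. Filtering on the file name will likely still read every file under /data, so inserting into Hive internal tables may be worth it for data that is queried often.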
