Hive writing many small csv files to HDFS
Labels: Apache Hadoop, Apache Hive
Created ‎07-03-2018 08:05 AM
I am exporting Hive table data to CSV files in HDFS using queries like this:
FROM Table T1 INSERT OVERWRITE DIRECTORY '<HDFS Directory>' SELECT *;
Hive is writing many small CSV files (1-2 MB each) to the destination directory.
Is there a way to control the number of files or the size of each CSV file?
Note:
1) These CSV files are not used to create tables, so I cannot replace the query with INSERT INTO TABLE...
2) I have already tried setting these properties, to no avail:
hive.merge.mapfiles=true; hive.merge.mapredfiles; hive.merge.smallfiles.avgsize; hive.merge.size.per.task; mapred.max.split.size; mapred.min.split.size
TIA
I have many tables in Hive of varying sizes; some are very large and some are small. For large tables I am fine with many files being generated, as long as each file is larger than 16 MB. I don't want to set the number of mappers explicitly, because that would hurt query performance for large tables.
Created ‎07-03-2018 08:42 AM
If you are using Tez as the execution engine, you need to set the following properties:
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=128000000;
set hive.merge.size.per.task=128000000;
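For reference, here is a rough sketch of how these settings might be combined with the export query from the question in a single session. The table name `my_table` and the output path `/tmp/export/my_table` are hypothetical placeholders, and the `ROW FORMAT` clause is an assumption added to produce comma-separated output; it is not part of the original query.

```sql
-- Sketch only: my_table and the output path are hypothetical placeholders.
-- Assumes Hive is running on Tez.
SET hive.execution.engine=tez;

-- Enable the merge step for output files written by Tez jobs.
SET hive.merge.tezfiles=true;
-- Trigger the merge when the average output file is smaller than ~128 MB.
SET hive.merge.smallfiles.avgsize=128000000;
-- Size that the merged files aim for, ~128 MB.
SET hive.merge.size.per.task=128000000;

-- Same query shape as in the question.
-- ROW FORMAT is an assumption here, added to get comma-delimited text.
FROM my_table t1
INSERT OVERWRITE DIRECTORY '/tmp/export/my_table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT *;
```

Since `hive.merge.smallfiles.avgsize` is the threshold below which the merge runs and `hive.merge.size.per.task` is the size the merged files aim for, a lower threshold such as 16000000 may be closer to the 16 MB minimum mentioned in the question. The 128000000 values above are simply the ones given in this answer.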
Created ‎07-03-2018 01:24 PM
That works, thank you.
