Member since: 07-03-2016
Posts: 7
Kudos Received: 0
Solutions: 0
08-17-2018
02:31 PM
@Naresh P R : Thanks, Naresh. Can you show me how to add a path filter to skip files larger than maxSplitSize?
08-17-2018
03:54 AM
@vgarg / @PJ : I am using HDP 2.6, but I am still facing the same issue.
08-17-2018
03:42 AM
Hi, I am trying to concatenate the small files in my Hive partitions, but I ran into some strange behavior while doing so. I have many files under the yyyy=2018, mm=7, dd=11 partition. When I ran the query below, all the small files were concatenated into 2 big files:
alter table dbname.tblName partition (yyyy=2018, mm=7, dd=11) concatenate;
I then wanted to see whether I could concatenate further to get a single file. Strangely, the first run did not merge the 2 files into 1. Only after running the same query 4 times did I end up with 1 single big file. I don't understand this behavior. The output of the 4 runs:
alter table dbname.tblName partition (yyyy=2018, mm=7, dd=11) concatenate;
INFO : Session is already open
INFO : Dag name: hive_
INFO : Status: Running (Executing on YARN cluster with App id AppID)
INFO : Loading data to table dbname.tblName partition (yyyy=2018, mm=7, dd=11) from /apps/hive/warehouse/dbname.db/tblName/yyyy=2018/mm=7/dd=11/.hive-staging_hive_2018-08-16_21-27-52_556_544797697765237034-149145/-ext-10000
INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: [numFiles=2, numRows=74319, totalSize=1629690, rawDataSize=80710514]
No rows affected (5.008 seconds)
alter table dbname.tblName partition (yyyy=2018, mm=7, dd=11) concatenate;
INFO : Session is already open
INFO : Dag name: hive_
INFO : Status: Running (Executing on YARN cluster with App id AppID)
INFO : Loading data to table dbname.tblName partition (yyyy=2018, mm=7, dd=11) from /apps/hive/warehouse/dbname.db/tblName/yyyy=2018/mm=7/dd=11/.hive-staging_hive_2018-08-16_21-27-58_733_1348505315688040528-149145/-ext-10000
INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: [numFiles=2, numRows=74319, totalSize=1629690, rawDataSize=80710514]
No rows affected (1.289 seconds)
alter table dbname.tblName partition (yyyy=2018, mm=7, dd=11) concatenate;
INFO : Session is already open
INFO : Dag name: hive_
INFO : Status: Running (Executing on YARN cluster with App id AppID)
INFO : Loading data to table dbname.tblName partition (yyyy=2018, mm=7, dd=11) from /apps/hive/warehouse/dbname.db/tblName/yyyy=2018/mm=7/dd=11/.hive-staging_hive_2018-08-16_21-28-01_294_168641035365555493-149145/-ext-10000
INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: [numFiles=2, numRows=74319, totalSize=1629690, rawDataSize=80710514]
No rows affected (2.368 seconds)
alter table dbname.tblName partition (yyyy=2018, mm=7, dd=11) concatenate;
INFO : Session is already open
INFO : Dag name: hive_
INFO : Status: Running (Executing on YARN cluster with App id AppID)
INFO : Loading data to table dbname.tblName partition (yyyy=2018, mm=7, dd=11) from /apps/hive/warehouse/dbname.db/tblName/yyyy=2018/mm=7/dd=11/.hive-staging_hive_2018-08-16_21-28-04_876_2200942119932282933-149145/-ext-10000
INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: [numFiles=1, numRows=74319, totalSize=1628545, rawDataSize=80710514]
No rows affected (0.877 seconds)
Can anyone shed some light on this?
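To make the run-by-run difference easier to see, the stats lines from the log can be reduced to just the file count and total size. This is only a minimal sketch, assuming the stats line format shown in the output above; the helper name parse_partition_stats is made up for illustration and is not part of Hive.

```python
import re

# Matches the "stats: [numFiles=..., ...]" portion of a Hive partition stats log line.
STATS_RE = re.compile(r"stats: \[([^\]]*)\]")


def parse_partition_stats(line):
    """Return the key=value pairs of a stats line as a dict of ints, or None if absent."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    pairs = (item.split("=", 1) for item in match.group(1).split(","))
    return {key.strip(): int(value) for key, value in pairs}


# Two of the stats lines copied from the log above (a 2-file run and the final run).
log_lines = [
    "INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: "
    "[numFiles=2, numRows=74319, totalSize=1629690, rawDataSize=80710514]",
    "INFO : Partition dbname.tblName{yyyy=2018, mm=7, dd=11} stats: "
    "[numFiles=1, numRows=74319, totalSize=1628545, rawDataSize=80710514]",
]

for line in log_lines:
    stats = parse_partition_stats(line)
    print(stats["numFiles"], stats["totalSize"])
```

In the output above, numRows stays at 74319 across all runs, while numFiles drops from 2 to 1 and totalSize shrinks slightly only on the fourth run.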
Labels: Apache Hive
12-31-2016
03:25 AM
@Binu Mathew : Thanks for sharing the awesome article. Would you mind sharing the sample data?