Member since 01-11-2018
3 Posts, 0 Kudos Received, 0 Solutions
01-29-2018 09:43 AM
@vgarg Thanks for checking and for the reply. I have opened a JIRA to report this: https://issues.apache.org/jira/browse/HIVE-18563
I can use the workaround for this issue.
Regards, Jun
01-11-2018 10:49 AM
After upgrading HDP from 2.3.2.0 to 2.6.2.0, the behavior of "load data into table" changed. The input data is hourly, and every file has the same name:
/user/user1/logs/yyyymmdd/00/part-r-00000.gz
/user/user1/logs/yyyymmdd/01/part-r-00000.gz
/user/user1/logs/yyyymmdd/02/part-r-00000.gz
/user/user1/logs/yyyymmdd/03/part-r-00000.gz
・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・・
/user/user1/logs/yyyymmdd/22/part-r-00000.gz
/user/user1/logs/yyyymmdd/23/part-r-00000.gz
Before upgrade (HDP 2.3.2.0)
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
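The naming pattern in the listing above can be modeled with a short sketch. This only illustrates the observed result (the first file keeps its name, later files with the same name get a `_copy_N` suffix before the extension). It is not Hive's actual implementation, and the function name `assign_target_names` is hypothetical.

```python
import os


def assign_target_names(source_names):
    """Give each incoming file a unique target name.

    Assumption: mimics the pattern seen in the HDP 2.3.2.0 listing,
    where a name collision is resolved by appending _copy_N before
    the file extension. This is an illustration, not Hive code.
    """
    taken = set()
    targets = []
    for name in source_names:
        stem, ext = os.path.splitext(name)  # "part-r-00000", ".gz"
        candidate = name
        counter = 0
        # Keep bumping the counter until the name is unused.
        while candidate in taken:
            counter += 1
            candidate = f"{stem}_copy_{counter}{ext}"
        taken.add(candidate)
        targets.append(candidate)
    return targets


# 24 hourly files that all share the same name.
names = assign_target_names(["part-r-00000.gz"] * 24)
print(names[0], names[1], names[-1])
# prints: part-r-00000.gz part-r-00000_copy_1.gz part-r-00000_copy_23.gz
```

Under this model, 24 identically named inputs produce 24 distinct targets, which matches the 24 files in the listing.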
All files except the first were renamed to part-r-00000_copy_*.gz; the first kept the name part-r-00000.gz.
After upgrade (HDP 2.6.2.0)
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
There is only part-r-00000.gz. This file is the same as part-r-00000_copy_23.gz in the HDP 2.3.2.0 result above.
When I load the files one by one, all of them are loaded, as in the HDP 2.3.2.0 environment.
Why is the behavior different between 2.3.2.0 and 2.6.2.0?
Thanks in advance.
OS : CentOS6
JDK : 1.8.0_152 (Oracle)
HDP : 2.3.2.0 and 2.6.2.0
Hive : 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205
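The one-by-one loading described above can be scripted. The following is a minimal sketch that only generates one `load data` statement per hourly directory, using the paths and table name from this post. The function name `build_load_statements` and its parameters are hypothetical, and the 00 to 23 hour layout is assumed from the file listing.

```python
def build_load_statements(day, table="sample_db.sample_tbl",
                          base="/user/user1/logs"):
    """Return one LOAD DATA statement per hourly directory (00..23).

    Assumption: each hour directory holds a single part-r-00000.gz,
    as in the listing in this post.
    """
    statements = []
    for hour in range(24):
        path = f"{base}/{day}/{hour:02d}/part-r-00000.gz"
        statements.append(
            f"load data inpath '{path}' "
            f"into table {table} partition (dt='{day}');"
        )
    return statements


if __name__ == "__main__":
    # Replace yyyymmdd with the real date before running the output.
    for statement in build_load_statements("yyyymmdd"):
        print(statement)
```

The printed statements can be saved to a file and run with `hive -f`, so that each file is loaded by its own statement instead of through the wildcard path.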
Labels: Apache Hive