Member since 01-11-2018, 3 Posts, 0 Kudos Received, 0 Solutions
02-26-2018 08:27 AM
Our HiveServer2 environment forbids users from using the "add" command for security reasons. For performance, I would like to upload a list file containing aggregated data to the distributed cache and read it from there. However, I don't know of any way to upload the file to the distributed cache, or read it back, without the add command. If there is any way to do this, please let me know. Using a Hive UDF is also okay. Thanks in advance.
OS: CentOS 7
JDK: 1.8.0_66 (Oracle)
HDP: 2.3.4
Hive: 1.2.1
01-29-2018 09:43 AM
@vgarg Thanks for checking and for the reply. I have opened a JIRA to report this: https://issues.apache.org/jira/browse/HIVE-18563 I can use the workaround for this issue. Regards, Jun
01-11-2018 10:49 AM
After upgrading HDP from 2.3.2.0 to 2.6.2.0, the behavior of "load data ... into table" changed. The input data is hourly, and all the files have the same name:
/user/user1/logs/yyyymmdd/00/part-r-00000.gz
/user/user1/logs/yyyymmdd/01/part-r-00000.gz
/user/user1/logs/yyyymmdd/02/part-r-00000.gz
/user/user1/logs/yyyymmdd/03/part-r-00000.gz
...
/user/user1/logs/yyyymmdd/22/part-r-00000.gz
/user/user1/logs/yyyymmdd/23/part-r-00000.gz
Before the upgrade (HDP 2.3.2.0)
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_1.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_10.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_11.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_12.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_13.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_14.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_15.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_16.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_17.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_18.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_19.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_2.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_20.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_21.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_22.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_23.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_3.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_4.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_5.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_6.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_7.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_8.gz
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000_copy_9.gz
All files except part-r-00000.gz were renamed to part-r-00000_copy_*.gz, so all 24 hourly files were loaded.
After the upgrade (HDP 2.6.2.0)
HQL
hive> load data inpath '/user/user1/logs/yyyymmdd/*/*.gz' into table sample_db.sample_tbl partition (dt='yyyymmdd');
Result
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd
/hive/warehouse/sample_db.db/sample_tbl/dt=yyyymmdd/part-r-00000.gz
Only part-r-00000.gz remains, and it is the same file as part-r-00000_copy_23.gz in the 2.3.2.0 result. When the files are loaded one by one, all of them are loaded, just as in the HDP 2.3.2.0 environment. Why does the behavior differ between 2.3.2.0 and 2.6.2.0? Thanks in advance.
OS: CentOS 6
JDK: 1.8.0_152 (Oracle)
HDP: 2.3.2.0 and 2.6.2.0
Hive: 1.2.1.2.3.2.0-2950 and 1.2.1000.2.6.2.0-205
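To make the two outcomes concrete, here is a small Python simulation of what was observed. It is not Hive's code: `copy_suffix_names`, `last_writer_wins` and `per_file_load_statements` are hypothetical helpers. The sketch assumes the hourly files are processed in hour order (00 to 23), which is consistent with the surviving 2.6.2.0 file matching `_copy_23`. The last helper only generates one LOAD DATA statement per file, mirroring the one-by-one loading that the post reports as working.

```python
import posixpath
from typing import Dict, List


def copy_suffix_names(sources: List[str]) -> List[str]:
    """Hypothetical model of the HDP 2.3.2.0 result: a name already taken in
    the target directory gets a _copy_N suffix before its extension."""
    taken = set()
    names = []
    for src in sources:
        name = posixpath.basename(src)              # e.g. "part-r-00000.gz"
        stem, dot, ext = name.partition(".")        # "part-r-00000", ".", "gz"
        candidate, n = name, 0
        while candidate in taken:                   # bump N until the name is free
            n += 1
            candidate = f"{stem}_copy_{n}{dot}{ext}"
        taken.add(candidate)
        names.append(candidate)
    return names


def last_writer_wins(sources: List[str]) -> Dict[str, str]:
    """Hypothetical model of the HDP 2.6.2.0 result: same-named files overwrite
    each other, so only the last source for each name survives."""
    kept: Dict[str, str] = {}
    for src in sources:
        kept[posixpath.basename(src)] = src         # later sources replace earlier ones
    return kept


def per_file_load_statements(sources: List[str], table: str, partition: str) -> List[str]:
    """Hypothetical helper: one LOAD DATA statement per input file."""
    return [
        f"LOAD DATA INPATH '{src}' INTO TABLE {table} PARTITION ({partition});"
        for src in sources
    ]


if __name__ == "__main__":
    # The 24 hourly inputs from the post (yyyymmdd left as a placeholder).
    sources = [f"/user/user1/logs/yyyymmdd/{h:02d}/part-r-00000.gz" for h in range(24)]

    names = copy_suffix_names(sources)
    print(names[0], names[1], names[-1])
    print(last_writer_wins(sources))
    for stmt in per_file_load_statements(sources[:2], "sample_db.sample_tbl", "dt='yyyymmdd'"):
        print(stmt)
```

Under the copy-suffix model all 24 names are distinct, while under last-writer-wins the dictionary collapses to a single entry pointing at hour 23, matching the two directory listings above.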