Created on 05-15-2019 04:03 AM - edited 09-16-2022 07:23 AM
Overview
With a destination table using the ABFS(S) (Azure Data Lake Storage Gen2) as its location, a CTAS (create table as select) operation fails with the following error (e.g.) on CDH6.2.0 (Impala 3.2.0):
ERROR: Failed to open HDFS file for writing: abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/dest_tab/_impala_insert_staging/244c5ee8ece6f759_8b1a1e3b00000000/.244c5ee8ece6f759-8b1a1e3b00000000_1926230402_dir/244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0. Error(22): Invalid argument Root cause: IllegalArgumentException: ABFS does not allow files or directories to end with a dot.
This operation works on CDH6.1.1 (Impala 3.1.0). The error is seemingly caused by the inclusion of https://issues.apache.org/jira/browse/HADOOP-15860 in CDH6.2.0.
See test case below to reproduce.
Q1: Are there any workarounds available for this? E.g. is there a way to control the format of the filenames in the staging area created by Impala?
Q2: Are you aware of this issue and will you be fixing this in a future release? If so, can you give me the version number of CDH you will include the fix please?
Test Case
CDH6.2.0
Server version: impalad version 3.2.0-cdh6.2.0 RELEASE (build edc19942b4debdbfd485fbd26098eef435003f5d)
0. Setup
CREATE DATABASE IF NOT EXISTS test_db LOCATION 'abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir'; use test_db; CREATE TABLE source_tab ( cust_id BIGINT, cust_first_name STRING, cust_last_name STRING ) PARTITIONED BY ( cust_gender STRING ) STORED AS PARQUET; show create table source_tab; +-------------------------------------------------------------------------------------------+ | result | +-------------------------------------------------------------------------------------------+ | CREATE TABLE test_db.source_tab ( | | cust_id BIGINT, | | cust_first_name STRING, | | cust_last_name STRING | | ) | | PARTITIONED BY ( | | cust_gender STRING | | ) | | STORED AS PARQUET | | LOCATION 'abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/source_tab' | | | +-------------------------------------------------------------------------------------------+ INSERT INTO source_tab PARTITION (cust_gender='M') VALUES (1, 'Joe', 'Bloggs'); INSERT INTO source_tab PARTITION (cust_gender='M') VALUES (2, 'Fred', 'Bloggs'); INSERT INTO source_tab PARTITION (cust_gender='F') VALUES (3, 'Anne', 'Bloggs'); INSERT INTO source_tab PARTITION (cust_gender='F') VALUES (4, 'Izzy', 'Bloggs'); SELECT * FROM source_tab; +---------+-----------------+----------------+-------------+ | cust_id | cust_first_name | cust_last_name | cust_gender | +---------+-----------------+----------------+-------------+ | 1 | Joe | Bloggs | M | | 4 | Izzy | Bloggs | F | | 2 | Fred | Bloggs | M | | 3 | Anne | Bloggs | F | +---------+-----------------+----------------+-------------+
1. CTAS throws the error
CREATE TABLE dest_tab AS SELECT cust_id, cust_first_name, cust_last_name FROM source_tab; Query: CREATE TABLE dest_tab AS SELECT cust_id, cust_first_name, cust_last_name FROM source_tab ERROR: Failed to open HDFS file for writing: abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/dest_tab/_impala_insert_staging/244c5ee8ece6f759_8b1a1e3b00000000/.244c5ee8ece6f759-8b1a1e3b00000000_1926230402_dir/244c5ee8ece6f759-8b1a1e3b00000000_45513034_data.0. Error(22): Invalid argument Root cause: IllegalArgumentException: ABFS does not allow files or directories to end with a dot.
2. INSERT-SELECT works
DROP TABLE dest_tab PURGE; CREATE TABLE dest_tab ( cust_id BIGINT, cust_first_name STRING, cust_last_name STRING ) STORED AS PARQUET; show create table dest_tab; +-----------------------------------------------------------------------------------------+ | result | +-----------------------------------------------------------------------------------------+ | CREATE TABLE test_db.dest_tab ( | | cust_id BIGINT, | | cust_first_name STRING, | | cust_last_name STRING | | ) | | STORED AS PARQUET | | LOCATION 'abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/dest_tab' | | | +-----------------------------------------------------------------------------------------+
INSERT INTO dest_tab
SELECT cust_id,
cust_first_name,
cust_last_name
FROM source_tab;
SELECT * FROM dest_tab; +---------+-----------------+----------------+ | cust_id | cust_first_name | cust_last_name | +---------+-----------------+----------------+ | 1 | Joe | Bloggs | | 2 | Fred | Bloggs | | 3 | Anne | Bloggs | | 4 | Izzy | Bloggs | +---------+-----------------+----------------+
CDH6.1.1
Server version: impalad version 3.1.0-cdh6.1.1 RELEASE (build 97215ce79febfa42364dbff8e4c4d3c5bfc583ba)
0. Setup
As above.
1. CTAS works
CREATE TABLE dest_tab AS SELECT cust_id, cust_first_name, cust_last_name FROM source_tab; show create table dest_tab +-----------------------------------------------------------------------------------------+ | result | +-----------------------------------------------------------------------------------------+ | CREATE TABLE test_db.dest_tab ( | | cust_id BIGINT, | | cust_first_name STRING, | | cust_last_name STRING | | ) | | STORED AS TEXTFILE | | LOCATION 'abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/dest_tab' | | | +-----------------------------------------------------------------------------------------+ SELECT * FROM dest_tab; +---------+-----------------+----------------+ | cust_id | cust_first_name | cust_last_name | +---------+-----------------+----------------+ | 3 | Anne | Bloggs | | 2 | Fred | Bloggs | | 4 | Izzy | Bloggs | | 1 | Joe | Bloggs | +---------+-----------------+----------------+
2. INSERT-SELECT works
DROP TABLE dest_tab PURGE; CREATE TABLE dest_tab ( cust_id BIGINT, cust_first_name STRING, cust_last_name STRING ) STORED AS PARQUET; show create table dest_tab; +-----------------------------------------------------------------------------------------+ | result | +-----------------------------------------------------------------------------------------+ | CREATE TABLE test_db.dest_tab ( | | cust_id BIGINT, | | cust_first_name STRING, | | cust_last_name STRING | | ) | | STORED AS PARQUET | | LOCATION 'abfss://glgpv2store02cdh01@glgpv2store02.dfs.core.windows.net/mydir/dest_tab' | | | +-----------------------------------------------------------------------------------------+ INSERT INTO dest_tab SELECT cust_id, cust_first_name, cust_last_name FROM source_tab; SELECT * FROM dest_tab; +---------+-----------------+----------------+ | cust_id | cust_first_name | cust_last_name | +---------+-----------------+----------------+ | 1 | Joe | Bloggs | | 2 | Fred | Bloggs | | 4 | Izzy | Bloggs | | 3 | Anne | Bloggs | +---------+-----------------+----------------+
Created 05-15-2019 04:00 PM
Created 05-15-2019 04:13 PM
Created on 05-16-2019 01:08 AM - edited 05-16-2019 01:17 AM
Thanks for the reply and for raising the bug upstream.
We will track the bug to see when it is fixed and then check the release notes for future CDH versions to see if it is included.
I will retry my test case using parquet for the destination table.
Created 05-16-2019 02:48 AM
Created 05-16-2019 02:49 AM
No you are correct, I am wrong. The destination table is created as TEXT and not parquet. I've confirmed that this works in 6.2.0:
Query: CREATE TABLE dest_tab STORED AS PARQUET AS SELECT cust_id, cust_first_name, cust_last_name FROM source_tab +-------------------+ | summary | +-------------------+ | Inserted 4 row(s) | +-------------------+ Fetched 1 row(s) in 1.01s
Thanks.
Created 05-16-2019 04:11 AM
Created 11-06-2019 08:53 AM
IMPALA-8557 has been fixed. The fix should be coming in a CDH release soon.