Created on 05-20-2015 09:48 AM - edited 09-16-2022 02:29 AM
Hi,
We are currently on Impala v2.2.0 and we are facing an issue when we try to insert a string value into an integer column. One of our data pipes is providing a string value in the src and the target column is an INT type. But the data gets inserted as NULL when we cast the column as INTEGER.
Is it supposed to behave in this manner?
Here are the steps for the same:
CREATE TABLE svuidh_prototype.src (
col1 INT,
col2 STRING
)
STORED AS TEXTFILE
LOCATION 'hdfs://nameservice1/user/hive/warehouse/svuidh_prototype.db/src'
TBLPROPERTIES ('transient_lastDdlTime'='1430844770');
[eplnx070:21000] > insert into src values (cast('ABC' as integer),'A');
Query: insert into src values (cast('ABC' as integer),'A')
Inserted 1 row(s) in 0.74s
[eplnx070:21000] > select * from src;
Query: select * from src
+------+------+
| col1 | col2 |
+------+------+
| NULL | A |
+------+------+
Fetched 1 row(s) in 0.57s
Thanks,
Ajay
Created 05-20-2015 04:49 PM
Hi Ajey,
that is expected behavior. An invalid CAST will return NULL.
What was your expectation?
Alex
Created 05-20-2015 07:50 PM
Well, the expectation was for the query to fail. Currently, we are trying to convert a lot of code from Teradata and in Teradata this exact same query would fail.
Created 05-20-2015 10:05 PM
Thanks for the explanation, that makes sense. I'm afraid that Impala currently doesn't behave like that, although I can certainly see how it would make sense in some scenarios.
One possible way to workaround this limitation is to filter rows with IS NOT NULL for those interesting casts.
Created 05-22-2015 10:53 AM
Created 05-22-2015 11:14 AM
In that case, there's still a chance you can get the your desired behavior. When scanning text data, Impala does have an option to abort on any parsing error encountered, e.g., if you declared a column as INT, but a particular text value could not be parsed as an INT in the scan.
You can enable this behavior with a query option, see:
Created 05-22-2015 12:11 PM
Alex, I tried to do the same. Here is the set of queries I ran for the same. But it didn’t fail or abort.
[eplnx070:21000] > set ABORT_ON_ERROR=true;
ABORT_ON_ERROR set to true
[eplnx070:21000] > set ABORT_ON_ERROR=1;
ABORT_ON_ERROR set to 1
[eplnx070:21000] > insert into src values (cast('1234a' as integer),'ajay');
Query: insert into src values (cast('1234a' as integer),'ajay')
Inserted 1 row(s) in 0.92s
[eplnx070:21000] > select * from src;
Query: select * from src
+------+------+
| col1 | col2 |
+------+------+
| NULL | A |
| NULL | ajay |
| NULL | ajay |
| NULL | ajay |
+------+------+
Created 05-22-2015 02:09 PM
That change only affects values being parsed inside a scan node. In your example you are casting a literal - the query option will have no effect on that.