Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Incorrect CAST as INTEGER

avatar

Hi,

 

We are currently on Impala v2.2.0 and we are facing an issue when we try to insert a string value into an integer column. One of our data pipes is providing a string value in the src and the target column is an INT type. But the data gets inserted as NULL when we cast the column as INTEGER. 

 

Is it supposed to behave in this manner?

 

Here are the steps for the same:

 

CREATE TABLE svuidh_prototype.src (                                       
  col1 INT,                                                               
  col2 STRING                                                             
)                                                                         
STORED AS TEXTFILE                                                        
LOCATION 'hdfs://nameservice1/user/hive/warehouse/svuidh_prototype.db/src'
TBLPROPERTIES ('transient_lastDdlTime'='1430844770');

 

[eplnx070:21000] > insert into src values (cast('ABC' as integer),'A');
Query: insert into src values (cast('ABC' as integer),'A')
Inserted 1 row(s) in 0.74s
[eplnx070:21000] > select * from src;
Query: select * from src
+------+------+
| col1 | col2 |
+------+------+
| NULL | A    |
+------+------+
Fetched 1 row(s) in 0.57s                      

 

Thanks,

Ajay

7 REPLIES 7

avatar

Hi Ajey,

 

that is expected behavior. An invalid CAST will return NULL.

 

What was your expectation?

 

Alex

avatar

Well, the expectation was for the query to fail. Currently, we are trying to convert a lot of code from Teradata and in Teradata this exact same query would fail.  

avatar

Thanks for the explanation, that makes sense. I'm afraid that Impala currently doesn't behave like that, although I can certainly see how it would make sense in some scenarios.

 

One possible way to workaround this limitation is to filter rows with IS NOT NULL for those interesting casts.

 

avatar
Thanks a lot Alex for the info. Looks like we need to find a different workaround since we also expect to get NULL values from the source.


avatar

In that case, there's still a chance you can get the your desired behavior. When scanning text data, Impala does have an option to abort on any parsing error encountered, e.g., if you declared a column as INT, but a particular text value could not be parsed as an INT in the scan.

 

You can enable this behavior with a query option, see:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v2-0-x/topics/impala_abort...

 

avatar

Alex, I tried to do the same. Here is the set of queries I ran for the same. But it didn’t fail or abort.

 

[eplnx070:21000] > set ABORT_ON_ERROR=true;
ABORT_ON_ERROR set to true

[eplnx070:21000] > set ABORT_ON_ERROR=1;

ABORT_ON_ERROR set to 1

[eplnx070:21000] > insert into src values (cast('1234a' as integer),'ajay');

Query: insert into src values (cast('1234a' as integer),'ajay')

Inserted 1 row(s) in 0.92s

[eplnx070:21000] > select * from src;

Query: select * from src

+------+------+

| col1 | col2 |

+------+------+

| NULL | A    |

| NULL | ajay |

| NULL | ajay |

| NULL | ajay |

+------+------+

avatar

That change only affects values being parsed inside a scan node. In your example you are casting a literal - the query option will have no effect on that.