Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Incorrect CAST as INTEGER

Incorrect CAST as INTEGER

Hi,

 

We are currently on Impala v2.2.0 and we are facing an issue when we try to insert a string value into an integer column. One of our data pipes is providing a string value in the src and the target column is an INT type. But the data gets inserted as NULL when we cast the column as INTEGER. 

 

Is it supposed to behave in this manner?

 

Here are the steps for the same:

 

CREATE TABLE svuidh_prototype.src (                                       
  col1 INT,                                                               
  col2 STRING                                                             
)                                                                         
STORED AS TEXTFILE                                                        
LOCATION 'hdfs://nameservice1/user/hive/warehouse/svuidh_prototype.db/src'
TBLPROPERTIES ('transient_lastDdlTime'='1430844770');

 

[eplnx070:21000] > insert into src values (cast('ABC' as integer),'A');
Query: insert into src values (cast('ABC' as integer),'A')
Inserted 1 row(s) in 0.74s
[eplnx070:21000] > select * from src;
Query: select * from src
+------+------+
| col1 | col2 |
+------+------+
| NULL | A    |
+------+------+
Fetched 1 row(s) in 0.57s                      

 

Thanks,

Ajay

7 REPLIES 7
Highlighted

Re: Incorrect CAST as INTEGER

Master Collaborator

Hi Ajey,

 

that is expected behavior. An invalid CAST will return NULL.

 

What was your expectation?

 

Alex

Highlighted

Re: Incorrect CAST as INTEGER

Well, the expectation was for the query to fail. Currently, we are trying to convert a lot of code from Teradata and in Teradata this exact same query would fail.  

Highlighted

Re: Incorrect CAST as INTEGER

Master Collaborator

Thanks for the explanation, that makes sense. I'm afraid that Impala currently doesn't behave like that, although I can certainly see how it would make sense in some scenarios.

 

One possible way to workaround this limitation is to filter rows with IS NOT NULL for those interesting casts.

 

Highlighted

Re: Incorrect CAST as INTEGER

Thanks a lot Alex for the info. Looks like we need to find a different workaround since we also expect to get NULL values from the source.


Highlighted

Re: Incorrect CAST as INTEGER

Master Collaborator

In that case, there's still a chance you can get the your desired behavior. When scanning text data, Impala does have an option to abort on any parsing error encountered, e.g., if you declared a column as INT, but a particular text value could not be parsed as an INT in the scan.

 

You can enable this behavior with a query option, see:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v2-0-x/topics/impala_abort...

 

Highlighted

Re: Incorrect CAST as INTEGER

Alex, I tried to do the same. Here is the set of queries I ran for the same. But it didn’t fail or abort.

 

[eplnx070:21000] > set ABORT_ON_ERROR=true;
ABORT_ON_ERROR set to true

[eplnx070:21000] > set ABORT_ON_ERROR=1;

ABORT_ON_ERROR set to 1

[eplnx070:21000] > insert into src values (cast('1234a' as integer),'ajay');

Query: insert into src values (cast('1234a' as integer),'ajay')

Inserted 1 row(s) in 0.92s

[eplnx070:21000] > select * from src;

Query: select * from src

+------+------+

| col1 | col2 |

+------+------+

| NULL | A    |

| NULL | ajay |

| NULL | ajay |

| NULL | ajay |

+------+------+

Highlighted

Re: Incorrect CAST as INTEGER

Master Collaborator

That change only affects values being parsed inside a scan node. In your example you are casting a literal - the query option will have no effect on that.

Don't have an account?
Coming from Hortonworks? Activate your account here