Created 05-03-2017 05:35 AM
I have a table `exchangedata` in hive with the following columns
day STRING euro float yen float dollar float
and the tasks to perform on this table are
1) Load data from '/user/practiceuser/exchangedata.csv' in HDFS into `exchangedata` table
2) Remove All the rows where value of euro, yen and dollar is equal to string '#N/A'
3) Load the result into `exchangedata` table
My Solution -------
As the `exchangedata` table is not in ORC format and is also not clustered(For DELETE query), I created a temporary table `exchange_temp` with the same columns and then used below query to insert data into `exchangedata`
INSERT INTO TABLE exchangedata SELECT day,euro,yen,dollar FROM exchange_temp WHERE euro != '#N/A' AND yen != '#N/A' AND dollar != '#N/A';
But after executing this query I am not getting any row in output. My question is how can I compare float datatype column with String '#N/A'?
The data in the CSV file looks like this
FRIDAY,1.234,2.456,1.059
Created 05-03-2017 08:53 AM
That is always a night mare in JAVA based tools. Hive relies on JAVA (plus SQL) so it respects the IEEE standard for number semantics. That means especially NaN (not a number) values in float columns are a tricky thing.
First of all: Have you tested what is returned for the '#N/A' columns when you do a select? I guess it is rather 'NaN' than '#N/A'.
So after testing the return value, I would suggest to test two approaches. Either try to use cast():
cast(dollar as String) <>'NaN'
(because all possible NaN values are displayed as "NaN" even if they are not strictly "equal" in the arithmetical sense)
or do the old trick and test the value of the column to fit a mathematical operation like e.g.
dollar +1.0 > dollar
Created 05-03-2017 08:53 AM
That is always a night mare in JAVA based tools. Hive relies on JAVA (plus SQL) so it respects the IEEE standard for number semantics. That means especially NaN (not a number) values in float columns are a tricky thing.
First of all: Have you tested what is returned for the '#N/A' columns when you do a select? I guess it is rather 'NaN' than '#N/A'.
So after testing the return value, I would suggest to test two approaches. Either try to use cast():
cast(dollar as String) <>'NaN'
(because all possible NaN values are displayed as "NaN" even if they are not strictly "equal" in the arithmetical sense)
or do the old trick and test the value of the column to fit a mathematical operation like e.g.
dollar +1.0 > dollar
Created 05-05-2017 06:52 AM
Thanks for your replay Peter. Actually, this was EXACTLY the question asked during my certification exam and the column was a string '#N/A'. I didn't know about the cast() function, so I was directly comparing float datatype to a string value using '!='. Also the SELECT wasn't returning any rows.
Created 05-05-2017 06:54 AM
Anyways, your answer has given an approach to solve this. Shall accept your answer.