Created 05-03-2016 04:03 PM
Hello Hadoopers!
I'm testing some stuff in the Sandbox, and I've come up against this problem. The regexp below works fine when run in the Hive view GUI in Ambari, but when run from a Hive shell in the command line, it fails with error: "....cannot recognize input near ',' ')' 'as' in expression specification" in the regexp line.
INSERT OVERWRITE TABLE my_table SELECT trim(daily_date), trim(name), cast(trim(count) AS INT), regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd/MM/yyyy')),0,7),"-","") as month FROM staging_table ;
Have tried combinations of backticks and single quote marks within the expression - nothing works. But the problem is why Ambari/Hive view would process without error (and with correct output!) but command line Hive throws up this error.
Suspect it's to do with the odd characters in the expression, and how the shell treats them combined with the Hive environment... But I thought that all commands run within the Hive shell would be treated as Hive-QL and therefore wouldn't have the escaping issues that the command line does...
Puzzled, and completely stuck.
Any help troubleshooting this would be very gratefully received!
Created 05-04-2016 10:01 AM
Fixed!! The problem was the mixed single quotes (around the date format) and double quotes around the regexp_replace parameters. Although the two quote marks are in different statements, the overall statement needs to be consistent it seems.
This works:
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month
Lesson: when embedded in bash scripts, be extra careful about symbol consistency!
I'm sorry <simple question shame>!
Created 05-04-2016 06:48 AM
Hi @R Wys, I just tried, but for me your regexp works as-is from Hive command line! Try testing only that part
SELECT daily_date, regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date),'dd/MM/yyyy')),0,7),"-","")as month FROM staging_table;
Was that error with the command you posted, or with another version? Your posted version looks good, replace 0 by 1 as the substring starting index (as per specs), it works with 0 but better follow the specs. You can also try to escape "-" by "\-". Here is my result, note that my date-time field is in different format:
hive> select s, regexp_replace(substr(from_unixtime(unix_timestamp(trim(s), 'yyyy-MM-dd')),0,7),"-","") as month from yr; OK s month 1975-01-01 00:00:00 197501 1999-10-12 199910 2016-03-22 10:20:30.155 201603 2001-07-07 11:00:05.0 200107 Time taken: 0.563 seconds, Fetched: 4 row(s)
Created 05-04-2016 09:43 AM
Hello @Predrag Minovic and thank you for your reply!
Apologies, I should have been more clear. This expression does indeed work when called directly in Hive from the command line. Because this will be an automated process, what I am doing is running a bash shell script on the cmd, which calls Hive like this:
#!/bin/bash . /root/.bash_profile # local file naming, cleaning, and other admin stuff hive -e " use my_db; SELECT daily_date, regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),1,7),"-","") as month FROM staging_table; quit; "
The error is: FAILED: ParseException line [regexp:line] cannot recognize input near ',' ')' 'as' in expression specification
This error is also generated when the / symbols in the date format are escaped (i.e. dd\/MM\/yyyy). Unfortunately I can't do anything about that nasty date format 01/01/2011 😞
Created 05-04-2016 10:01 AM
Fixed!! The problem was the mixed single quotes (around the date format) and double quotes around the regexp_replace parameters. Although the two quote marks are in different statements, the overall statement needs to be consistent it seems.
This works:
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month
Lesson: when embedded in bash scripts, be extra careful about symbol consistency!
I'm sorry <simple question shame>!
Created 05-04-2016 10:04 AM
No problem, we all do these mistakes at times 🙂