Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

RegExp problems (works in Ambari Hive view, fails in commandline)

avatar
Contributor

Hello Hadoopers!

I'm testing some stuff in the Sandbox, and I've come up against this problem. The regexp below works fine when run in the Hive view GUI in Ambari, but when run from a Hive shell in the command line, it fails with error: "....cannot recognize input near ',' ')' 'as' in expression specification" in the regexp line.

INSERT OVERWRITE TABLE my_table
	SELECT trim(daily_date),
		trim(name), 
		cast(trim(count) AS INT),
		regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd/MM/yyyy')),0,7),"-","") as month

	FROM staging_table
;	

Have tried combinations of backticks and single quote marks within the expression - nothing works. But the problem is why Ambari/Hive view would process without error (and with correct output!) but command line Hive throws up this error.

Suspect it's to do with the odd characters in the expression, and how the shell treats them combined with the Hive environment... But I thought that all commands run within the Hive shell would be treated as Hive-QL and therefore wouldn't have the escaping issues that the command line does...

Puzzled, and completely stuck.

Any help troubleshooting this would be very gratefully received!

1 ACCEPTED SOLUTION

avatar
Contributor

Fixed!! The problem was the mixed single quotes (around the date format) and double quotes around the regexp_replace parameters. Although the two quote marks are in different statements, the overall statement needs to be consistent it seems.

This works:

regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month

Lesson: when embedded in bash scripts, be extra careful about symbol consistency!

I'm sorry <simple question shame>!

View solution in original post

4 REPLIES 4

avatar
Master Guru

Hi @R Wys, I just tried, but for me your regexp works as-is from Hive command line! Try testing only that part

SELECT daily_date, regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date),'dd/MM/yyyy')),0,7),"-","")as month FROM staging_table;

Was that error with the command you posted, or with another version? Your posted version looks good, replace 0 by 1 as the substring starting index (as per specs), it works with 0 but better follow the specs. You can also try to escape "-" by "\-". Here is my result, note that my date-time field is in different format:

hive> select s, regexp_replace(substr(from_unixtime(unix_timestamp(trim(s), 'yyyy-MM-dd')),0,7),"-","") as month from yr;
OK
s			month
1975-01-01 00:00:00	197501
1999-10-12		199910
2016-03-22 10:20:30.155	201603
2001-07-07 11:00:05.0	200107
Time taken: 0.563 seconds, Fetched: 4 row(s)

avatar
Contributor

Hello @Predrag Minovic and thank you for your reply!

Apologies, I should have been more clear. This expression does indeed work when called directly in Hive from the command line. Because this will be an automated process, what I am doing is running a bash shell script on the cmd, which calls Hive like this:

#!/bin/bash
. /root/.bash_profile
# local file naming, cleaning, and other admin stuff
hive -e "
	use my_db;
	SELECT daily_date, 
		regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),1,7),"-","") as month
	FROM staging_table;
	quit;
"

The error is: FAILED: ParseException line [regexp:line] cannot recognize input near ',' ')' 'as' in expression specification

This error is also generated when the / symbols in the date format are escaped (i.e. dd\/MM\/yyyy). Unfortunately I can't do anything about that nasty date format 01/01/2011 😞

avatar
Contributor

Fixed!! The problem was the mixed single quotes (around the date format) and double quotes around the regexp_replace parameters. Although the two quote marks are in different statements, the overall statement needs to be consistent it seems.

This works:

regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month

Lesson: when embedded in bash scripts, be extra careful about symbol consistency!

I'm sorry <simple question shame>!

avatar

No problem, we all do these mistakes at times 🙂