Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

regex pattern for hive regex serde

Solved Go to solution
Highlighted

regex pattern for hive regex serde

Rising Star

Can someone pls help me creating regex pattern to use for creating a hive table with RegEx Serde....

I want hive regex table to be created with the pattern ^^^^^^^^^^ [10 anchor characters]  as a delimiter! I am not sure what would be the regex pattern of hive table for this!!

please help.

Thanks a lot in advance

1 ACCEPTED SOLUTION

Accepted Solutions

Re: regex pattern for hive regex serde

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

8 REPLIES 8

Re: regex pattern for hive regex serde

Super Guru
@Raja A

I will try \\^ for each character. so basically

\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^

Can you try this?

Re: regex pattern for hive regex serde

Rising Star

Thanks for the response @mqureshi

I Tried below two ways, but that did not work...

  1. "input.regex" = "\\^\^\\^\\^\\^\\^\\^\\^\\^\\^"
  2. "input.regex" = "(\\^\^\\^\\^\\^\\^\\^\\^\\^\\^)"

Any other thoughts i can try of ?

Re: regex pattern for hive regex serde

Super Guru

Hi @Raja A

In both cases the second one has only single slash. Can you try it with two slashes. I think the first one should work if you just add two slashes. Also, why not try to make it work with only "^^" first. Figure out how to make ^^ work and then you can simply extend that to ^^^^^^^^^^.

Re: regex pattern for hive regex serde

Rising Star

Thanks for your response and time @mqureshi

I tried for ^^... its still not working...!! And, sorry, my above comment had a typo !!

Below is what i am trying !!

create external table <table_name> ( col1 string, col2 string )
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES (
"input.regex" = "^^"
)
STORED AS TEXTFILE
LOCATION "<hdfs_path>";

Re: regex pattern for hive regex serde

Super Guru

@Raja A

I will try this on my machine when I get a chance, hopefully later today but I think it should

"input.regex"="^\\^\\^"

The first one signifies the beginning of a string and other two are for your matching pattern.

Re: regex pattern for hive regex serde

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

Re: regex pattern for hive regex serde

Rising Star

Hi @Predrag Minovic

Thanks a lot for the response. this works!!

However, this is working only when i am having 2 columns in the input... but not working when there are more columns !!

Re: regex pattern for hive regex serde

Well, you said you had only 2 columns :-) For more columns you can either change the regex, or try MultiDelimitSerDe if you are on Hive-0.14 or newer. By the way, inspired by your question I wrote an article about RegexSerDe.