Support Questions

Find answers, ask questions, and share your expertise

regex pattern for hive regex serde

avatar
Expert Contributor

Can someone pls help me creating regex pattern to use for creating a hive table with RegEx Serde....

I want hive regex table to be created with the pattern ^^^^^^^^^^ [10 anchor characters]  as a delimiter! I am not sure what would be the regex pattern of hive table for this!!

please help.

Thanks a lot in advance

1 ACCEPTED SOLUTION

avatar
Master Guru

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

View solution in original post

8 REPLIES 8

avatar
Super Guru
@Raja A

I will try \\^ for each character. so basically

\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^

Can you try this?

avatar
Expert Contributor

Thanks for the response @mqureshi

I Tried below two ways, but that did not work...

  1. "input.regex" = "\\^\^\\^\\^\\^\\^\\^\\^\\^\\^"
  2. "input.regex" = "(\\^\^\\^\\^\\^\\^\\^\\^\\^\\^)"

Any other thoughts i can try of ?

avatar
Super Guru

Hi @Raja A

In both cases the second one has only single slash. Can you try it with two slashes. I think the first one should work if you just add two slashes. Also, why not try to make it work with only "^^" first. Figure out how to make ^^ work and then you can simply extend that to ^^^^^^^^^^.

avatar
Expert Contributor

Thanks for your response and time @mqureshi

I tried for ^^... its still not working...!! And, sorry, my above comment had a typo !!

Below is what i am trying !!

create external table <table_name> ( col1 string, col2 string )
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES (
"input.regex" = "^^"
)
STORED AS TEXTFILE
LOCATION "<hdfs_path>";

avatar
Super Guru

@Raja A

I will try this on my machine when I get a chance, hopefully later today but I think it should

"input.regex"="^\\^\\^"

The first one signifies the beginning of a string and other two are for your matching pattern.

avatar
Master Guru

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

avatar
Expert Contributor

Hi @Predrag Minovic

Thanks a lot for the response. this works!!

However, this is working only when i am having 2 columns in the input... but not working when there are more columns !!

avatar
Master Guru

Well, you said you had only 2 columns 🙂 For more columns you can either change the regex, or try MultiDelimitSerDe if you are on Hive-0.14 or newer. By the way, inspired by your question I wrote an article about RegexSerDe.