Support Questions
Find answers, ask questions, and share your expertise

regex pattern for hive regex serde

Rising Star

Can someone pls help me creating regex pattern to use for creating a hive table with RegEx Serde....

I want hive regex table to be created with the pattern ^^^^^^^^^^ [10 anchor characters]  as a delimiter! I am not sure what would be the regex pattern of hive table for this!!

please help.

Thanks a lot in advance

1 ACCEPTED SOLUTION

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

View solution in original post

8 REPLIES 8

Super Guru
@Raja A

I will try \\^ for each character. so basically

\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^

Can you try this?

Rising Star

Thanks for the response @mqureshi

I Tried below two ways, but that did not work...

  1. "input.regex" = "\\^\^\\^\\^\\^\\^\\^\\^\\^\\^"
  2. "input.regex" = "(\\^\^\\^\\^\\^\\^\\^\\^\\^\\^)"

Any other thoughts i can try of ?

Super Guru

Hi @Raja A

In both cases the second one has only single slash. Can you try it with two slashes. I think the first one should work if you just add two slashes. Also, why not try to make it work with only "^^" first. Figure out how to make ^^ work and then you can simply extend that to ^^^^^^^^^^.

Rising Star

Thanks for your response and time @mqureshi

I tried for ^^... its still not working...!! And, sorry, my above comment had a typo !!

Below is what i am trying !!

create external table <table_name> ( col1 string, col2 string )
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.RegexSerDe"
WITH SERDEPROPERTIES (
"input.regex" = "^^"
)
STORED AS TEXTFILE
LOCATION "<hdfs_path>";

Super Guru

@Raja A

I will try this on my machine when I get a chance, hopefully later today but I think it should

"input.regex"="^\\^\\^"

The first one signifies the beginning of a string and other two are for your matching pattern.

Hi @Raja A, can you try

"input.regex"="(.*)\\^\\^\\^\\^\\^\\^\\^\\^\\^\\^(.*)"

You need groups to map to your table fields.

Rising Star

Hi @Predrag Minovic

Thanks a lot for the response. this works!!

However, this is working only when i am having 2 columns in the input... but not working when there are more columns !!

Well, you said you had only 2 columns 🙂 For more columns you can either change the regex, or try MultiDelimitSerDe if you are on Hive-0.14 or newer. By the way, inspired by your question I wrote an article about RegexSerDe.

; ;