Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Getting Null after extracting data from HDFS in Hive?

Solved Go to solution
Highlighted

Getting Null after extracting data from HDFS in Hive?

I'm still a noob in this world, so bare with me if there are any misconceptions.

What have I done so far is, I've setup Hadoop and Hive in my linux enviroment and I'm trying to extract data from an HDFS file into my Hive table.

This is how the log line looks like:

 apache2013-10-09T14:04:32Zphp129.124.201.110/EKEE.php20019705910Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24

This my Hive query:

CREATE EXTERNAL TABLE LogParserSample(
logtype STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
 'input.regex' = '^([\w\-]+)'
) 
STORED AS TEXTFILE
LOCATION '/mypath/';

When I did execute the above the table got created successfully. But then when I tried to query the data from that table using a SELECT statement, I'm getting null instead of the word apache. For this instance, I'm trying to extract the word apache from the above log line. But then I tried executing the same regex in rubular, the outcome was correct as I expected. I'm unable to figure out why!

Have I missed anything trivial above? Any help could be appreciated. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Getting Null after extracting data from HDFS in Hive?

Changing the regex to below worked for me:

^([^0-9]+).*

Hope it helps!

View solution in original post

3 REPLIES 3

Re: Getting Null after extracting data from HDFS in Hive?

Hi @Kulasangar Gowrisangar

It seems that you regexp pattern is not correct. Try this one ^[^0-9]+

If you need more help with regular expressions I recommend this site https://regex101.com/

This pattern starts from the beginning of the string and then is looking for any character except numbers. With your example row the result is string "apache".

Highlighted

Re: Getting Null after extracting data from HDFS in Hive?

Thank you @Andres Koitmäe for your quick reply.

I updated my query with the regex you mentioned, but still, I'm getting it as null. Is that the normal behavior when I search in with a SELECT query after executing the hive query?

To make sure that I've made the correct changes I'm pasting the hive query here again:

CREATE EXTERNAL TABLE LogParserSample(
logtype STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
 'input.regex' = '^[^0-9]+'
) 
STORED AS TEXTFILE
LOCATION '/mypath/';

Thanks again.

Highlighted

Re: Getting Null after extracting data from HDFS in Hive?

Changing the regex to below worked for me:

^([^0-9]+).*

Hope it helps!

View solution in original post

Don't have an account?
Coming from Hortonworks? Activate your account here