Support Questions

mrizvi · ‎04-18-2017

Hi guys,

I have an input file which looks like:

1:Washington Berry Juice   1356:Carrington Frozen Corn-41   446:Red Wing Plastic Knives-39   1133:Tri-State Almonds-41   1252:Skinner Strawberry Drink-39   868:Nationeel Raspberry Fruit Roll-39   360:Carlson Low Fat String Cheese-38   
2:Washington Mango Drink   233:Best Choice Avocado Dip-61   1388:Sunset Paper Plates-63   878:Thresher Semi-Sweet Chocolate Bar-63   529:Fast BBQ Potato Chips-62   382:Moms Roasted Chicken-631	191:Musial Tasty Candy Bar-62

This is the output from user recommendation engine. The first pair is the main product ID and name. Next 6 are ProductId:Name:Count and all the 6 products are delimited by tab.

I want to load this data in a Hive table. As you can see here, there are multi delimeters, so I created a temporary table first having only one string column and then inserted this file. Next, i created a final table having the correct attributes and data types. Now when I am inserting the data using regular expression by running the query:

insert overwrite table recommendation SELECT  
regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) productId,
regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) productName,  
regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) productId1,  
regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) productName1,  
regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) productCount1,  
regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) productId2,  
regexp_extract(col_value, '^(?:([^,]*),?){7}', 1) productName2,  
regexp_extract(col_value, '^(?:([^,]*),?){8}', 1) productCount2,  
regexp_extract(col_value, '^(?:([^,]*),?){9}', 1) productId3,  
regexp_extract(col_value, '^(?:([^,]*),?){10}', 1) productName3,  
regexp_extract(col_value, '^(?:([^,]*),?){11}', 1) productCount3,  
regexp_extract(col_value, '^(?:([^,]*),?){12}', 1) productId4,  
regexp_extract(col_value, '^(?:([^,]*),?){13}', 1) productName4,  
regexp_extract(col_value, '^(?:([^,]*),?){14}', 1) productCount4,  
regexp_extract(col_value, '^(?:([^,]*),?){15}', 1) productId5,  
regexp_extract(col_value, '^(?:([^,]*),?){16}', 1) productName5,  
regexp_extract(col_value, '^(?:([^,]*),?){17}', 1) productCount5,  
regexp_extract(col_value, '^(?:([^,]*),?){18}', 1) productId6,  
regexp_extract(col_value, '^(?:([^,]*),?){19}', 1) productName6,  
regexp_extract(col_value, '^(?:([^,]*),?){20}', 1) productCount6
from temp_recommendation;

I am getting this exception:

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.mapreduce.v2.util.MRApps.addLog4jSystemProperties(Lorg/apache/hadoop/mapred/Task;Ljava/util/List;Lorg/apache/hadoop/conf/Configuration;)V

There are no logs generated and this is a pseudo distributed machine. Is this method wrong for handling multi delimiters or is there any other other better way? Thanks in advance

eberezitsky · ‎04-20-2017

@Mushtaq Rizvi,

you can define tab delimited table. This will give you already 7 columns, without any regex for them.

then, in order to extract particular parts from each product column use "(\d+):([^-\t]+)-(\d+)" as regex (except for the first one, which would be just "(\d+):([^-\t]+)"...)

or you can also use split instead of regex.

insert overwrite table recommendation SELECT  
regexp_extract(col_p, '(\d+):', 1) 		productId,
regexp_extract(col_p, ':(.+)', 1) 		productName,  
regexp_extract(col_p1, '(\d+):', 1) 	productId1,  
regexp_extract(col_p1, ':([^-]+)', 1) 	productName1,  
regexp_extract(col_p1, '-(.+)', 1) 		productCount1,  
regexp_extract(col_p2, '(\d+):', 1) 	 productId2,  
regexp_extract(col_p2, ':([^-]+)', 1)  productName2,  
regexp_extract(col_p2, '-(.+)', 1) 		 productCount2,  
regexp_extract(col_p3, '(\d+):', 1) 	 productId3,  
regexp_extract(col_p3, ':([^-]+)', 1)  productName3,  
regexp_extract(col_p3, '-(.+)', 1) 		 productCount3,  
regexp_extract(col_p4, '(\d+):', 1) 	 productId4,  
regexp_extract(col_p4, ':([^-]+)', 1)  productName4,  
regexp_extract(col_p4, '-(.+)', 1) 		 productCount4,  
regexp_extract(col_p5, '(\d+):', 1) 	 productId5,  
regexp_extract(col_p5, ':([^-]+)', 1)  productName5,  
regexp_extract(col_p5, '-(.+)', 1) 		 productCount5,  
regexp_extract(col_p6, '(\d+):', 1) 	 productId6,  
regexp_extract(col_p6, ':([^-]+)', 1)  productName6,  
regexp_extract(col_p6, '-(.+)', 1) 		 productCount6
from temp_recommendation;

View solution in original post

eberezitsky · ‎04-20-2017

@Mushtaq Rizvi,

you can define tab delimited table. This will give you already 7 columns, without any regex for them.

then, in order to extract particular parts from each product column use "(\d+):([^-\t]+)-(\d+)" as regex (except for the first one, which would be just "(\d+):([^-\t]+)"...)

or you can also use split instead of regex.

insert overwrite table recommendation SELECT  
regexp_extract(col_p, '(\d+):', 1) 		productId,
regexp_extract(col_p, ':(.+)', 1) 		productName,  
regexp_extract(col_p1, '(\d+):', 1) 	productId1,  
regexp_extract(col_p1, ':([^-]+)', 1) 	productName1,  
regexp_extract(col_p1, '-(.+)', 1) 		productCount1,  
regexp_extract(col_p2, '(\d+):', 1) 	 productId2,  
regexp_extract(col_p2, ':([^-]+)', 1)  productName2,  
regexp_extract(col_p2, '-(.+)', 1) 		 productCount2,  
regexp_extract(col_p3, '(\d+):', 1) 	 productId3,  
regexp_extract(col_p3, ':([^-]+)', 1)  productName3,  
regexp_extract(col_p3, '-(.+)', 1) 		 productCount3,  
regexp_extract(col_p4, '(\d+):', 1) 	 productId4,  
regexp_extract(col_p4, ':([^-]+)', 1)  productName4,  
regexp_extract(col_p4, '-(.+)', 1) 		 productCount4,  
regexp_extract(col_p5, '(\d+):', 1) 	 productId5,  
regexp_extract(col_p5, ':([^-]+)', 1)  productName5,  
regexp_extract(col_p5, '-(.+)', 1) 		 productCount5,  
regexp_extract(col_p6, '(\d+):', 1) 	 productId6,  
regexp_extract(col_p6, ':([^-]+)', 1)  productName6,  
regexp_extract(col_p6, '-(.+)', 1) 		 productCount6
from temp_recommendation;

eberezitsky · ‎04-21-2017

@Mushtaq Rizvi,

if that worked for you, please accept the answer so your question will be marked as resolved. Thanks.

mrizvi · ‎04-22-2017

Thanks for the response. I am still getting the same exception while doing regexp_extract.

mrizvi · ‎04-22-2017

Just saw that any query in Hive which involves map reduce job is giving the same exception.

mrizvi · ‎04-22-2017

Got it right. Actually whenever I was starting my hive shell, I was getting this warning:

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

So I installed Tez(version 0.8.5) and changed the execution engine of Hive to Tez. Now all Hive queries that involve MapReduce job are running. My hive version is 2.1.1, that I guess do not work with MapReduce

As per the regex, thanks alot @Ed Berezitsky,

Those regex worked.

Cloudera Community

Support Questions

Getting exception while inserting the file having multi delimiters in a Hive table

NiFi: How to detect updates to S3 files and insert...

hive Insert to Dynamic Partition query Generating ...

Hive Query against ORC table failing with serious ...

Hive ARRAY column INSERT INTO SELECT FROM exceptio...

Hive get exception when try to do insert/create op...

Converting CSV Files to Apache Hive Tables with Ap...

Hive and comma delimited fields in managed table

hive insert query is failing with different except...

Null Pointer exception during Hive-Hbase Table cre...

HDP clients with multi-version and multi-OS suppor...