Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 3744 | 03-28-2018 02:12 PM |
| | 3022 | 01-09-2018 09:05 PM |
| | 1649 | 12-13-2016 05:07 AM |
| | 5022 | 12-12-2016 02:57 AM |
| | 4306 | 12-08-2016 07:08 PM |
04-22-2017
07:17 PM
Thanks for the response. I am still getting the same exception while doing regexp_extract.
04-20-2017
06:11 AM
@gnovak
Thanks a lot, I guess I missed that point. That has to be the reason why there is nothing in the output.
04-18-2017
01:50 AM
Hi guys, I have an input file which looks like:

1:Washington Berry Juice 1356:Carrington Frozen Corn-41 446:Red Wing Plastic Knives-39 1133:Tri-State Almonds-41 1252:Skinner Strawberry Drink-39 868:Nationeel Raspberry Fruit Roll-39 360:Carlson Low Fat String Cheese-38
2:Washington Mango Drink 233:Best Choice Avocado Dip-61 1388:Sunset Paper Plates-63 878:Thresher Semi-Sweet Chocolate Bar-63 529:Fast BBQ Potato Chips-62 382:Moms Roasted Chicken-631 191:Musial Tasty Candy Bar-62

This is the output of a user recommendation engine. The first pair is the main product ID and name; the next six entries are ProductId:Name-Count, and all six products are delimited by tabs. I want to load this data into a Hive table. As you can see, there are multiple delimiters, so I first created a temporary table with a single string column and loaded this file into it. Next, I created the final table with the correct attributes and data types. Now, when I insert the data using regular expressions by running this query:

insert overwrite table recommendation SELECT
regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) productId,
regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) productName,
regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) productId1,
regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) productName1,
regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) productCount1,
regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) productId2,
regexp_extract(col_value, '^(?:([^,]*),?){7}', 1) productName2,
regexp_extract(col_value, '^(?:([^,]*),?){8}', 1) productCount2,
regexp_extract(col_value, '^(?:([^,]*),?){9}', 1) productId3,
regexp_extract(col_value, '^(?:([^,]*),?){10}', 1) productName3,
regexp_extract(col_value, '^(?:([^,]*),?){11}', 1) productCount3,
regexp_extract(col_value, '^(?:([^,]*),?){12}', 1) productId4,
regexp_extract(col_value, '^(?:([^,]*),?){13}', 1) productName4,
regexp_extract(col_value, '^(?:([^,]*),?){14}', 1) productCount4,
regexp_extract(col_value, '^(?:([^,]*),?){15}', 1) productId5,
regexp_extract(col_value, '^(?:([^,]*),?){16}', 1) productName5,
regexp_extract(col_value, '^(?:([^,]*),?){17}', 1) productCount5,
regexp_extract(col_value, '^(?:([^,]*),?){18}', 1) productId6,
regexp_extract(col_value, '^(?:([^,]*),?){19}', 1) productName6,
regexp_extract(col_value, '^(?:([^,]*),?){20}', 1) productCount6
from temp_recommendation;

I am getting this exception:

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.mapreduce.v2.util.MRApps.addLog4jSystemProperties(Lorg/apache/hadoop/mapred/Task;Ljava/util/List;Lorg/apache/hadoop/conf/Configuration;)V
There are no logs generated, and this is a pseudo-distributed machine. Is this method wrong for handling multiple delimiters, or is there a better way? Thanks in advance.
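In case it helps clarify what I am attempting: my understanding is that each regexp_extract call above returns the Nth comma-separated field, because group 1 of a repeated group keeps only its last match. A quick sanity check of the pattern, plus a sketch of splitting on the actual tab delimiter instead (assuming col_value is the single string column of temp_recommendation), would look like:

-- group 1 keeps only the last repetition, so this returns 'b'
-- ('a,b,c' is a hypothetical literal, just to show the mechanics)
SELECT regexp_extract('a,b,c', '^(?:([^,]*),?){2}', 1);

-- split() on the real delimiter returns an array that can be indexed directly
SELECT split(col_value, '\\t')[0] AS mainProduct,       -- e.g. '1:Washington Berry Juice'
       split(col_value, '\\t')[1] AS recommendation1    -- e.g. '1356:Carrington Frozen Corn-41'
FROM temp_recommendation;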
Labels:
- Apache Hive
04-13-2017
12:24 AM
I am doing a sort-merge join using the Tez examples jar with Tez 0.7.1. Samples of the two files are:

ISBN;"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher";"Image-URL-S";"Image-URL-M";"Image-URL-L"
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
User-ID;"ISBN";"Book-Rating"
276725;"034545104X";"0" First one has 300 thousand and second one has around 1 million records and the common attribute is ISBN of a book. The DAG is getting completed successfully but there is no output. Even the logs look fine. My understanding of SortMergeJoin is that it sorts both datasets on the join attribute and then looks for qualifying records by merging the two datasets. The sorting step groups all tuples with the same value in the join column together and thus makes it easy to identify partitions or groups of tuples with the same value in the join column. I am referring this link from Tez examples. Just wanted to confirm that how is it deciding the join attribute which in this case should be ISBN. PLease help.
Labels:
- Apache Hadoop
- Apache Tez
12-13-2016
05:07 AM
@Yukti Agrawal, the answer is no. But if you have Hue installed in your cluster, you should see some example scripts; check under the folder /usr/lib/hue/apps/oozie/examples, or simply run the locate command to search for them. I would recommend creating a simple script if you just want to test Pig. Store a sample CSV in HDFS and load that file in Pig:

A = LOAD '/the/path/in/HDFS/sample.csv' USING PigStorage(',') AS (driverId:int, truckId:int, driverName:chararray);
some_columns = FOREACH A GENERATE driverId, driverName;
STORE some_columns INTO 'output/some_columns' USING PigStorage(',');
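A matching sample.csv only needs three comma-separated columns per line, for example (hypothetical values):

10,39,John Smith
11,27,Jane Doe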
12-12-2016
05:31 PM
Please help guys, I am kind of stuck here.
12-12-2016
04:57 AM
@Ahmad Hassan, please accept the best answer to close the thread.
12-12-2016
03:45 AM
Check your VirtualBox: does it show whether your VM is running? This usually happens when you do not have enough RAM. I would suggest using a machine that has at least 12 GB of RAM so that you can allocate 8 GB to the Sandbox. Or download a previous version of the Sandbox from here: go to the Hortonworks Sandbox in the Cloud section, click Expand next to Sandbox archive, and download HDP 2.4 from there.
12-12-2016
02:57 AM
@Ahmad Hassan, please use the comment section when you are not posting an answer. Thank you.
12-12-2016
02:57 AM
@Ahmad Hassan, this Sandbox is based on Docker, so there is a VM, and inside that VM there is a Docker container where all the HDP components are installed and running. You probably logged in to the VM using either port 2122 or 22. Exit that shell, and then do the following from your local machine's terminal to enter the Docker container (use port 2222):

ssh root@127.0.0.1 -p 2222

This will ask for the password, which is hadoop, and then it will prompt you to change the password. Then try running the hive command.