Member since: 05-09-2016
Posts: 280
Kudos Received: 58
Solutions: 31
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 3744 | 03-28-2018 02:12 PM |
| | 3022 | 01-09-2018 09:05 PM |
| | 1649 | 12-13-2016 05:07 AM |
| | 5022 | 12-12-2016 02:57 AM |
| | 4306 | 12-08-2016 07:08 PM |
04-22-2017
07:17 PM
Thanks for the response. I am still getting the same exception while doing regexp_extract.
04-20-2017
06:11 AM
@gnovak
Thanks a lot, I guess I missed that point. That has to be the reason why there is nothing in the output.
04-18-2017
01:50 AM
Hi guys, I have an input file which looks like:

1:Washington Berry Juice 1356:Carrington Frozen Corn-41 446:Red Wing Plastic Knives-39 1133:Tri-State Almonds-41 1252:Skinner Strawberry Drink-39 868:Nationeel Raspberry Fruit Roll-39 360:Carlson Low Fat String Cheese-38
2:Washington Mango Drink 233:Best Choice Avocado Dip-61 1388:Sunset Paper Plates-63 878:Thresher Semi-Sweet Chocolate Bar-63 529:Fast BBQ Potato Chips-62 382:Moms Roasted Chicken-631 191:Musial Tasty Candy Bar-62

This is the output of a user recommendation engine. The first pair is the main product ID and name; the next six entries are ProductId:Name-Count, and all six products are delimited by tabs. I want to load this data into a Hive table. As you can see, there are multiple delimiters, so I first created a temporary table with a single string column and loaded this file into it. Next, I created the final table with the correct attributes and data types. Now, when I insert the data using regular expressions by running this query:

insert overwrite table recommendation SELECT
regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) productId,
regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) productName,
regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) productId1,
regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) productName1,
regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) productCount1,
regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) productId2,
regexp_extract(col_value, '^(?:([^,]*),?){7}', 1) productName2,
regexp_extract(col_value, '^(?:([^,]*),?){8}', 1) productCount2,
regexp_extract(col_value, '^(?:([^,]*),?){9}', 1) productId3,
regexp_extract(col_value, '^(?:([^,]*),?){10}', 1) productName3,
regexp_extract(col_value, '^(?:([^,]*),?){11}', 1) productCount3,
regexp_extract(col_value, '^(?:([^,]*),?){12}', 1) productId4,
regexp_extract(col_value, '^(?:([^,]*),?){13}', 1) productName4,
regexp_extract(col_value, '^(?:([^,]*),?){14}', 1) productCount4,
regexp_extract(col_value, '^(?:([^,]*),?){15}', 1) productId5,
regexp_extract(col_value, '^(?:([^,]*),?){16}', 1) productName5,
regexp_extract(col_value, '^(?:([^,]*),?){17}', 1) productCount5,
regexp_extract(col_value, '^(?:([^,]*),?){18}', 1) productId6,
regexp_extract(col_value, '^(?:([^,]*),?){19}', 1) productName6,
regexp_extract(col_value, '^(?:([^,]*),?){20}', 1) productCount6
from temp_recommendation;

I am getting this exception:

FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.mapreduce.v2.util.MRApps.addLog4jSystemProperties(Lorg/apache/hadoop/mapred/Task;Ljava/util/List;Lorg/apache/hadoop/conf/Configuration;)V
There are no logs generated, and this is a pseudo-distributed machine. Is this method wrong for handling multiple delimiters, or is there a better way? Thanks in advance.
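In case it helps clarify what I am attempting: my understanding is that each regexp_extract call above returns the Nth comma-separated field, because group 1 of a repeated group keeps only its last match. A quick sanity check of the pattern, plus a sketch of splitting on the actual tab delimiter instead (assuming col_value is the single string column of temp_recommendation), would look like:

-- group 1 keeps only the last repetition, so this returns 'b'
-- ('a,b,c' is a hypothetical literal, just to show the mechanics)
SELECT regexp_extract('a,b,c', '^(?:([^,]*),?){2}', 1);

-- split() on the real delimiter returns an array that can be indexed directly
SELECT split(col_value, '\\t')[0] AS mainProduct,       -- e.g. '1:Washington Berry Juice'
       split(col_value, '\\t')[1] AS recommendation1    -- e.g. '1356:Carrington Frozen Corn-41'
FROM temp_recommendation;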
Labels:
- Apache Hive
04-13-2017
12:24 AM
I am doing a sort-merge join using the Tez examples jar with Tez 0.7.1. Samples of the two files are:

ISBN;"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher";"Image-URL-S";"Image-URL-M";"Image-URL-L"
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
User-ID;"ISBN";"Book-Rating"
276725;"034545104X";"0" First one has 300 thousand and second one has around 1 million records and the common attribute is ISBN of a book. The DAG is getting completed successfully but there is no output. Even the logs look fine. My understanding of SortMergeJoin is that it sorts both datasets on the join attribute and then looks for qualifying records by merging the two datasets. The sorting step groups all tuples with the same value in the join column together and thus makes it easy to identify partitions or groups of tuples with the same value in the join column. I am referring this link from Tez examples. Just wanted to confirm that how is it deciding the join attribute which in this case should be ISBN. PLease help.
Labels:
- Apache Hadoop
- Apache Tez
12-13-2016
05:07 AM
@Yukti Agrawal, the answer is no. But if you have Hue installed in your cluster, you should see some example scripts; check under the folder /usr/lib/hue/apps/oozie/examples, or simply run the locate command to search for them. I would recommend creating a simple script if you just want to test Pig. Store a sample CSV in HDFS and load that file in Pig:

A = LOAD '/the/path/in/HDFS/sample.csv' USING PigStorage(',') AS (driverId:int, truckId:int, driverName:chararray);
some_columns = FOREACH A GENERATE driverId, driverName;
STORE some_columns INTO 'output/some_columns' USING PigStorage(',');
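A matching sample.csv only needs three comma-separated columns per line, for example (hypothetical values):

10,39,John Smith
11,27,Jane Doe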
12-12-2016
05:31 PM
Please help guys, I am kind of stuck here.
12-12-2016
04:57 AM
@Ahmad Hassan, please accept the best answer to close the thread.
12-12-2016
03:45 AM
Check your VirtualBox: does it show whether your VM is running? This usually happens when you do not have enough RAM. I would suggest using a machine that has at least 12 GB of RAM so that you can allocate 8 GB to the Sandbox. Or download a previous version of the Sandbox from here: go to the Hortonworks Sandbox in the Cloud section, click Expand next to Sandbox archive, and download HDP 2.4 from there.
12-12-2016
02:57 AM
@Ahmad Hassan, please use the comment section when you are not posting an answer. Thank you.
12-12-2016
02:57 AM
@Ahmad Hassan, this Sandbox is based on Docker, so there is a VM, and inside that VM there is a Docker container where all the HDP components are installed and running. You probably logged in to the VM using either port 2122 or 22. Exit that shell, and then do the following from your local machine's terminal to enter the Docker container (use port 2222):

ssh root@127.0.0.1 -p 2222

This will ask for the password, which is hadoop, and then it will prompt you to change the password. Then try running the hive command.