Member since 02-18-2016 | 22 Posts | 7 Kudos Received | 1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2047 | 05-22-2016 10:21 AM |
01-02-2017 06:38 PM
Hello! I am test-driving a UDF written in Java on the HDP sandbox 2.3_1 (Tez not enabled). I use a simple System.out.println("phase one"); to trace how my code runs. The Java code itself compiles and runs, but some of my logic is not working and produces no output. What is the simplest way to track down which part of the code is currently executing in a UDF written in Java? System.out.println sometimes works fine, but sometimes, like now, it prints nothing, even when I put a println inside the while loop where the bag is processed. Maybe execution is not even reaching my println line? Log4j is one option, but I haven't seen any simple examples. Thank you in advance!
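One possible way to narrow this down, sketched here in plain Java under some assumptions: log once before the loop (including the bag size) and once after it, so an empty bag is distinguishable from code that is never reached. The class `UdfTraceSketch` and the method `countNonEmpty` are hypothetical stand-ins for the body of a UDF's exec() method, and the list of strings stands in for a Pig bag. Keep in mind that in MapReduce mode a UDF runs inside the task JVMs, so println and logger output normally lands in the task's stdout/stderr logs rather than on the console that launched the script, which may explain why it only "sometimes" appears.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

public class UdfTraceSketch {
    // java.util.logging sends INFO and above to System.err by default.
    private static final Logger LOG = Logger.getLogger(UdfTraceSketch.class.getName());

    // Hypothetical stand-in for the loop over a bag inside a UDF's exec() method.
    static int countNonEmpty(List<String> bag) {
        // Logged before the loop: if the bag is empty, this line still appears,
        // while a println placed inside the loop would never run.
        LOG.info("phase one: entering loop, bag size = " + bag.size());
        int kept = 0;
        for (String item : bag) {
            if (item == null || item.isEmpty()) {
                LOG.warning("skipping empty item");
                continue;
            }
            kept++;
        }
        LOG.info("phase two: loop finished, kept = " + kept);
        return kept;
    }

    public static void main(String[] args) {
        int kept = countNonEmpty(Arrays.asList("a", "", "b"));
        System.err.println("kept = " + kept);
    }
}
```

If the "phase one" message never shows up in the task logs either, the method is probably not being called at all; if it shows a bag size of 0, the loop body (and any println inside it) simply has nothing to iterate over.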
Labels:
- Apache Pig
06-17-2016 07:21 PM
Thank you! This was exactly what I wanted. What was new to me was this:
sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC;
So Pig can order groups that way.
06-16-2016 02:33 PM
I have trouble keeping rows in order. I have data like this:
cata, productx, sales, total_sales_of_category
food, bread, 112USD, 1890USD
food, breadX, 98USD, 1890USD
Oil, MotorOil, 786USD, 7899USD
Oil, MotorOilY, 678USD, 11331USD
The schema is: chararray, chararray, long, long. Sorry for the lame example, but there are four columns. I can group the rows by category (cata) and order them by sales inside the bag, BUT if I also want to order them by total_sales_of_category, how am I supposed to do it? Ordering inside the bag works fine:
grp = group ordered by $0;
top20 = foreach grp {
sorted = order ordered by $2 desc;
top = limit sorted 20;
generate group,FLATTEN(top);
};
But after this, total_sales_of_category is not in order (of course it isn't). If I would like to get the rows ordered by total_sales_of_category as well, how can it be done? Simply using x = order top20 by $4 desc; will order the rows by total, but I will lose the order of sales. Any advice would be great.
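The ordering being asked for is a two-level sort: category total first, then sales within each category. As a rough illustration only, here is that comparison written in plain Java rather than Pig Latin; the class `CategoryOrderSketch`, the `Row` type, and the sample numbers are all made up for this sketch. In Pig, ORDER accepts several keys, each with its own ASC or DESC, so a statement along the lines of ORDER top20 BY $4 DESC, $0, $3 DESC would be the likely counterpart, assuming $0 is the group, $3 is sales and $4 is the category total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CategoryOrderSketch {
    // Hypothetical row type mirroring the four columns described in the post.
    static final class Row {
        final String category;
        final String product;
        final long sales;
        final long categoryTotal;

        Row(String category, String product, long sales, long categoryTotal) {
            this.category = category;
            this.product = product;
            this.sales = sales;
            this.categoryTotal = categoryTotal;
        }
    }

    // Sort by category total (descending), then category name so that categories
    // with equal totals stay together, then sales within the category (descending).
    static List<Row> orderRows(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(
            Comparator.comparingLong((Row r) -> r.categoryTotal).reversed()
                .thenComparing((Row r) -> r.category)
                .thenComparing(Comparator.comparingLong((Row r) -> r.sales).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample data; the totals are simply the sums of each category's sales.
        List<Row> rows = Arrays.asList(
            new Row("food", "breadX", 98, 210),
            new Row("oil", "MotorOilY", 678, 1464),
            new Row("food", "bread", 112, 210),
            new Row("oil", "MotorOil", 786, 1464));
        for (Row r : orderRows(rows)) {
            System.out.println(r.category + "\t" + r.product + "\t" + r.sales + "\t" + r.categoryTotal);
        }
    }
}
```

Because every row of a category carries the same total, sorting on the total first keeps each category's rows adjacent, and the later keys only decide the order inside that block.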
Labels:
- Apache Pig
05-22-2016 10:21 AM
1 Kudo
I found the solution. The problem was a version mismatch: I was using Tez 0.7.0 from Maven Central with Hadoop 2.7.1. When I switched to Tez 0.7.1, the STORE problem went away. So, problem solved.
05-20-2016 08:25 PM
Hello! Thanks for the quick reply. Here is the log file for this job: http://pastebin.ca/3605448 Never mind the lines after "starting make visualisation"; they fail for a reason in this case. I am stunned, because the script stores the temp file fine the first time, but the second store fails with "unable to store". I hope you can help me with this one. @Pradeep Bhadani
05-20-2016 02:00 PM
1 Kudo
Hello, I have Tez 0.7.0 and HDP 2.4.0.0.169. I use Pig Latin quite a lot in MapReduce mode. Today I tried Tez mode (from my Java code) and everything seems to work fine, except that the same Pig Latin script that stores its alias fine in MapReduce mode fails in Tez mode with "ERROR 1002: Unable to store alias secondorder". I checked the script and found nothing wrong with it. When I check the path tebs/results/Ravintolamyynnintop20 in HDFS, the output is there and stored fine, but I still get the error "Unable to store alias secondorder" and the run stops. What could be wrong? Some version conflict between my Java code and HDP? Here is my script:
splittedII = load 'tebs/data/currentmon*.*' using PigStorage(';') as (id:chararray,fu:chararray,fa:chararray,myynti:chararray);
splittedI = FILTER splittedII BY NOT($3 MATCHES '.*Ko.*') AND NOT($2 MATCHES '.*Yht.*') AND NOT($1 MATCHES '.*Yht.*');
onlyrestaurants = FILTER splittedI BY ($1 MATCHES '.*23021 RUOKA.*') OR ($1 MATCHES '.*23022 SUOLAINEN.*') OR ($1 MATCHES '.*23023 MAKEA.*') OR ($1 MATCHES '.*23024 PIKARUOKA.$
partly = foreach onlyrestaurants generate $1,$2,REPLACE(myynti, ',','.');
store partly into 'teb/tempten';
partly = load 'teb/tempten' using PigStorage('\t') as (id:chararray,fu:chararray,myynti:double);
grpded = group partly by ($0,$1);
summed = foreach grpded generate FLATTEN(group) AS (id,fu),SUM(partly.$2);
ordered = order summed by $2 DESC;
grp = group ordered by $0;
top10 = foreach grp {
sorted = order ordered by $2 desc;
top = limit sorted 20;
total = SUM(ordered.$2);
generate group,FLATTEN(top),FLATTEN(total);
};
secondorder = ORDER top10 by $4 DESC;
store secondorder into 'tebs/results/Ravintolamyynnintop20';
Labels:
- Apache Pig
- Apache Tez
04-27-2016 09:23 AM
Thanks for the reply. My problems might be DNS related. My Ubuntu machine runs DNS and it worked like a charm, but after I updated my Asus router it somehow stopped working. So I am back to the good old hosts file and am doing a reinstall with that setup. I will let you know if I run into issues.
04-27-2016 07:42 AM
Hello! I have a question: is it enough to copy the SSH public key generated on the Ambari server (as root) to all hosts that are going to be part of the cluster, and make sure I can SSH from the Ambari server to those hosts without a password? Or do I also need to be able to SSH between the hosts themselves (slave1 --> slave2)? In other words, is it enough that the Ambari server can log in to the hosts, with the installation process taking care that the slaves can log in to each other if needed? I am asking because I am having a problem with an HDP 2.3 installation: all components install fine, but the installation finishes with an orange bar, because Ambari is not able to start all components after the install (some start, but others, such as the timeline app server and YARN, do not). My guess is that it has something to do with connections between nodes, and I am not talking about firewall issues. I have Ubuntu 14.04 with DNS (which works; I can SSH using hostnames).
Labels:
- Hortonworks Data Platform (HDP)
02-18-2016 11:18 AM
1 Kudo
Thanks for the answers. I know the Sandbox is just for testing scripts, but testing scripts needs some data, and in my opinion 180 MB of data is still a "sample" that should work fine with the Sandbox. Maybe I am wrong, but I guess the problem is that some files get corrupted (when VirtualBox is shut down or crashes). Surely Pig Latin, even if it "eats everything", needs some storage of its own to save information about the data, and that place, wherever it is, somehow gets corrupted. So again we are talking about corrupted files, lack of space, etc. Whatever. A production cluster is a whole different thing.