About petri_koski

petri_koski · ‎01-02-2017

Hello! I am test driving one Java written UDF in HDP sandbox 2.3_1 (No Tez enabled). I use simple "system.out.prinln("phase one");" to locate how my code works . Java code itself is "ok", but some of my logic in code is not working and producing zero output.. What is the -simpliest- way to track down in what part of code system is currently running in java Written UDF ? That System.out.println .. "sometimes" work fine, sometimes like now it doesn't give nothing even If I put one println inside while loop where Bag is processed. Maybe its not even reaching my println line .. ? Log4j -something is one way, but haven't seen any simple examples.. Thank you in advance!

petri_koski · ‎06-17-2016

Thank you! This was exactly what I Wanted. What was new to me was this sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC; So Pig can order groups in that way.

petri_koski · ‎06-16-2016

I have trouble keeping rows in order. I have data like this: cata,productx, sales,total_sales_of_category food bread 112USD,1890USD food breadX 98USD, 1890USD Oil MotorOil 786USD,7899USD Oil MotorOilY 678USD,11331USD Schema is: chararray,chararray,long,long .. Sorry for the lame example, but there are four colums. I can group them by category (cata) and order them by sales inside bag, BUT if I would to also order them by total_sales_of_category how am I supposed to do it .. ? Ordering inside bag works fine: grp = group ordered by $0; top20 = foreach grp { sorted = order ordered by $2 desc; top = limit sorted 20; generate group,FLATTEN(top); }; But after this Total_sales_of_category is not in order (Of course its not ..) but if I would like to get that also in order (total_sales_of_category) How can it be done ? Simple using x = order top 20 by $4 desc, will order rows but i will loose order of sales .. Any advice would be great ..

petri_koski · ‎05-22-2016

I found solution: Problem was about versions: I was using Tez 0.7.0 from Maven central, and hadoop 2.7.1 .. When I used 0.7.1 version from Tez this STORE -problem is gone. So problem solved.

petri_koski · ‎05-20-2016

Hello! Thanks for the quick reply. Here comes log file for this job: http://pastebin.ca/3605448 Don't mind about the lines after "starting make visualisation" .. They fail for reason in this case. I am stunned, because that script stores one time temp -file, but when it comes to second time .. unable to store .. I hope you can help me with this one. @Pradeep Bhadani

petri_koski · ‎05-20-2016

Hello, I have Tez 0.7.0 and HDP 2.4.0.0.169. I use pig latin quite a lot in Mapreduce -mode. Today I tried Tez mode (From my java code .. ) everything seems to work fine, but in same pig latin script in Mapreduce stores alias Ok, but in tez -mode I get "ERROR 1002: Unable to store alias secondorder". I checked the script and nothing wrong there. Here is my script: (When I check path tebs/results/Ravintolamyynnintop20 from HDFS, its there stored ok .. But still I get this error Unable to store alias secondorder and show stops .. ) what could be wrong ? Some version conflict with my Java code vs. HDP ? splittedII = load 'tebs/data/currentmon*.*' using PigStorage(';') as (id:chararray,fu:chararray,fa:chararray,myynti:chararray); splittedI = FILTER splittedII BY NOT($3 MATCHES '.*Ko.*') AND NOT($2 MATCHES '.*Yht.*') AND NOT($1 MATCHES '.*Yht.*'); onlyrestaurants = FILTER splittedI BY ($1 MATCHES '.*23021 RUOKA.*') OR ($1 MATCHES '.*23022 SUOLAINEN.*') OR ($1 MATCHES '.*23023 MAKEA.*') OR ($1 MATCHES '.*23024 PIKARUOKA.$ partly = foreach onlyrestaurants generate $1,$2,REPLACE(myynti, ',','.'); store partly into 'teb/tempten'; partly = load 'teb/tempten' using PigStorage('\t') as (id:chararray,fu:chararray,myynti:double); grpded = group partly by ($0,$1); summed = foreach grpded generate FLATTEN(group) AS (id,fu),SUM(partly.$2); ordered = order summed by $2 DESC; grp = group ordered by $0; top10 = foreach grp { sorted = order ordered by $2 desc; top = limit sorted 20; total = SUM(ordered.$2); generate group,FLATTEN(top),FLATTEN(total); }; secondorder = ORDER top10 by $4 DESC; store secondorder into 'tebs/results/Ravintolamyynnintop20';

petri_koski · ‎04-27-2016

And it was DNS related... So, now its working 🙂

petri_koski · ‎04-27-2016

Thanks for the reply. My problems might be dns related .. My ubuntu has DNS and it worked like charm, but after I updated my Asus router somehow it stopped working .. So, I am back to good old hosts - file and doing reinstall with this setup. I let you know if I face issues.

petri_koski · ‎04-27-2016

Hello! I have a question: Is it only needed to to copy SSH public key, generated in ambari server (Root), to all hosts which are going to be part of the cluster and make sure you can ssh from ambari server to hosts without PW ? OR do I also need to be able to SSH between hosts (Between slave1 --> slave2) or is it enough if ambari server is able to login to hosts and installation process makes sure that also slaves, if needed, are able to do login ? I am asking this because I am having a problem with HDP 2.3 installation: All components install fine, but installation finish with orange bar .. and thats because Ambari is not able to start components after install (some of them yes, but not all, like timeline app server, yarn .. are not started). My guess is that its something to do with connections between nodes ,and I am not talking FW issues .. I have Ubuntu 14.04 with dos (which works .. I can ssh using hostnames)..

petri_koski · ‎02-18-2016

Thanks for answers. I know, Sandbox is just for testing script, but testing scripts needs some data, and in my opinion 180MB of data is still "sample" which should work fine with sandbox, maybe I am wrong.. but I guess the problem is with some files gets corrupted (When Virtualbox is shut down / crash), surely Pig latin, even it "eats everything" needs some own storage to save information about data, and that place, wherever it is, gets somehow corrupted and again we are talking about corrupted files or lack of space etc. Whatever. Production cluster is whole different thing.

Online	Offline
Last Visited	‎10-15-2019 11:15 AM

Member Since	‎02-18-2016 09:06 AM
Last Visited	‎10-15-2019 11:15 AM
Posts	22
Kudos received	7

Cloudera Community

Re: Pig on Tez causes unable to store alias

HDP 2.3_1 Sandbox Pig Java UDF don't give any stdo...

Re: Pig latin / Order after group by

Pig latin / Order after group by

Re: Pig on Tez causes unable to store alias

Re: Pig on Tez causes unable to store alias

Pig on Tez causes unable to store alias

Re: SSH key question

Re: SSH key question

SSH key question

Re: HDP Sandbox 2.3.1 Problems with Pig