Member since: 02-18-2016
Posts: 22
Kudos Received: 7
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 1085 | 05-22-2016 10:21 AM
06-11-2019
06:44 AM
@Shu Thanks for your answer! I thought about that SSH-execution approach. But what if I install NiFi on that GPU machine, make a remote process group there, and let that handle forwarding the output back to the NiFi instance that called the remote process group, so I also get better input/output statistics? Does this sound like too heavy a solution, or more like the NiFi way? With the SSH-execution approach I can push the text output generated on the GPU machine to SQL Server and take it from there. I would just like to have some control over what is happening on that GPU machine during execution, and the SSH way kind of loses track of what is happening (vs. a remote process group), or am I misunderstanding? Thanks!
06-10-2019
04:25 PM
Hello! I am here to ask your opinion on the following subject: I am planning to use Apache NiFi to orchestrate my dataflow. I have used it before, so it is somewhat familiar. Now, if I have a processing step that needs a GPU, and that GPU is located on a different Linux machine, what would be a good way to give commands to that GPU machine to start processing files from the dataflow? My first processor pulls files from FTP. The next one does some normalization on them, and the third processor is the GPU-machine phase. Those files end up on that GPU machine, and in the end nothing but text is output. That output text should be pushed to SQL. Question: how should the GPU machine be activated to process files from the NiFi flow? Thanks for your answers.
Labels:
- Apache NiFi
01-02-2017
06:38 PM
Hello! I am test-driving a UDF written in Java on HDP Sandbox 2.3.1 (no Tez enabled). I use a simple System.out.println("phase one"); to trace how my code runs. The Java code itself is OK, but some of the logic in my code is not working and produces zero output. What is the simplest way to track down which part of the code is currently executing in a Java-written UDF? That System.out.println sometimes works fine, but sometimes, like now, it gives me nothing even if I put a println inside the while loop where the bag is processed. Maybe execution is not even reaching my println line? Log4j is one option, but I haven't seen any simple examples. Thank you in advance!
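One thing worth noting: in MapReduce mode, anything a UDF writes to System.out lands in the individual task's stdout log on the cluster, not in the Pig client console, so the output can seem to disappear depending on which log you look at. Before wiring up log4j, a quick way to see whether the UDF is reached at all is to push sample rows through the script with Pig's diagnostic operators. A minimal sketch, using a hypothetical jar myudfs.jar and UDF class myudfs.MyUdf standing in for the real ones:

REGISTER myudfs.jar;
raw = LOAD 'input' AS (line:chararray);
out = FOREACH raw GENERATE myudfs.MyUdf(line);
DESCRIBE out;   -- confirm the schema the UDF declares
ILLUSTRATE out; -- run sample rows through each operator and show them
DUMP out;

If ILLUSTRATE shows rows flowing into the FOREACH but out comes back empty, the problem is inside the UDF logic rather than in the script.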
Labels:
- Apache Pig
06-17-2016
07:21 PM
Thank you! This was exactly what I wanted. What was new to me was this:
sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC;
So Pig can order groups in that way.
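For context, a minimal sketch of how that line fits the ordering problem from the question below, with hypothetical alias names and assuming comma-separated input: put the per-category total into the group key, so that ordering the grouped relation BY group sorts the categories by their totals, while a nested ORDER keeps the rows inside each category sorted by sales.

rows = LOAD 'sales' USING PigStorage(',')
    AS (cata:chararray, product:chararray, sales:long, cat_total:long);
grpByCatTotals = GROUP rows BY (cat_total, cata);        -- total first in the key
sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC; -- categories by total, descending
top20 = FOREACH sortGrpByCatTotals {
    sorted = ORDER rows BY sales DESC;                   -- per-category order by sales
    top = LIMIT sorted 20;
    GENERATE FLATTEN(top);
};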
06-16-2016
02:33 PM
I have trouble keeping rows in order. I have data like this:

cata, productx, sales, total_sales_of_category
food, bread, 112USD, 1890USD
food, breadX, 98USD, 1890USD
Oil, MotorOil, 786USD, 7899USD
Oil, MotorOilY, 678USD, 11331USD

The schema is: chararray, chararray, long, long. Sorry for the lame example, but there are four columns. I can group the rows by category (cata) and order them by sales inside the bag, BUT if I also want to order them by total_sales_of_category, how am I supposed to do it? Ordering inside the bag works fine:

grp = GROUP ordered BY $0;
top20 = FOREACH grp {
    sorted = ORDER ordered BY $2 DESC;
    top = LIMIT sorted 20;
    GENERATE group, FLATTEN(top);
};

But after this, total_sales_of_category is not in order (of course it's not), and I would like to get that in order as well. How can it be done? Simply using x = ORDER top20 BY $4 DESC; will order the rows, but I will lose the ordering by sales. Any advice would be great.
Labels:
- Apache Pig
05-22-2016
10:21 AM
1 Kudo
I found the solution. The problem was about versions: I was using Tez 0.7.0 from Maven Central with Hadoop 2.7.1. When I used Tez version 0.7.1, this STORE problem was gone. So, problem solved.
05-20-2016
08:25 PM
Hello! Thanks for the quick reply. Here is the log file for this job: http://pastebin.ca/3605448 Don't mind the lines after "starting make visualisation"; they fail for a reason in this case. I am stunned, because the script stores a temp file fine the first time, but when it comes to the second time: unable to store. I hope you can help me with this one. @Pradeep Bhadani
05-20-2016
02:00 PM
1 Kudo
Hello, I have Tez 0.7.0 and HDP 2.4.0.0-169. I use Pig Latin quite a lot in MapReduce mode. Today I tried Tez mode (from my Java code). Everything seems to work fine, but the same Pig Latin script that stores the alias OK in MapReduce mode gives "ERROR 1002: Unable to store alias secondorder" in Tez mode. I checked the script and there is nothing wrong with it. (When I check the path tebs/results/Ravintolamyynnintop20 in HDFS, the output is stored there OK, but I still get this "Unable to store alias secondorder" error and the show stops.) What could be wrong? Some version conflict between my Java code and HDP? Here is my script:
splittedII = load 'tebs/data/currentmon*.*' using PigStorage(';') as (id:chararray,fu:chararray,fa:chararray,myynti:chararray);
splittedI = FILTER splittedII BY NOT($3 MATCHES '.*Ko.*') AND NOT($2 MATCHES '.*Yht.*') AND NOT($1 MATCHES '.*Yht.*');
onlyrestaurants = FILTER splittedI BY ($1 MATCHES '.*23021 RUOKA.*') OR ($1 MATCHES '.*23022 SUOLAINEN.*') OR ($1 MATCHES '.*23023 MAKEA.*') OR ($1 MATCHES '.*23024 PIKARUOKA.$
partly = foreach onlyrestaurants generate $1,$2,REPLACE(myynti, ',','.');
store partly into 'teb/tempten';
partly = load 'teb/tempten' using PigStorage('\t') as (id:chararray,fu:chararray,myynti:double);
grpded = group partly by ($0,$1);
summed = foreach grpded generate FLATTEN(group) AS (id,fu),SUM(partly.$2);
ordered = order summed by $2 DESC;
grp = group ordered by $0;
top10 = foreach grp {
sorted = order ordered by $2 desc;
top = limit sorted 20;
total = SUM(ordered.$2);
generate group,FLATTEN(top),FLATTEN(total);
};
secondorder = ORDER top10 by $4 DESC;
store secondorder into 'tebs/results/Ravintolamyynnintop20';
Labels:
- Apache Pig
- Apache Tez
04-27-2016
09:23 AM
Thanks for the reply. My problems might be DNS related. My Ubuntu box has DNS and it worked like a charm, but after I updated my Asus router it somehow stopped working. So, I am back to the good old hosts file and doing a reinstall with this setup. I will let you know if I face issues.
04-27-2016
07:42 AM
Hello! I have a question: is it enough to copy the SSH public key, generated on the Ambari server (as root), to all hosts that are going to be part of the cluster, and to make sure you can SSH from the Ambari server to the hosts without a password? OR do I also need to be able to SSH between hosts (e.g. slave1 --> slave2), or is it enough that the Ambari server can log in to the hosts, with the installation process making sure that the slaves can also log in to each other if needed? I am asking because I am having a problem with my HDP 2.3 installation: all components install fine, but the installation finishes with an orange bar, because Ambari is not able to start all components after the install (some of them yes, but not all; the timeline app server, YARN, etc. are not started). My guess is that it has something to do with connections between nodes, and I am not talking about firewall issues. I have Ubuntu 14.04 with DNS (which works; I can SSH using hostnames).
Labels:
- Hortonworks Data Platform (HDP)
02-18-2016
11:18 AM
1 Kudo
Thanks for the answers. I know the Sandbox is just for testing scripts, but testing scripts needs some data, and in my opinion 180 MB of data is still a "sample" that should work fine with the Sandbox; maybe I am wrong. But I guess the problem is that some files get corrupted (when VirtualBox is shut down or crashes). Surely Pig Latin, even though it "eats everything", needs some storage of its own to save information about the data, and that place, wherever it is, somehow gets corrupted, so again we are talking about corrupted files or lack of space, etc. Whatever. A production cluster is a whole different thing.
02-18-2016
09:16 AM
2 Kudos
I noticed the following problem with HDP Sandbox 2.3.1. My hardware is a MacBook Pro, 8 GB memory, 256 GB SSD, OS X El Capitan. When I run a few hundred Pig Latin Hadoop jobs (Tez) per day (I am testing a script with a 180 MB data sample), I notice that Pig Latin loses a column or columns. This happens after a week or two of active testing with the same HDP Sandbox installation. I checked everything: the data is correct, the position of the field is correct, but I get empty results if I try to access one column. The column is of type chararray, and even if I take all columns without filtering etc., the whole column is still gone. When I reinstall HDP and try the same Pig Latin script without changing anything (from the Hue Pig Latin editor), everything is fine and the column is there like it should be. So the question is: does Pig use some sort of SQL store or similar for schemas that fills up and causes some information to be lost when you heavily use the Sandbox environment?
Tags:
- Data Processing
- Pig

Labels:
- Apache Pig