Member since 02-18-2016
22 Posts | 7 Kudos Received | 1 Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2021 | 05-22-2016 10:21 AM |
01-02-2017 10:42 PM
@petri koski, UDFs (for either Hive or Pig) run during the map-reduce stage, regardless of whether the execution engine is MapReduce or Tez. In other words, your println calls execute as part of a distributed computation: the code that prints your output does not run in your own shell (unless you are running in local mode), so nothing shows up on your console. There are a few ways to see your printed lines:
- Using the job tracker UI: find your job and click on its logs. Go through the containers one by one until you find your output (if your code runs on every record of the processed data, the output will be spread across all of them).
- Using YARN to fetch the aggregated logs: yarn logs -applicationId <applicationId>
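To make the second option more concrete, here is a minimal Python sketch that scans a saved `yarn logs` dump for the lines a UDF printed and groups them by container. It assumes (1) the UDF prefixes its debug lines with a marker, here the hypothetical `MYUDF-DEBUG`, and (2) the dump uses the `Container: ...` / `LogType:...` section headers that Hadoop 2.x `yarn logs` typically prints. The exact layout differs between versions, so treat the parsing as an assumption. The function name `find_udf_output` and the sample container IDs are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical marker the UDF is assumed to prefix to every debug line it prints.
MARKER = "MYUDF-DEBUG"

# Assumed section headers of a `yarn logs` dump (layout varies by Hadoop version).
CONTAINER_RE = re.compile(r"^Container: (\S+)")
LOGTYPE_RE = re.compile(r"^LogType:\s*(\S+)")


def find_udf_output(log_text, marker=MARKER, log_type="stdout"):
    """Return {container_id: [lines]} for lines containing `marker`
    that appear inside the given LogType section of each container."""
    hits = defaultdict(list)
    container = None
    current_type = None
    for line in log_text.splitlines():
        m = CONTAINER_RE.match(line)
        if m:  # a new container section starts; reset the log type
            container = m.group(1)
            current_type = None
            continue
        m = LOGTYPE_RE.match(line)
        if m:  # e.g. "LogType:stdout" or "LogType:stderr"
            current_type = m.group(1)
            continue
        if container and current_type == log_type and marker in line:
            hits[container].append(line.strip())
    return dict(hits)


# Made-up sample dump with two containers.
sample = """\
Container: container_1_0001_01_000002 on node1_45454
LogType:stdout
Log Contents:
MYUDF-DEBUG processing row 1
End of LogType:stdout
LogType:stderr
Log Contents:
MYUDF-DEBUG this went to stderr
Container: container_1_0001_01_000003 on node2_45454
LogType:stdout
Log Contents:
unrelated line
MYUDF-DEBUG processing row 7
"""

print(find_udf_output(sample))
# {'container_1_0001_01_000002': ['MYUDF-DEBUG processing row 1'], 'container_1_0001_01_000003': ['MYUDF-DEBUG processing row 7']}
```

In practice you would redirect the command's output to a file first (for example `yarn logs -applicationId <applicationId> > app.log`, where `app.log` is just an example name) and pass that file's contents to the function. Output from `System.out.println` in a task normally ends up in the container's `stdout` section, which is why `stdout` is the default here.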
06-17-2016 07:21 PM
Thank you! This was exactly what I wanted. What was new to me was this statement:
sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC;
So Pig can order grouped results by the group key that way.
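For readers less familiar with Pig, the following Python sketch mimics what the quoted statement does after a GROUP and a SUM: it totals hypothetical `(category, amount)` records per category and then sorts the grouped rows on the group key in descending order, as `ORDER grpByCatTotals BY group DESC` does. Only `grpByCatTotals` and `sortGrpByCatTotals` come from the post; the relation and field names in the docstring (`records`, `grpByCat`, `category`, `amount`) and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (category, amount) tuples standing in for a Pig relation.
records = [("books", 12.0), ("games", 30.0), ("books", 8.0), ("music", 5.5)]


def group_totals_desc(rows):
    """Plain-Python analogue of the (hypothetical) Pig script:
        grpByCat           = GROUP records BY category;
        grpByCatTotals     = FOREACH grpByCat GENERATE group, SUM(records.amount);
        sortGrpByCatTotals = ORDER grpByCatTotals BY group DESC;
    """
    totals = defaultdict(float)
    for category, amount in rows:
        totals[category] += amount  # SUM per group
    # ORDER ... BY group DESC: sort on the group key, highest first
    return sorted(totals.items(), key=lambda kv: kv[0], reverse=True)


print(group_totals_desc(records))
# [('music', 5.5), ('games', 30.0), ('books', 20.0)]
```

Sorting on `kv[1]` instead would correspond to ordering by the total rather than by the group key.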
05-22-2016 10:21 AM | 1 Kudo
I found the solution: the problem was a version mismatch. I was using Tez 0.7.0 from Maven Central with Hadoop 2.7.1. After switching to Tez 0.7.1, the STORE problem went away, so the problem is solved.
02-18-2016 11:18 AM | 1 Kudo
Thanks for the answers. I know the Sandbox is just for testing scripts, but testing scripts needs some data, and in my opinion 180 MB of data is still a "sample" that should work fine in the Sandbox; maybe I am wrong. My guess is that the problem comes from files getting corrupted when VirtualBox is shut down or crashes. Surely Pig Latin, even though it "eats everything", needs some storage of its own to keep information about the data, and that place, wherever it is, somehow gets corrupted, so again we are talking about corrupted files or lack of space. Anyway, a production cluster is a whole different thing.