PIG Script Taking Too Much Time To Run

Explorer

@Aditya Sirna

@Jay Kumar SenSharma

I'm running a very simple Pig script, shown below:

ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage('|') 
 AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRelease:chararray, imdbLink:chararray);
    
nameLookup = FOREACH metadata GENERATE movieID, movieTitle, ToUnixTime(ToDate(releaseDate, 'dd-MMM-yyyy')) AS releaseTime;

ratingsByMovie = GROUP ratings BY movieID;

avgRatings = FOREACH ratingsByMovie GENERATE group AS movieID, AVG(ratings.rating) AS avgRating;

fiveStarMovies = FILTER avgRatings BY avgRating > 4.0;

fiveStarsWithData = JOIN fiveStarMovies BY movieID, nameLookup BY movieID;

oldestFiveStarMovies = ORDER fiveStarsWithData BY nameLookup::releaseTime;

DUMP oldestFiveStarMovies;

But after hitting the Execute button in Pig View, the script has been running for the last hour and I am unable to see any progress. I have attached a screenshot as well.

The data I am using consists of around 100,000 ratings from around 1,000 users. Is this the default behaviour? Is it normal for Pig to take this long?

Is there an error here? I am fairly sure there is no error in the code, but Pig is still taking too long to execute the script.

Can someone please shed some light on this and guide me?

48384-pig-script-taking-too-much-time.png

5 REPLIES

@Amogh Suman,

In the code below, releaseDate is not declared in the schema. Did you mean to use 'videoRelease' instead of 'releaseDate'?

metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage('|') 	AS (movieID:int, movieTitle:chararray, videoRelease:chararray, imdbLink:chararray);
nameLookup = FOREACH metadata GENERATE movieID, movieTitle,ToUnixTime(ToDate(releaseDate,'dd-MMM-yyyy')) AS releaseTime;
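
To see which column the date expression needs, here is a minimal Python sketch of how one line of u.item maps onto a five-field schema and what the Unix-time conversion produces. The sample line is made up in the same pipe-delimited layout, the helper names `parse_item` and `release_time` are hypothetical, and UTC is assumed (Pig uses its own configured default time zone), so treat it as an illustration rather than an exact equivalent of ToUnixTime(ToDate(...)).

```python
from datetime import datetime, timezone

# Hypothetical sample line in the pipe-delimited layout of u.item (not real data).
SAMPLE_LINE = "1|Toy Story (1995)|01-Jan-1995||http://example.com/toy-story"

# Field order assumed from the five-field schema: releaseDate is the third column.
FIELDS = ("movieID", "movieTitle", "releaseDate", "videoRelease", "imdbLink")


def parse_item(line):
    """Split one pipe-delimited line and name the first five fields."""
    values = line.split("|")[:len(FIELDS)]
    return dict(zip(FIELDS, values))


def release_time(release_date):
    """Convert a 'dd-MMM-yyyy' style date to Unix seconds, assuming UTC."""
    parsed = datetime.strptime(release_date, "%d-%b-%Y")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```

If releaseDate is missing from the schema, every later column shifts left by one, so the date expression would end up reading the wrong field.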

Thanks,

Aditya

Explorer

@Aditya Sirna

Yes, you are right. I have edited the question, but the same thing is still happening. After I execute the script, it gets stuck.

@Amogh Suman,

Also change this line:

fiveStarMovies = FILTER avgRatings BY avgrating >4.0;

to:

fiveStarMovies = FILTER avgRatings BY avgRating >4.0;

Field names in Pig are case-sensitive, and the field was declared as avgRating.

The slowness could be caused by a lack of resources in YARN. Check whether any other YARN applications are already running. You can see this in the ResourceManager UI: go to YARN -> Quick Links -> ResourceManager UI.

See whether your application is in the ACCEPTED or RUNNING state. Also observe the memory taken, the vcores used, and so on. If your data is small, the script should finish within a minute.
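
If you prefer to check this without the UI, the same information is exposed by the ResourceManager REST API. Below is a rough Python sketch, assuming the ResourceManager web service is reachable at localhost:8088 (the usual default; adjust the host and port for your sandbox). The function names `summarize_apps` and `fetch_active_apps` are made up for this example.

```python
import json
from urllib.request import urlopen

# Assumption: ResourceManager web service on the default port 8088 of the sandbox.
RM_APPS_URL = "http://localhost:8088/ws/v1/cluster/apps?states=ACCEPTED,RUNNING"


def summarize_apps(payload):
    """Return (id, name, state, allocatedMB, allocatedVCores) for each listed app."""
    # The "apps" value can be null when no application matches, hence the guards.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (a.get("id"), a.get("name"), a.get("state"),
         a.get("allocatedMB"), a.get("allocatedVCores"))
        for a in apps
    ]


def fetch_active_apps(url=RM_APPS_URL):
    """Query the ResourceManager for ACCEPTED/RUNNING applications and summarize them."""
    with urlopen(url, timeout=10) as resp:
        return summarize_apps(json.load(resp))
```

Calling fetch_active_apps() on the sandbox should list one tuple per active application. If your Pig job stays in ACCEPTED while another application is RUNNING, the queue most likely has no free memory or vcores for it yet.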

Thanks,

Aditya

Explorer

@Aditya Sirna

I edited the code and the question as well. But then the Ambari server suddenly crashed, I think. When I typed 'ambari-server restart' in the sandbox console, I got the following error:

ambari-server restart
Using python  /usr/bin/python
Restarting ambari-server
Ambari Server is not running
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start.........Unable to determine server PID. Retrying...
......Unable to determine server PID. Retrying...
......Unable to determine server PID. Retrying...
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information.

I have posted about this error in a new question: https://community.hortonworks.com/questions/158800/ambari-server-restart-ambari-server-java-process-...

Explorer

Does this issue have anything to do with the history server?
