
Pig script status is "running" but always remains at zero percent?


I installed Hadoop using HDP on Amazon EC2.

The architecture is:

1 node on which ambari server is installed

2 master nodes

3 data nodes

So I have six machines in the cluster in total.

Now, when I open the Pig View in Ambari and run my hello-world tutorial script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

Its status changes to ACCEPTED and then to RUNNING, but it remains at RUNNING for hours.

Then I tried to run this script in the Grunt shell. I created a new file named riskfactor.pig (using vi) and ran it with the command "pig -useHCatalog -f riskfactor.pig". The job is submitted but never moves past zero percent. I have attached screenshots of my console.



Here I have uploaded a screenshot of the installed components in my cluster.




This problem has been solved!



Master Mentor

@Rupinder Singh

1. Restarting YARN should solve it.

2. This could be a Java heap size problem.
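If it is a client-side heap problem, the Pig launcher script reads the PIG_HEAPSIZE environment variable (in MB) before starting the JVM. A minimal sketch, reusing the riskfactor.pig file from above; 2048 is an illustrative value, not a recommendation:

```
# Raise the Pig client JVM heap (value is in MB) before launching the script
export PIG_HEAPSIZE=2048
pig -useHCatalog -f riskfactor.pig
```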


To use both options, use the command below:

pig -useHCatalog -x tez -f script.pig

To use only HCatalog:

pig -useHCatalog -f script.pig

To use only Tez:

pig -x tez -f script.pig


Thanks @Geoffrey Shelton Okot for replying. I restarted YARN but the problem still persists. If it is a Java heap size problem, how do I solve it?

Thanks in advance 🙂

Master Guru

It could be a lot of things: anything from a misconfigured YARN queue to a job that simply takes a long time.

Have a look at the ResourceManager UI (port 8088) and get the logs for the job that was kicked off. You can see whether the job actually started, whether containers have been allocated and are running, and whether it is reading/writing data. (Don't use Tez in the beginning; I would first figure out what is wrong with the job or with YARN.)


-> Go to the ResourceManager UI

-> Click on the job that was kicked off (you can see the job name in Pig, or just look for one that is running and belongs to Pig)

-> Click on Application Master

-> You should see all containers started for it. If there are none, you might have a YARN queue problem.

-> If mappers have started, click on Maps (lower right) and check whether they are running; also look into the logs to see what is going on.
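If the ResourceManager web UI is not reachable, the same information is available from the YARN CLI on any cluster node. A quick sketch; the application ID below is a placeholder to be replaced with the real one from the listing:

```
# List applications currently submitted/accepted/running on YARN
yarn application -list

# Pull the aggregated logs for a specific application
# (replace the placeholder ID with the real one from the listing above;
# requires log aggregation to be enabled)
yarn logs -applicationId application_1400000000000_0001
```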


Sir @Benjamin Leonhardi, I found one more issue: I am unable to access the ResourceManager UI from my Ambari.


When I click on the ResourceManager UI link in my Ambari, this window opens and this message is shown.

Actually, my Ambari server is installed on a separate node.

I have 2 master nodes

1 combined data + master node

2 data nodes

Total = 6 nodes

Master node 1 has: (see screenshot 1937-master-node1.png)

The third node has: (see screenshot 1938-master-node2.png)


The other two nodes have only clients installed on them.

I also set a security group for all nodes and allowed access to all node ports from anywhere.

Master Mentor

@Rupinder Singh You have to set up the local /etc/hosts file on your machine so that it resolves those hostnames to the public IPs of the servers.


Thanks sir @Neeraj Sabharwal. OK, so you mean I have to write an entry like this: ip-172-31-1-137.ap-northeast-1.compute.internal ambariserver

in "C:\Windows\System32\drivers\etc\hosts" on my Windows client?
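For reference, each hosts-file line starts with an IP address followed by the hostnames that should resolve to it. A minimal sketch; 203.0.113.10 is a placeholder documentation address, to be replaced with the server's real public IP:

```
# C:\Windows\System32\drivers\etc\hosts  (same format as /etc/hosts on Linux)
# <public-IP>   <hostname>   [aliases...]
203.0.113.10   ip-172-31-1-137.ap-northeast-1.compute.internal   ambariserver
```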

Master Mentor

@Rupinder Singh I would suggest running service checks on all components, including Pig and Tez. Once you validate that everything works, I would open a Pig Grunt shell, try running each Pig statement one at a time, and look for errors. Additionally, you can DUMP after each executed statement in Pig to see where it fails. It can be many things, and it is really hard to tell from what you've told us so far.
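That statement-at-a-time approach might look like this in the Grunt shell. The _sample relation names are illustrative; the LIMIT keeps each DUMP small, and DUMP forces execution so errors surface at the failing step:

```
grunt> a = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> a_sample = LIMIT a 5;    -- keep the output small
grunt> DUMP a_sample;           -- forces execution of the LOAD
grunt> b = FILTER a BY event != 'normal';
grunt> b_sample = LIMIT b 5;
grunt> DUMP b_sample;           -- verifies the FILTER step
```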


Thanks @Artem Ervits for replying. Here is my script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();

dump a;

When I execute the first statement and then execute dump a;, it stops at zero percent and never moves forward.