Pig script status is RUNNING but always remains at zero percent?

I installed Hadoop using HDP on Amazon EC2.

The architecture is:

1 node on which the Ambari server is installed

2 master nodes

3 data nodes

So I have 6 machines in total in the cluster.

Now I open the Pig view in Ambari and run my hello world tutorial script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

Its status changes to ACCEPTED and then to RUNNING, but it stays in RUNNING for hours.

Then I tried to run the script from the shell. I created a new file with "vi riskfactor.pig" and ran it with the command "pig -useHCatalog -f riskfactor.pig". The job is submitted but never moves past zero percent. Screenshots of my console are attached.

1904-pig.png

1905-pig1.png

Here is a screenshot of the components installed in my cluster:

1907-main-board.png

1 ACCEPTED SOLUTION

Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem is solved with the help of your answers. Thank you 🙂 🙂

20 REPLIES

Mentor

@Rupinder Singh

1. Restarting YARN should solve it.

2. This could be a Java heap size problem.

or

To use both options (HCatalog and Tez), use the command below:

pig -useHCatalog -x tez -f script.pig

To use only HCatalog:

pig -useHCatalog -f script.pig

To use only Tez:

pig -x tez -f script.pig

Thanks @Geoffrey Shelton Okot for replying. I restarted YARN but the problem still persists. If it is a Java heap size problem, how do I solve it?

Thanks in advance 🙂

It could be a lot of things: anything from a misconfigured YARN queue to a job that simply takes a long time.

Have a look at the ResourceManager UI (:8088) and get the logs for the job that is being kicked off. You can see whether the job was kicked off, whether containers have been allocated and are running, and whether it is reading/writing data. (Don't use Tez in the beginning. I would first figure out what is wrong with the job or YARN.)

So:

-> Go to the ResourceManager UI

-> Click on the job that was kicked off (you can see the job name in Pig, or just look for one that is running and belongs to Pig)

-> Click on Application Master

-> You should see all containers started for it. If there are none, you might have a YARN queue problem

-> If mappers are started, click on Maps (lower right) and check whether they are running, and perhaps look in the logs to see what is going on.
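
The same checks can be roughed out without the web UI. The sketch below is an illustration, not part of the original reply: it assumes the ResourceManager REST endpoint /ws/v1/cluster/metrics is reachable on port 8088, and the function names and the wording of the hints are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from the ResourceManager REST API.

    rm_url is something like "http://<resourcemanager-host>:8088" (placeholder).
    """
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def diagnose(metrics):
    """Return a rough, non-authoritative hint about why a job may not progress."""
    if metrics.get("activeNodes", 0) == 0:
        return "no active NodeManagers: check that the NodeManagers are running and registered"
    if metrics.get("appsPending", 0) > 0 and metrics.get("availableMB", 0) == 0:
        return "applications pending and no free memory: another job may be holding the resources"
    if metrics.get("containersPending", 0) > 0:
        return "containers pending: the queue or container sizes may not fit the cluster"
    return "no obvious resource shortage: check the application and container logs"


# Example with a hand-written metrics dict instead of a live cluster.
print(diagnose({"activeNodes": 3, "appsPending": 1, "availableMB": 0}))
```

On a live cluster you would call diagnose(fetch_cluster_metrics(...)) with your own ResourceManager address. The hints only narrow things down; the container logs remain the real source of truth.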

Sir @Benjamin Leonhardi, I found one more issue: I am unable to access the ResourceManager UI from Ambari.

1936-resource.png

When I click on the ResourceManager UI link in Ambari, this window opens and this message is shown.

My Ambari server is installed on a separate node.

I have 2 master nodes

1 node that is data + master

2 data nodes

total = 6 nodes

Master node 1 has:

1937-master-node1.png

Master node has:

1938-master-node2.png

Third node has:

1939-master-node3.png

The other 2 nodes have only clients installed on them.

I also set up a security group for all nodes and allowed access from anywhere to all node ports.

@Rupinder Singh You have to set up the local /etc/hosts file on your machine so that it resolves those hostnames to the public IPs of the servers.

Thanks sir @Neeraj Sabharwal. OK, so you mean I have to write a line like this

172.31.1.137 ip-172-31-1-137.ap-northeast-1.compute.internal ambariserver

on my Windows client in "C:\Windows\System32\drivers\etc\hosts"?

Mentor

@Rupinder Singh I would suggest running service checks on all components, including Pig and Tez. Once you validate that everything works, I would open a Pig grunt shell, run each Pig statement one at a time, and look for errors. Additionally, you can dump after each executed command in Pig to see where it fails. It can be many things, and it is really hard to tell from what you've told us so far.

Thanks @Artem Ervits for replying. Here is my script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();

dump a;

When I execute the first statement and then execute dump a; it stops at zero percent and never moves forward.

Mentor

First fix the RM issues, then go into Hive and execute show tables

I want to make sure the table exists. But I think your issue is not the Pig script but your environment, @Rupinder Singh

Yes @Artem Ervits sir, I already created the required tables for the script, but I still get the same problem.

Mentor

@Rupinder Singh go to /var/log/hadoop/yarn and look for errors in the ResourceManager log. You need to fix the RM before anything else.
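
A minimal sketch of that log check, assuming the ResourceManager log uses the common "date time LEVEL class: message" layout. The function name is hypothetical, and the sample lines are made up for illustration, not taken from this cluster.

```python
def error_lines(lines, levels=("ERROR", "FATAL")):
    """Keep log lines whose third whitespace-separated field is one of the given levels."""
    hits = []
    for line in lines:
        parts = line.split(None, 3)  # date, time, level, rest of the line
        if len(parts) >= 3 and parts[2] in levels:
            hits.append(line.rstrip("\n"))
    return hits


# Made-up sample lines in the assumed layout.
sample = [
    "2016-02-05 10:00:00,123 INFO resourcemanager.ResourceManager: started\n",
    "2016-02-05 10:00:05,456 ERROR resourcemanager.ResourceManager: something failed\n",
    "\tat some.stack.Frame(Frame.java:1)\n",
]
for hit in error_lines(sample):
    print(hit)
```

To scan a real file, open the ResourceManager log under /var/log/hadoop/yarn and pass the file object to error_lines. The exact file name depends on the host, so it is left out here. Stack trace lines are skipped by design; read them in the log next to the matching ERROR line.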

@Rupinder Singh

See this

https://community.hortonworks.com/questions/15098/how-to-process-data-with-apache-pig-tutorial-slow....

You have to check if there is any other job holding on to resources.
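
One way to check for such a job, sketched under assumptions: the ResourceManager REST endpoint /ws/v1/cluster/apps?states=ACCEPTED,RUNNING is reachable, and each returned app carries id, name, queue and allocatedMB fields. The function names and the sample payload are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_active_apps(rm_url):
    """Fetch ACCEPTED and RUNNING applications from the ResourceManager REST API."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps?states=ACCEPTED,RUNNING"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def resource_holders(apps_payload):
    """Return (id, name, queue, allocatedMB) for apps that hold memory, largest first."""
    apps = (apps_payload.get("apps") or {}).get("app", [])
    holders = [
        (a["id"], a["name"], a["queue"], a.get("allocatedMB", 0))
        for a in apps
        if a.get("allocatedMB", 0) > 0
    ]
    return sorted(holders, key=lambda h: h[3], reverse=True)


# Made-up payload for illustration; a live call would use fetch_active_apps(...).
sample = {"apps": {"app": [
    {"id": "application_1_0001", "name": "PigLatin:riskfactor.pig", "queue": "default", "allocatedMB": 1024},
    {"id": "application_1_0002", "name": "other job", "queue": "default", "allocatedMB": 4096},
]}}
for holder in resource_holders(sample):
    print(holder)
```

If an unrelated application sits at the top of that list while the Pig job shows no progress, it may be worth checking whether that application can be finished or killed.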

Sir @Neeraj Sabharwal, I found one more issue: I am unable to access the ResourceManager UI from Ambari.

Rising Star

If you are using EC2, make sure your OS has properly defined hostnames. You can update the hostname using hostnamectl or set it in the sysconfig/network file, depending on your OS version.

Then, to fix the hostnames in your Ambari server/agent, follow: https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/ambari-chap7a.html

Restart the Ambari Server and Agent and you should be good!

Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem is solved with the help of your answers. Thank you 🙂 🙂

Mentor

@Rupinder Singh pick the best answer, I see people still trying to help you solve this :).

Mentor

@Rupinder Singh please choose the best answer that helped you as we need to close out this thread.

Explorer

@Rupinder Singh

Can you please elaborate on the exact solution to this problem? I am facing the same issue.

@Artem Ervits @grajagopal @Geoffrey Shelton Okot

Hello,

I'm facing the same issue while following the tutorial at:

https://hortonworks.com/tutorial/hadoop-tutorial-getting-started-with-hdp/section/4/

Once I execute my Pig script, it is stuck in RUNNING status, as shown in status.png.

In the RM UI, my application is also stuck in RUNNING status, as shown in rm-application.png, and I attached the launched MapReduce job in mr-job.png.

From the Pig view log, I got hive-log.png.

How can I resolve my issue? I would be really grateful if you could help me.
