Pig script status is running but progress always stays at zero percent?

Contributor

I installed Hadoop using HDP on Amazon EC2.

The architecture is:

1 node on which the Ambari server is installed

2 master nodes

3 data nodes

So I have a total of 6 machines in the cluster.

When I open the Pig view in Ambari and run my hello world tutorial script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

Its status changes to accepted and then to running, but it stays in running for hours.

Then I tried to run the script from the command line instead. I created a new file with "vi riskfactor.pig" and ran it using the command "pig -useHCatalog -f riskfactor.pig". The job is submitted but never moves past zero percent. Screenshots of my console are attached:

1904-pig.png

1905-pig1.png

Here is a screenshot of the components installed in my cluster:

1907-main-board.png

1 ACCEPTED SOLUTION

Contributor

Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem was solved with the help of your answers. Thank you 🙂 🙂


20 REPLIES

Master Mentor

First fix the RM (ResourceManager) issues, then go into Hive and execute show tables.

I want to make sure the tables exist. But I think your issue is not the Pig script but your environment, @Rupinder Singh.
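To make the table check concrete, here is a minimal sketch. It assumes the hive CLI is on the PATH of the node you run it on and that the two tables the script loads (geolocation and drivermileage) live in the default database. The helper name missing_tables is made up for illustration.

```shell
#!/usr/bin/env bash
# Tables the Pig script in the question reads via HCatLoader.
required_tables="geolocation drivermileage"

# Hypothetical helper: reads existing table names (one per line) on stdin
# and prints every required table (given as arguments) that is not listed.
missing_tables() {
  local existing
  existing=$(cat)
  for t in "$@"; do
    printf '%s\n' "$existing" | grep -qxF "$t" || echo "$t"
  done
}

# Only query Hive if the CLI is actually installed on this host.
if command -v hive >/dev/null 2>&1; then
  # -S keeps the output quiet, -e runs the quoted statement.
  hive -S -e 'show tables;' | missing_tables $required_tables
fi
```

If this prints nothing, both source tables are visible to Hive and the problem is more likely in YARN than in the script.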

Contributor

Yes @Artem Ervits, I have already created the tables the script requires, but I still get the same problem.

Master Mentor

@Rupinder Singh go to /var/log/hadoop/yarn and look for errors in the ResourceManager log. You need to fix the RM before anything else.
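A rough way to scan that directory is sketched below. The directory comes from the reply above. The file-name pattern (*resourcemanager*.log) and the function name recent_rm_errors are assumptions, so adjust them to whatever your install actually writes.

```shell
#!/usr/bin/env bash
# Log directory from the reply above; override with LOG_DIR if yours differs.
LOG_DIR="${LOG_DIR:-/var/log/hadoop/yarn}"

# Hypothetical helper: print the last N lines (default 20) that contain
# the whole word ERROR or FATAL in any ResourceManager log under a directory.
recent_rm_errors() {
  local dir="$1" count="${2:-20}"
  # -h omits file names, -w matches whole words only.
  grep -hwE 'ERROR|FATAL' "$dir"/*resourcemanager*.log 2>/dev/null | tail -n "$count"
}

recent_rm_errors "$LOG_DIR"
```

Run it on the node that hosts the ResourceManager. Empty output means either no errors were logged or the file pattern did not match.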

Master Mentor
@Rupinder Singh

See this

https://community.hortonworks.com/questions/15098/how-to-process-data-with-apache-pig-tutorial-slow....

You have to check whether any other job is holding on to resources.
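One way to check for jobs holding resources is the yarn CLI, sketched here. It assumes the yarn client is installed on the host and that each application row in the listing starts with an ID of the form application_.... The function names are illustrative only.

```shell
#!/usr/bin/env bash
# List applications that are either waiting for containers or holding them.
list_active_apps() {
  yarn application -list -appStates ACCEPTED,RUNNING
}

# Hypothetical helper: keep only the application IDs from that listing.
# Assumes data rows begin with "application_"; header lines are skipped.
app_ids() {
  awk '$1 ~ /^application_/ { print $1 }'
}

if command -v yarn >/dev/null 2>&1; then
  list_active_apps | app_ids
else
  echo "yarn CLI not found on this host" >&2
fi
```

If an old application shows up that you no longer need, "yarn application -kill" followed by its ID should release its containers so the Pig job can get scheduled.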

Contributor

@Neeraj Sabharwal I found one more issue: I am unable to access the ResourceManager UI from Ambari.

1936-resource.png

When I click on the ResourceManager UI link in Ambari, this window opens and this message is shown.

My Ambari server is installed on a separate node.

I have 2 master nodes

1 node that is both data and master

2 data nodes

Total = 6 nodes

Master node 1 has:

1937-master-node1.png

Master node 2 has:

1938-master-node2.png

The third node has:

1939-master-node3.png

The other 2 nodes have only clients installed on them.

I also set the security group for all nodes and opened all node ports to access from anywhere.
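To test whether the ResourceManager web endpoint is reachable at all, independent of Ambari, something like the sketch below can be run from another node. It assumes the default RM web port 8088 and uses the RM's REST endpoint /ws/v1/cluster/info. RM_HOST and the function name rm_info_url are placeholders, not values from this cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the cluster-info URL for a given RM host.
# The second argument is the web port; 8088 is the stock default.
rm_info_url() {
  printf 'http://%s:%s/ws/v1/cluster/info\n' "$1" "${2:-8088}"
}

# Only probe when RM_HOST is set, e.g. RM_HOST=<your RM hostname>.
if [ -n "${RM_HOST:-}" ]; then
  curl -sS --max-time 5 "$(rm_info_url "$RM_HOST")" \
    || echo "ResourceManager not reachable from this host" >&2
fi
```

If the request works with the node's hostname from inside the cluster but the link in Ambari fails from your browser, the hostname Ambari links to is probably not resolvable from your machine.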

Expert Contributor

If you are using EC2, make sure your OS has properly defined hostnames. You can update the hostname using hostnamectl or set it in the sysconfig/network file, depending on your OS version.

Then to fix your hostnames in your server/agent, follow: https://ambari.apache.org/1.2.3/installing-hadoop-using-ambari/content/ambari-chap7a.html

Restart the Ambari Server and Agent and you should be good!
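A quick sanity check for the hostname on each node might look like the following. It only tests whether the name the host reports contains a dot, which is a rough stand-in for "fully qualified" and an assumption of this sketch. The function name is_fqdn is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given name contains at least one dot.
is_fqdn() {
  case "$1" in
    *.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Ask for the fully qualified name; fall back to the short name if that fails.
fqdn=$(hostname -f 2>/dev/null || hostname)

if is_fqdn "$fqdn"; then
  echo "FQDN looks set: $fqdn"
else
  echo "Hostname '$fqdn' is not fully qualified" >&2
fi
```

On a systemd-based OS the name can then be set with "sudo hostnamectl set-hostname" followed by the name you want. The name reported here should match the one each Ambari agent registered with.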


Master Mentor

@Rupinder Singh pick the best answer, I see people still trying to help you solve this :).

Master Mentor

@Rupinder Singh please choose the best answer that helped you as we need to close out this thread.

Explorer

@Rupinder Singh

Can you please describe the exact solution to this problem? I am facing the same issue.