
Pig script status is RUNNING but always remains at zero percent?

Contributor

I have installed Hadoop using HDP on the Amazon EC2 cloud.

The architecture is:

1 node on which the Ambari server is installed

2 master nodes

3 data nodes

So I have a total of 6 machines in the cluster.

Now when I open the Pig view in Ambari and run my hello world tutorial script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

Its status changes to ACCEPTED and then to RUNNING, but it stays in RUNNING for hours.

Then I tried to run this script from the shell. I created a new file with "vi riskfactor.pig" and ran it using the command "pig -useHCatalog -f riskfactor.pig". The job is submitted but never moves from zero percent. I have attached screenshots of my console.

1904-pig.png

1905-pig1.png

Here is a screenshot of the components installed in my cluster:

1907-main-board.png

1 ACCEPTED SOLUTION

Contributor

Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem is solved with the help of your answers. Thank you 🙂 🙂


20 REPLIES

Master Mentor

@Rupinder Singh

1. Restarting YARN should solve it.

2. This could be a Java heap size problem.

Or try the following. To use both options (HCatalog and Tez), use the command below:

pig -useHCatalog -x tez -f script.pig

To use only HCatalog:

pig -useHCatalog -f script.pig

To use only Tez:

pig -x tez -f script.pig

Contributor

Thanks @Geoffrey Shelton Okot for replying. I restarted YARN but the problem still persists. If it is a Java heap size problem, how do I solve it?

Thanks in advance 🙂

Master Guru

It could be a lot of things: anything from a misconfigured YARN queue to a job that just takes a long time.

Have a look at the ResourceManager UI (port 8088) and get the logs for the job that is being kicked off. There you can see whether the job has actually started, whether containers have been allocated and are running, and whether it is reading/writing data. (Don't use Tez in the beginning; I would first figure out what is wrong with the job or YARN.)

So:

-> Go to the ResourceManager UI

-> Click on the job that is kicked off (you can see the job name in Pig, or just look for a running job with Pig in its name)

-> Click on Application Master

-> You should see all containers started for it. If there are none, you might have a YARN queue problem

-> If mappers are started, click on Maps (lower right), check whether they are running, and look in the logs to see what is going on.
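The same checks can also be scripted against the ResourceManager REST API. This is only a minimal sketch, assuming the ResourceManager is reachable on port 8088. The host name `resourcemanager.example.com` and the helper names `fetch_apps`, `summarize_apps` and `only_am_running` are hypothetical, and the "only the ApplicationMaster container is running" test is a rough heuristic, not an official diagnostic.

```python
import json
from urllib.request import urlopen

# Assumption: placeholder host; replace with your ResourceManager address.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_apps(rm_url=RM_URL, states="ACCEPTED,RUNNING"):
    """Fetch applications in the given states from the ResourceManager REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps?states={states}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Return (id, name, queue, state, progress, runningContainers) per application."""
    # The API returns {"apps": null} when nothing matches, hence the `or {}`.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (a.get("id"), a.get("name"), a.get("queue"), a.get("state"),
         a.get("progress"), a.get("runningContainers"))
        for a in apps
    ]


def only_am_running(row):
    """Heuristic: a RUNNING app with at most one container has no task containers yet."""
    _id, _name, _queue, state, _progress, containers = row
    return state == "RUNNING" and (containers or 0) <= 1
```

Calling `summarize_apps(fetch_apps())` from a machine that can reach the ResourceManager lists the accepted and running applications. An application that stays RUNNING with a single container for a long time is the scripted equivalent of seeing no containers beyond the Application Master in the UI.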

Contributor

Sir @Benjamin Leonhardi, I found one more issue: I am unable to access the ResourceManager UI from Ambari.

1936-resource.png

When I click on the ResourceManager UI link in Ambari, this window opens and this message is shown.

My Ambari server is installed on a separate node.

I have 2 master nodes

1 data + master node

2 data nodes

Total = 6 nodes

Master node 1 has:

1937-master-node1.png

Master node 2 has:

1938-master-node2.png

The third node has:

1939-master-node3.png

The other 2 nodes have only clients installed on them.

I also set the security group for all nodes to allow access from anywhere to all node ports.

Master Mentor

@Rupinder Singh You have to set up the local /etc/hosts file on your machine so that it resolves those hostnames to the public IPs of the servers.

Contributor

Thanks sir @Neeraj Sabharwal. OK, you mean I have to write an entry like this

172.31.1.137 ip-172-31-1-137.ap-northeast-1.compute.internal ambariserver

on my Windows client in "C:\Windows\System32\drivers\etc\hosts"?
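Note that 172.31.1.137 is a private (RFC 1918) address, while the advice above was to map the hostnames to the servers' public IPs. A minimal sketch of building and checking such an entry follows. The public address 203.0.113.10 is a documentation placeholder, not the real address of this cluster, and the helper names are hypothetical.

```python
import ipaddress

# RFC 1918 private ranges; EC2 internal addresses such as 172.31.x.x fall in these.
RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(ip):
    """True if the address is in a private range not reachable from outside the VPC."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def hosts_line(public_ip, internal_fqdn, alias=None):
    """Build one hosts-file line: '<ip> <fqdn> [alias]'."""
    if is_rfc1918(public_ip):
        raise ValueError(f"{public_ip} is a private address; use the public IP")
    names = [internal_fqdn] + ([alias] if alias else [])
    return f"{public_ip} {' '.join(names)}"


# Placeholder public IP (assumption) paired with the hostname from the post.
line = hosts_line("203.0.113.10",
                  "ip-172-31-1-137.ap-northeast-1.compute.internal",
                  "ambariserver")
```

One such line per cluster node would go into the hosts file on the client machine, so that links in Ambari pointing at the internal hostnames resolve from outside EC2.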

Master Mentor

@Rupinder Singh I would suggest running service checks on all components, including Pig and Tez. Once you validate that everything works, I would open a Pig grunt shell and try running each Pig statement one at a time and look for errors. Additionally, you can dump after each executed command in Pig to see where it fails. It can be many things, and it is really hard to tell from what you've told us so far.
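As a rough illustration of the "dump after each command" idea, the sketch below rewrites a Pig script so that every alias assignment is followed by a DUMP of that alias. The helper `add_dumps` is hypothetical and deliberately naive: it splits on `;`, so it will break on scripts with semicolons inside string literals. Each DUMP launches its own job, so this is only sensible on small inputs.

```python
import re

# Matches the alias at the start of an assignment such as "a = LOAD ...".
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")


def add_dumps(script):
    """Return the script with 'DUMP <alias>;' inserted after each assignment."""
    out = []
    for stmt in script.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        out.append(stmt + ";")
        match = ASSIGNMENT.match(stmt)
        if match:  # statements like STORE have no alias and are left alone
            out.append(f"DUMP {match.group(1)};")
    return "\n".join(out)
```

Pasting the resulting statements into the grunt shell one at a time shows which alias is the first to hang or fail.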

Contributor

Thanks @Artem Ervits for replying. Here is my script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();

dump a;

When I execute the first statement and then execute dump a; it stops at zero percent and never moves forward.