Created on 02-08-2016 10:06 AM - edited 08-19-2019 02:32 AM
I have installed Hadoop using HDP on the Amazon EC2 cloud.
The architecture is:
1 node on which the Ambari server is installed
2 master nodes
3 data nodes
So I have a total of 6 machines in the cluster.
Now when I open the Pig View in Ambari and run my hello-world tutorial script:
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();
its status changes to ACCEPTED and then to RUNNING, but it stays in RUNNING for hours.
Then I try to run this script from the grunt shell: I create a new file named riskfactor.pig with vi and run it using the command pig -useHCatalog -f riskfactor.pig. The job is submitted but never moves past zero percent. I have attached screenshots of my console here.
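For reference, the commands I run on the shell are roughly these (same file name as above):
vi riskfactor.pig (paste the script above and save)
pig -useHCatalog -f riskfactor.pig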
I have also uploaded a screenshot of the components installed in my cluster.
Created 02-11-2016 05:04 AM
Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem is solved with the help of your answers. Thank you 🙂 🙂
Created 02-08-2016 11:21 AM
@Rupinder Singh
1. Restarting YARN should solve it.
2. This could be a Java heap size problem (a rough sketch for raising the heap follows the commands below).
Alternatively, try the different invocation options:
To use both HCatalog and Tez, use the command below:
pig -useHCatalog -x tez -f script.pig
To use only HCatalog:
pig -useHCatalog -f script.pig
To use only Tez:
pig -x tez -f script.pig
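If it does turn out to be a heap problem, a rough sketch of raising the Pig client heap looks like this (the value is in MB and just an example, tune it to your instance size):
export PIG_HEAPSIZE=2048
pig -useHCatalog -f script.pig
You may also want to review the YARN container memory settings (yarn.nodemanager.resource.memory-mb, mapreduce.map.memory.mb) in Ambari.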
Created 02-08-2016 12:24 PM
Thanks @Geoffrey Shelton Okot for replying. I restarted YARN but the problem still persists. If it is a Java heap size problem, how do I solve it?
Thanks in advance 🙂
Created 02-08-2016 12:30 PM
It could be a lot of things, anything from a misconfigured YARN queue to a job that simply takes a long time.
Have a look at the ResourceManager UI (port 8088) and get the logs for the job that is being kicked off. You can see whether the job was actually kicked off, whether containers have been allocated and are running, and whether it is reading/writing data. (Don't use Tez in the beginning; I would first figure out what is wrong with the job or YARN.)
So:
-> Go to the ResourceManager UI.
-> Click on the job that was kicked off (you can see the job name in Pig, or just look for one that is running and owned by pig).
-> Click on Application Master.
-> You should see all containers started for it. If there are none, you might have a YARN queue problem.
-> If mappers are started, click on Maps (lower right) and check whether they are running; also look in the logs to see what is going on. (A command-line alternative is sketched below.)
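If you prefer the command line, the YARN CLI gives roughly the same information (the application id below is a placeholder, copy the real one from the list output; yarn logs needs log aggregation enabled and may only show the full logs after the application finishes):
yarn application -list -appStates RUNNING
yarn application -status <application_id>
yarn logs -applicationId <application_id>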
Created on 02-09-2016 05:52 AM - edited 08-19-2019 02:31 AM
Sir @Benjamin Leonhardi, I found one more issue: I am unable to access the ResourceManager UI from my Ambari.
When I click on the ResourceManager UI link in Ambari, this window opens and this message is shown.
Actually, my Ambari server is installed on a separate node.
I have 2 master nodes,
1 node with data + master components,
2 data nodes,
for a total of 6 nodes.
Master node 1 has: ...
The second master node has: ...
The third node has: ...
The other 2 nodes have only clients installed on them.
I have also set up a security group for all nodes and allowed access from anywhere to all node ports.
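For illustration, opening a port to the world with the AWS CLI looks roughly like this (the security group id is a placeholder; 8080 is the Ambari UI, 8088 is the ResourceManager UI):
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 8080 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 8088 --cidr 0.0.0.0/0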
Created 02-09-2016 11:10 AM
@Rupinder Singh You have to set up a local /etc/hosts file on your machine that will resolve those hostnames to the public IPs of the servers.
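For example (the IPs and hostnames below are placeholders; use the public IP and internal hostname of each of your nodes):
<public-ip-of-node-1>  ip-172-31-x-x.ap-northeast-1.compute.internal
<public-ip-of-node-2>  ip-172-31-y-y.ap-northeast-1.compute.internal
On Windows the equivalent file is C:\Windows\System32\drivers\etc\hosts; on Linux/Mac it is /etc/hosts.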
Created 02-09-2016 01:01 PM
Thanks sir @Neeraj Sabharwal. OK, so you mean I have to write something like this:
172.31.1.137 ip-172-31-1-137.ap-northeast-1.compute.internal ambariserver
on my Windows client in "C:\Windows\System32\drivers\etc\hosts"?
Created 02-09-2016 01:03 PM
@Rupinder Singh Yes!!!
Created 02-08-2016 08:21 PM
@Rupinder Singh I would suggest running service checks on all components, including Pig and Tez. Once you validate that everything works, I would open a Pig grunt shell and try running each Pig statement one at a time, looking for errors. Additionally, you can dump after each executed command in Pig to see where it fails. It can be many things, and it is really hard to tell from what you've told us so far.
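For example, in the grunt shell you can check each relation on a small sample before running the whole script (relation names follow the tutorial script above; limit keeps the dump small):
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
describe a;  -- quick schema check against HCatalog, does not launch a job
a_sample = limit a 10;
dump a_sample;  -- launches a small job; repeat the same pattern for b, c, d and so on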
Created 02-09-2016 06:32 AM
Thanks @Artem Ervits for replying. Here is my script:
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
dump a;
When I execute the first statement and then execute dump a;, it stops at zero percent and never moves forward.