Created on 02-08-2016 10:06 AM - edited 08-19-2019 02:32 AM
I have installed Hadoop using HDP on the Amazon EC2 cloud.
The architecture is:
1 node on which the Ambari server is installed
2 master nodes
3 data nodes
So I have a total of 6 machines in the cluster.
Now when I open the Pig View in Ambari and run my hello-world tutorial script:
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();
its status changes to ACCEPTED and then to RUNNING, but it stays in RUNNING for hours.
Then I try to run this script from the grunt shell: I create a new file named riskfactor.pig with vi and run it using the command pig -useHCatalog -f riskfactor.pig. The job is submitted but never moves past zero percent. I have attached screenshots of my console here.
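For reference, the commands I run on the shell are roughly these (same file name as above):
vi riskfactor.pig (paste the script above and save)
pig -useHCatalog -f riskfactor.pig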
I have also uploaded a screenshot of the components installed in my cluster.
Created 02-11-2016 05:04 AM
Thanks @Neeraj Sabharwal, @Artem Ervits, @Geoffrey Shelton Okot and @Benjamin Leonhardi for your valuable replies. My problem is solved with the help of your answers. Thank you 🙂 🙂
Created 02-08-2016 11:21 AM
@Rupinder Singh
1. Restarting YARN should solve it.
2. This could be a Java heap size problem (a rough sketch for raising the heap follows the commands below).
Alternatively, try the different invocation options:
To use both HCatalog and Tez, use the command below:
pig -useHCatalog -x tez -f script.pig
To use only HCatalog:
pig -useHCatalog -f script.pig
To use only Tez:
pig -x tez -f script.pig
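If it does turn out to be a heap problem, a rough sketch of raising the Pig client heap looks like this (the value is in MB and just an example, tune it to your instance size):
export PIG_HEAPSIZE=2048
pig -useHCatalog -f script.pig
You may also want to review the YARN container memory settings (yarn.nodemanager.resource.memory-mb, mapreduce.map.memory.mb) in Ambari.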
Created 02-08-2016 12:24 PM
Thanks @Geoffrey Shelton Okot for replying. I restarted YARN but the problem still persists. If it is a Java heap size problem, how do I solve it?
Thanks in advance 🙂
Created 02-08-2016 12:30 PM
It could be a lot of things, anything from a misconfigured YARN queue to a job that simply takes a long time.
Have a look at the ResourceManager UI (port 8088) and get the logs for the job that is being kicked off. You can see whether the job was actually kicked off, whether containers have been allocated and are running, and whether it is reading/writing data. (Don't use Tez in the beginning; I would first figure out what is wrong with the job or YARN.)
So:
-> Go to the ResourceManager UI.
-> Click on the job that was kicked off (you can see the job name in Pig, or just look for one that is running and owned by pig).
-> Click on Application Master.
-> You should see all containers started for it. If there are none, you might have a YARN queue problem.
-> If mappers are started, click on Maps (lower right) and check whether they are running; also look in the logs to see what is going on. (A command-line alternative is sketched below.)
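If you prefer the command line, the YARN CLI gives roughly the same information (the application id below is a placeholder, copy the real one from the list output; yarn logs needs log aggregation enabled and may only show the full logs after the application finishes):
yarn application -list -appStates RUNNING
yarn application -status <application_id>
yarn logs -applicationId <application_id>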
Created on 02-09-2016 05:52 AM - edited 08-19-2019 02:31 AM
Sir @Benjamin Leonhardi, I found one more issue: I am unable to access the ResourceManager UI from my Ambari.
When I click on the ResourceManager UI link in Ambari, this window opens and this message is shown.
Actually, my Ambari server is installed on a separate node.
I have 2 master nodes,
1 node with data + master components,
2 data nodes,
for a total of 6 nodes.
Master node 1 has: ...
The second master node has: ...
The third node has: ...
The other 2 nodes have only clients installed on them.
I have also set up a security group for all nodes and allowed access from anywhere to all node ports.
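For illustration, opening a port to the world with the AWS CLI looks roughly like this (the security group id is a placeholder; 8080 is the Ambari UI, 8088 is the ResourceManager UI):
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 8080 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol tcp --port 8088 --cidr 0.0.0.0/0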
Created 02-09-2016 11:10 AM
@Rupinder Singh You have to set up a local /etc/hosts file on your machine that will resolve those hostnames to the public IPs of the servers.
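For example (the IPs and hostnames below are placeholders; use the public IP and internal hostname of each of your nodes):
<public-ip-of-node-1>  ip-172-31-x-x.ap-northeast-1.compute.internal
<public-ip-of-node-2>  ip-172-31-y-y.ap-northeast-1.compute.internal
On Windows the equivalent file is C:\Windows\System32\drivers\etc\hosts; on Linux/Mac it is /etc/hosts.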
Created 02-09-2016 01:01 PM
Thanks sir @Neeraj Sabharwal. OK, so you mean I have to write something like this:
172.31.1.137 ip-172-31-1-137.ap-northeast-1.compute.internal ambariserver
on my Windows client in "C:\Windows\System32\drivers\etc\hosts"?
Created 02-09-2016 01:03 PM
@Rupinder Singh Yes!!!
Created 02-08-2016 08:21 PM
@Rupinder Singh I would suggest running service checks on all components, including Pig and Tez. Once you validate that everything works, I would open a Pig grunt shell and try running each Pig statement one at a time, looking for errors. Additionally, you can dump after each executed command in Pig to see where it fails. It can be many things, and it is really hard to tell from what you've told us so far.
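For example, in the grunt shell you can check each relation on a small sample before running the whole script (relation names follow the tutorial script above; limit keeps the dump small):
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
describe a;  -- quick schema check against HCatalog, does not launch a job
a_sample = limit a 10;
dump a_sample;  -- launches a small job; repeat the same pattern for b, c, d and so on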
Created 02-09-2016 06:32 AM
Thanks @Artem Ervits for replying. Here is my script:
a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
dump a;
When I execute the first statement and then execute dump a;, it stops at zero percent and never moves forward.