Problems with Pig View and Tez execution

Expert Contributor

I have problems running non-trivial Pig scripts from the Pig View with Tez execution enabled.

The script I'm testing is the following one, based on the Hadoop Tutorial with minor modifications:

geo = LOAD 'geolocation' USING org.apache.hive.hcatalog.pig.HCatLoader();
abn = FILTER geo BY event != 'normal';
dg = GROUP abn BY driverid;
de = FOREACH dg GENERATE group AS driverid, COUNT(abn) AS events;
dm = LOAD 'driver_mileage' USING org.apache.hive.hcatalog.pig.HCatLoader();
dem = JOIN de BY driverid, dm BY driverid;
risk = FOREACH dem GENERATE $0 AS driverid, $1 AS events, (long)$3 AS totmiles,
  (float)$3/$1 AS riskfactor;
r = LIMIT risk 10;
DUMP r;

Of course I'm using -useHCatalog in the Pig View, and the referenced tables exist in Hive's default database. I've checked that exactly the same script works like a charm from the Linux console in grunt ("pig -useHCatalog -x tez") and from the same Pig View using MapReduce, although in that case it takes a lot of time.

With Tez enabled, the job completes but ends with errors in the logs, and the results are empty. Here is a screenshot showing the two jobs completed without and with Tez (note the huge difference in duration).

[Screenshot: 10272-pig-view-jobs-complete.png]

The lines that seem most relevant in the logs (attached here: job-1481505913714-0020-logs-1.txt) are the following:

ls: cannot access /grid/0/hadoop/yarn/local/usercache/admin/..../hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /grid/0/hadoop/yarn/local/usercache/admin/.../hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
WARNING: Use "yarn jar" to launch YARN applications.
16/12/13 21:13:01 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/12/13 21:13:01 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/12/13 21:13:01 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/12/13 21:13:01 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/12/13 21:13:01 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
...
2016-12-13 21:13:02,934 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-12-13 21:13:03,580 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
...
2016-12-13 21:13:06,411 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Exception while waiting for Tez client to be ready
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
....
2016-12-13 21:13:06,414 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Cannot submit DAG
java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration
...
2016-12-13 21:13:07,179 [main] WARN  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Ooops! Some job has failed...


Re: Problems with Pig View and Tez execution

Expert Contributor

One note: the same warning about not being able to find "slf4j-api-*.jar" and "*hbase-storage-handler-*.jar" under /hive/lib/ is raised when launching grunt, but everything works OK from there on.

Re: Problems with Pig View and Tez execution

Expert Contributor

I have compared the logs of the script when run from the Ambari Pig View (fails) and from Pig on the command line (grunt). The divergence point seems to be here:

Running Pig+Tez from grunt:

2016-12-13 22:16:32,084 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Session mode. Starting session.
2016-12-13 22:16:32,088 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.5.0.0-1245/tez/tez.tar.gz
...
2016-12-13 22:16:32,281 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1481505913714_0023...

Running Pig+Tez on Ambari Pig View:

2016-12-13 21:13:06,409 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Session mode. Starting session.
2016-12-13 21:13:06,411 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Exception while waiting for Tez client to be ready
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration

From this, it seems the problem is that tez.lib.uris is not defined, or not loaded, in the Pig View inside Ambari (while it is fine in the Linux environment). How can I fix this?
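The divergence above was found by reading the two logs side by side. A minimal sketch of doing the same comparison with standard tools, assuming both logs have been saved locally; the helper name tez_lines and the file names grunt.log and pigview.log are hypothetical:

```sh
#!/bin/sh
# tez_lines LOGFILE  (hypothetical helper, not part of Pig or Tez)
# Keeps only the Tez client/session lines of a Pig log and strips the leading
# "YYYY-MM-DD HH:MM:SS,mmm " timestamp, so that logs from runs at different
# times can be compared line by line with diff.
tez_lines() {
  grep -E 'TezClient|TezSessionManager|tez\.lib\.uris' "$1" |
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:,]+ //'
}

# Usage (hypothetical file names):
#   tez_lines grunt.log   > grunt.tez
#   tez_lines pigview.log > pigview.tez
#   diff grunt.tez pigview.tez
```

Lines without a timestamp, such as the exception message following an ERROR line, are passed through unchanged.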

Re: Problems with Pig View and Tez execution

Expert Contributor

I have also checked this:

https://community.hortonworks.com/questions/11525/tezsessionmanager-exception-while-waiting-for-tez....

But it didn't help: I have the right settings in Ambari Tez Configs, and tez.tar.gz is present both in HDFS and on the Linux filesystem, but the same "Invalid configuration of tez jars" error keeps showing.

Re: Problems with Pig View and Tez execution

Expert Contributor

Is tez-site.xml present on the machine running the Pig-View?

Re: Problems with Pig View and Tez execution

Expert Contributor

Hi @bikas, thank you for your reply.

Yes, there are many copies of tez-site.xml on the machine. Here is the locate output from the server's shell:

[root@hdpmanager ~]# locate tez-site.xml
/etc/oozie/2.5.0.0-1245/0/action-conf/hive/tez-site.xml
/etc/tez/2.5.0.0-1245/0/tez-site.xml
/var/lib/ambari-server/resources/common-services/TEZ/0.4.0.2.1/configuration/tez-site.xml
/var/lib/ambari-server/resources/stacks/HDP/2.1.GlusterFS/services/TEZ/configuration/tez-site.xml
...
/var/lib/ambari-server/resources/stacks/HDP/2.5/services/TEZ/configuration/tez-site.xml
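To see which of these copies actually sets tez.lib.uris, and to what value, the property can be pulled out of each file. A minimal sketch, assuming the Ambari-style layout where <name> and <value> sit on separate lines inside each <property>; get_conf_value is a hypothetical helper, not a Hadoop tool:

```sh
#!/bin/sh
# get_conf_value PROPERTY FILE  (hypothetical helper)
# Prints the <value> of PROPERTY from a Hadoop-style *-site.xml file.
# Assumes <name> and <value> are on separate lines, with the value after the name.
get_conf_value() {
  awk -v prop="$1" '
    index($0, "<name>" prop "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print
      exit
    }
    found && /<\/property>/ { exit }   # the property had no <value>
  ' "$2"
}

# Example: compare two of the copies listed by locate.
#   for f in /etc/tez/2.5.0.0-1245/0/tez-site.xml \
#            /var/lib/ambari-server/resources/stacks/HDP/2.5/services/TEZ/configuration/tez-site.xml; do
#     printf '%s: %s\n' "$f" "$(get_conf_value tez.lib.uris "$f")"
#   done
```

Matching on the full "<name>...</name>" string keeps it from picking up similarly named properties such as tez.lib.uris.classpath.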

Also, the Tez View and Hive View (using Tez) are working perfectly, so I don't think this is the problem. Is it possible that there is a Tez-related configuration file used only by the Pig View?

Best regards

Re: Problems with Pig View and Tez execution

Expert Contributor

@bikas Based on your reply, I've found something that may be relevant. The tez-site.xml files under /etc/tez and /var/lib/ambari-server are different, and in particular the values of the tez.lib.uris property differ.

The one in the system Tez config /etc/tez/../tez-site.xml is:

<property>
  <name>tez.lib.uris</name>
  <value>/hdp/apps/${hdp.version}/tez/tez.tar.gz</value>
</property>

This is OK, and the referenced file exists, as stated before. But the property in the Ambari resources config file /var/lib/ambari-server/resources/../tez-site.xml looks like this:

<property>
  <name>tez.lib.uris</name>
  <value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>

I've checked and none of these paths exist in the HDFS filesystem!!
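The existence check can be repeated for every comma-separated entry of a tez.lib.uris value. A minimal sketch, assuming the hdfs client is on PATH; check_lib_uris is a hypothetical helper, and it relies on hdfs dfs -test -e returning exit status 0 when the path exists. A value containing ${hdp.version} needs the version substituted first (the grunt log shows it resolving to /hdp/apps/2.5.0.0-1245/tez/tez.tar.gz).

```sh
#!/bin/sh
# check_lib_uris VALUE  (hypothetical helper)
# Splits a comma-separated tez.lib.uris value and checks each entry with
# `hdfs dfs -test -e`. Prints OK/MISSING per entry; returns 1 if any is missing.
check_lib_uris() {
  (
    set -f          # don't glob-expand the entries
    IFS=','
    status=0
    for uri in $1; do
      if hdfs dfs -test -e "$uri"; then
        printf 'OK      %s\n' "$uri"
      else
        printf 'MISSING %s\n' "$uri"
        status=1
      fi
    done
    exit "$status"
  )
}

# Example, using the value from the Ambari resources copy:
#   check_lib_uris 'hdfs:///apps/tez/,hdfs:///apps/tez/lib/'
```

Running the loop in a subshell keeps the IFS and set -f changes from leaking into the calling shell.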

Maybe this was set up wrongly during the Ambari install?

I'm not sure whether I have to put tez.tar.gz or its contents inside these folders, but I will try this and see what happens.

Re: Problems with Pig View and Tez execution

Expert Contributor

No luck. I extracted the jars inside tez.tar.gz and moved them to hdfs:///apps/tez/, but the same error occurs in the Pig View with Tez enabled:

2016-12-15 15:35:11,581 [PigTezLauncher-0] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Exception while waiting for Tez client to be ready
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configuration

Re: Problems with Pig View and Tez execution

Expert Contributor

The error log is misleading, but it doesn't seem to be a problem with the tez.lib.uris setting in tez-site.xml.

I have run new tests with a minimal script that loads a table from CSV using PigStorage (so not using HCatalog), and it works. This is the script I've tested, which works OK with Tez:

truck_events = LOAD '/tmp/truck_event_text_partition.csv'
 USING PigStorage(',')
 AS (driverId:int, truckId:int, eventTime:chararray,
  eventType:chararray, longitude:double, latitude:double,
  eventKey:chararray, correlationId:long, driverName:chararray,
  routeId:long,routeName:chararray,eventDate:chararray);
DESCRIBE truck_events;

And here are the logs, showing no errors:

WARNING: Use "yarn jar" to launch YARN applications.
16/12/15 18:58:54 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-12-15 18:58:54,562 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2016-12-15 18:58:54,563 [main] INFO  org.apache.pig.Main - Logging error messages to: ****
2016-12-15 18:58:55,104 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/yarn/.pigbootup not found
2016-12-15 18:58:55,176 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://****:8020
2016-12-15 18:58:55,547 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-script.pig-1ff81a55-a051-4bd6-a098-51e3d7f37e1a
2016-12-15 18:58:55,759 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://****:8188/ws/v1/timeline/
2016-12-15 18:58:55,849 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-12-15 18:58:56,335 [main] INFO  org.apache.pig.Main - Pig script completed in 1 second and 907 milliseconds (1907 ms)

I'm really lost with this...

Re: Problems with Pig View and Tez execution

Expert Contributor

Does the HADOOP_CLASSPATH environment variable on the machine running the Pig View include the correct tez-site.xml or its directory?
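One way to answer that on the Ambari host is to walk the classpath entries and report where a tez-site.xml would be picked up. A minimal sketch; find_tez_site_on_classpath is a hypothetical helper, and it only says something useful if it is run with the same environment the Pig View uses to launch Pig. By default it inspects $HADOOP_CLASSPATH, but any colon-separated classpath can be passed, for example the output of hadoop classpath.

```sh
#!/bin/sh
# find_tez_site_on_classpath [CLASSPATH]  (hypothetical helper)
# Walks a colon-separated classpath (default: $HADOOP_CLASSPATH) and prints every
# tez-site.xml it exposes: either an entry that is itself a tez-site.xml file,
# or a directory entry containing one. Returns 1 if none is found.
find_tez_site_on_classpath() {
  (
    set -f          # keep wildcard entries such as lib/* literal
    IFS=':'
    status=1
    for entry in ${1-$HADOOP_CLASSPATH}; do
      if [ -f "$entry" ] && [ "${entry##*/}" = "tez-site.xml" ]; then
        printf '%s\n' "$entry"
        status=0
      elif [ -f "$entry/tez-site.xml" ]; then
        printf '%s\n' "$entry/tez-site.xml"
        status=0
      fi
    done
    exit "$status"
  )
}

# Examples:
#   find_tez_site_on_classpath                      # inspects $HADOOP_CLASSPATH
#   find_tez_site_on_classpath "$(hadoop classpath)"
```

An empty result would be consistent with the "tez.lib.uris is not defined in the configuration" error, since then no tez-site.xml is visible at all.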
