
I need help with the riskfactor Pig script from the HDP 2.5 tutorial.


Hello,

I am stepping through this part of the HDP 2.5 tutorial:

https://github.com/hortonworks/tutorials/blob/hdp-2.5/tutorials/hortonworks/hello-hdp-an-introductio...

I have executed this statement in the Hive view in Ambari under maria_dev:

CREATE TABLE riskfactor (driverid string,events bigint,totmiles bigint,riskfactor float) STORED AS ORC;

I have checked that the table is present in the default db, and it is there.

After executing the following Pig script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by event != 'normal';
c = foreach b generate driverid, event, (int) '1' as occurance;
d = group c by driverid;
e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;
g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();
h = join e by driverid, g by driverid;
final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;
store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

I get the following errors:

ls: cannot access /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
WARNING: Use "yarn jar" to launch YARN applications.
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-09-27 11:51:21,605 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2016-09-27 11:51:21,605 [main] INFO org.apache.pig.Main - Logging error messages to: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:23,260 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/yarn/.pigbootup not found
2016-09-27 11:51:23,453 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2016-09-27 11:51:24,818 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-script.pig-8ca435c7-920a-4f44-953e-454a42973ab8
2016-09-27 11:51:25,478 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-27 11:51:25,671 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-09-27 11:51:27,037 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:27,107 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:27,170 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:27,904 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:27,906 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:27,909 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:28,140 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
2016-09-27 11:51:28,237 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:28,317 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:28,325 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:28,723 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: <file script.pig, line 9, column 0> Output Location Validation Failed for: 'riskfactor More info to follow:
Pig 'double' type in column 2(0-based) cannot map to HCat 'BIGINT'type. Target filed must be of HCat type {DOUBLE}
Details at logfile: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:28,746 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 330 milliseconds (7330 ms)

The very first time I executed the script I did not see any errors, but the riskfactor table remained empty even though it should have been populated.

Is there somebody who can help?

1 ACCEPTED SOLUTION

Super Guru

@Robbert Naastepad

It looks like there is a data type mismatch according to the error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: Output Location Validation Failed for: 'riskfactor More info to follow:
Pig 'double' type in column 2(0-based) cannot map to HCat 'BIGINT'type. Target filed must be of HCat type {DOUBLE}
Details at logfile: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:28,746 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 330 milliseconds (7330 ms)

The log indicates that Pig is attempting to store a DOUBLE into a target column that is declared as BIGINT, when the target needs to be DOUBLE. Since the message says "in column 2(0-based)", the problem is with the totmiles column.
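If you want to see the mismatch for yourself, you can compare the schema Pig computes with the Hive table definition. A quick sketch, assuming you run the script from the Grunt shell and that the tables are in the default database:

describe final_data;
-- Pig should report something along the lines of:
-- final_data: {driverid: chararray, events: long, totmiles: double, riskfactor: float}

And on the Hive side:

DESCRIBE riskfactor;
-- the tutorial's original DDL declares totmiles as bigint, which conflicts with the double above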


10 REPLIES


Super Collaborator

Hi @Robbert Naastepad, as spotted by @Michael Young, you can try changing the data type of the totmiles column to double. Drop the riskfactor table from Hive and create it again with:

drop table riskfactor;

CREATE TABLE riskfactor (driverid string, events bigint, totmiles double, riskfactor float) STORED AS ORC;
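Then rerun the Pig script. To confirm that the store worked this time, a quick check from the Hive view (assuming the table is in the default database):

select * from riskfactor limit 10;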

Let us know if this works.


This resolved the same error for me. Thank you @mrizvi and @Michael Young.

@Michael Young and @mrizvi, thanks for your answers. This works like a charm. I promise next time I will take a closer look at the errors. I am an Oracle guy, used to getting errors that tell me exactly what went wrong, and I have to get used to sifting through log messages from the whole stack.

Contributor

@Robbert Naastepad

No worries!

Explorer

I was getting the error below:

2016-10-25 05:19:47,348 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: 
<file script.pig, line 9, column 0> Output Location Validation Failed for: 'riskfactor More info to follow:
Pig 'long' type in column 2(0-based) cannot map to HCat 'DOUBLE'type.  Target filed must be of HCat type {BIGINT}
Details at logfile:

So I changed line 8 of the script by casting $3 to double. This worked just fine for me:

final_data = foreach h generate $0 as driverid, $1 as events, (double) $3 as totmiles, (float) $3/$1 as riskfactor;
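Which variant of the error you get depends on how totmiles is typed in the source drivermileage table, since HCatLoader hands Pig a long for a Hive bigint and a double for a Hive double. A quick way to check which case you are in (assuming the default database):

DESCRIBE drivermileage;
-- if totmiles shows as bigint, cast $3 to double in Pig as above;
-- if it shows as double, recreate riskfactor with totmiles as double instead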


Hi,

It does not work. I dropped the RISKFACTOR table and created a new one with TOTMILES as double...

I still have the issue:

ls: cannot access /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory


WARNING: Use "yarn jar" to launch YARN applications.
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-11-29 15:48:25,281 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2016-11-29 15:48:25,281 [main] INFO  org.apache.pig.Main - Logging error messages to: /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/pig_1480434505279.log
2016-11-29 15:48:26,196 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/yarn/.pigbootup not found
2016-11-29 15:48:26,372 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2016-11-29 15:48:27,388 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-script.pig-e1155078-b7bf-4f84-b9e7-b3f427858f9b
2016-11-29 15:48:27,787 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:27,910 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-11-29 15:48:28,659 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:28,709 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:28,771 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:29,451 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:29,453 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:29,455 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:29,586 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
2016-11-29 15:48:29,664 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:29,700 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:29,705 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:30,169 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:30,189 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:30,337 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:30,496 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY,FILTER
2016-11-29 15:48:30,555 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-29 15:48:30,601 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-29 15:48:30,675 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 174587904 to monitor. collectionUsageThreshold = 122211528, usageThreshold = 122211528
2016-11-29 15:48:30,746 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for a: $0, $3, $4, $5, $6, $7, $8, $9
2016-11-29 15:48:30,889 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/yarn/staging and resources directory is /tmp/temp-293241078
2016-11-29 15:48:30,937 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2016-11-29 15:48:30,985 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-29 15:48:31,099 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:31,112 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:31,114 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:31,280 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,293 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - ORC pushdown predicate: null
2016-11-29 15:48:31,313 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcUtils - Using schema evolution configuration variables schema.evolution.columns [truckid, driverid, event, latitude, longitude, city, state, velocity, event_ind, idling_ind] / schema.evolution.columns.types [string, string, string, double, double, string, string, int, int, int] (isAcid false)
2016-11-29 15:48:31,775 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - FooterCacheHitRatio: 0/2
2016-11-29 15:48:31,775 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=OrcGetSplits start=1480434511280 end=1480434511775 duration=495 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,779 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-29 15:48:31,894 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:31,898 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:31,903 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - ORC pushdown predicate: null
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcUtils - Using schema evolution configuration variables schema.evolution.columns [driverid, totmiles] / schema.evolution.columns.types [string, double] (isAcid false)
2016-11-29 15:48:32,017 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - FooterCacheHitRatio: 0/2
2016-11-29 15:48:32,018 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=OrcGetSplits start=1480434511989 end=1480434512017 duration=28 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:32,018 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-metastore-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: jdo-api-3.0.1.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hcatalog-core-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hcatalog-pig-adapter-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.16.0.2.5.0.0-1245-core-h2.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: libfb303-0.9.3.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-exec-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hbase-handler-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: libthrift-0.9.3.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2016-11-29 15:48:33,520 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-52: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: a,b,c,d,e
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: a[1,4],b[2,4],c[3,4],e[5,4],d[4,4]
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: 
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Set auto parallelism for vertex scope-53
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-53: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: e,h
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: e[5,4],h[7,4]
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: GROUP_BY
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-54: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: g,h
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: g[6,4],h[7,4]
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: 
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Set auto parallelism for vertex scope-55
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-55: parallelism=2, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: final_data,h
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: h[7,4],final_data[8,13]
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: HASH_JOIN
2016-11-29 15:48:33,973 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:33,987 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:33,989 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:34,040 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Total estimated parallelism is 5
2016-11-29 15:48:34,121 [PigTezLauncher-0] INFO  org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Increasing tez.am.resource.memory.mb from 256 to 1024 as total estimated tasks = 5, total vertices = 4, max outputs = 1
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Increasing Tez AM Heap Size from 0M to 512M as total estimated tasks = 5, total vertices = 4, max outputs = 1
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Value of tez.am.launch.cmd-opts is now -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx512M
2016-11-29 15:48:34,153 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.5.0.0-1245, revision=c98dc048175afd3f56a44f05a1c18c6813f0b9a4, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-26T01:23:50Z ]
2016-11-29 15:48:34,367 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:34,382 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2016-11-29 15:48:34,531 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2016-11-29 15:48:34,541 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager to manage Timeline ACLs
2016-11-29 15:48:34,673 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:34,679 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Session mode. Starting session.
2016-11-29 15:48:34,682 [PigTezLauncher-0] INFO  org.apache.tez.common.security.TokenCache - Merging additional tokens from binary file, binaryFileName=/hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/container_tokens
2016-11-29 15:48:34,683 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.5.0.0-1245/tez/tez.tar.gz
2016-11-29 15:48:34,754 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Stage directory /tmp/yarn/staging doesn't exist and is created
2016-11-29 15:48:34,780 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/yarn/staging/.tez/application_1480244541051_0025 doesn't exist and is created
2016-11-29 15:48:34,821 [PigTezLauncher-0] INFO  org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1480244541051_0025
2016-11-29 15:48:34,975 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1480244541051_0025
2016-11-29 15:48:34,981 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1480244541051_0025/
2016-11-29 15:48:41,690 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:script.pig-0_scope-0
2016-11-29 15:48:41,690 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025, dagName=PigLatin:script.pig-0_scope-0, callerContext={ context=PIG, callerType=PIG_SCRIPT_ID, callerId=PIG-script.pig-e1155078-b7bf-4f84-b9e7-b3f427858f9b }
2016-11-29 15:48:42,244 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025, dagName=PigLatin:script.pig-0_scope-0
2016-11-29 15:48:42,512 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:42,513 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2016-11-29 15:48:42,514 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2016-11-29 15:48:42,531 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:script.pig-0_scope-0. Application id: application_1480244541051_0025
2016-11-29 15:48:43,083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1480244541051_0025
2016-11-29 15:48:43,537 [Timer-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 5 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2016-11-29 15:48:53,253 [PigTezLauncher-0] INFO  org.apache.tez.common.counters.Limits - Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=3000, COUNTER_NAME_MAX=64, MAX_COUNTERS=10000
2016-11-29 15:48:53,262 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=SUCCEEDED, progress=TotalTasks: 4 Succeeded: 4 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=Counters: 174
org.apache.tez.common.counters.DAGCounter
NUM_SUCCEEDED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
DATA_LOCAL_TASKS=2
AM_CPU_MILLISECONDS=3370
AM_GC_TIME_MILLIS=31
File System Counters
FILE_BYTES_READ=3482
FILE_BYTES_WRITTEN=2396
HDFS_BYTES_READ=26083
HDFS_BYTES_WRITTEN=1532
HDFS_READ_OPS=7
HDFS_WRITE_OPS=2
HDFS_OP_CREATE=1
HDFS_OP_GET_FILE_STATUS=3
HDFS_OP_OPEN=4
HDFS_OP_RENAME=1
org.apache.tez.common.counters.TaskCounter
REDUCE_INPUT_GROUPS=298
REDUCE_INPUT_RECORDS=298
COMBINE_INPUT_RECORDS=0
SPILLED_RECORDS=596
NUM_SHUFFLED_INPUTS=5
NUM_SKIPPED_INPUTS=0
NUM_FAILED_SHUFFLE_INPUTS=0
MERGED_MAP_OUTPUTS=5
GC_TIME_MILLIS=285
CPU_MILLISECONDS=5000
PHYSICAL_MEMORY_BYTES=823132160
VIRTUAL_MEMORY_BYTES=3589931008
COMMITTED_HEAP_BYTES=823132160
INPUT_RECORDS_PROCESSED=8100
INPUT_SPLIT_LENGTH_BYTES=52879
OUTPUT_RECORDS=755
OUTPUT_BYTES=7858
OUTPUT_BYTES_WITH_OVERHEAD=4665
OUTPUT_BYTES_PHYSICAL=2252
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILLS_BYTES_READ=2252
ADDITIONAL_SPILL_COUNT=0
SHUFFLE_CHUNK_COUNT=3
SHUFFLE_BYTES=2252
SHUFFLE_BYTES_DECOMPRESSED=4665
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_DISK_DIRECT=2252
NUM_MEM_TO_DISK_MERGES=0
NUM_DISK_TO_DISK_MERGES=0
SHUFFLE_PHASE_TIME=71
MERGE_PHASE_TIME=104
FIRST_EVENT_RECEIVED=36
LAST_EVENT_RECEIVED=40
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_53_INPUT_scope_52
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_55_INPUT_scope_53
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_55_INPUT_scope_54
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
TaskCounter_scope_52_INPUT_scope_2
INPUT_RECORDS_PROCESSED=8000
INPUT_SPLIT_LENGTH_BYTES=52215
TaskCounter_scope_52_OUTPUT_scope_53
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=4992
OUTPUT_BYTES_PHYSICAL=613
OUTPUT_BYTES_WITH_OVERHEAD=1377
OUTPUT_RECORDS=457
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=99
TaskCounter_scope_53_INPUT_scope_52
ADDITIONAL_SPILLS_BYTES_READ=613
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=21
LAST_EVENT_RECEIVED=21
MERGED_MAP_OUTPUTS=1
MERGE_PHASE_TIME=50
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=1
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=99
REDUCE_INPUT_RECORDS=99
SHUFFLE_BYTES=613
SHUFFLE_BYTES_DECOMPRESSED=1377
SHUFFLE_BYTES_DISK_DIRECT=613
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=31
SPILLED_RECORDS=99
TaskCounter_scope_53_OUTPUT_scope_55
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=1074
OUTPUT_BYTES_PHYSICAL=683
OUTPUT_BYTES_WITH_OVERHEAD=1284
OUTPUT_RECORDS=99
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=99
TaskCounter_scope_54_INPUT_scope_24
INPUT_RECORDS_PROCESSED=100
INPUT_SPLIT_LENGTH_BYTES=664
TaskCounter_scope_54_OUTPUT_scope_55
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=1792
OUTPUT_BYTES_PHYSICAL=956
OUTPUT_BYTES_WITH_OVERHEAD=2004
OUTPUT_RECORDS=100
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=100
TaskCounter_scope_55_INPUT_scope_53
ADDITIONAL_SPILLS_BYTES_READ=683
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=7
LAST_EVENT_RECEIVED=10
MERGED_MAP_OUTPUTS=2
MERGE_PHASE_TIME=24
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=2
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=99
REDUCE_INPUT_RECORDS=99
SHUFFLE_BYTES=683
SHUFFLE_BYTES_DECOMPRESSED=1284
SHUFFLE_BYTES_DISK_DIRECT=683
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=19
SPILLED_RECORDS=99
TaskCounter_scope_55_INPUT_scope_54
ADDITIONAL_SPILLS_BYTES_READ=956
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=8
LAST_EVENT_RECEIVED=9
MERGED_MAP_OUTPUTS=2
MERGE_PHASE_TIME=30
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=2
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=100
REDUCE_INPUT_RECORDS=100
SHUFFLE_BYTES=956
SHUFFLE_BYTES_DECOMPRESSED=2004
SHUFFLE_BYTES_DISK_DIRECT=956
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=21
SPILLED_RECORDS=100
TaskCounter_scope_55_OUTPUT_scope_51
OUTPUT_RECORDS=99
org.apache.hadoop.mapreduce.TaskCounter
COMBINE_INPUT_RECORDS=99
COMBINE_OUTPUT_RECORDS=457
org.apache.hadoop.mapreduce.TaskCounter_scope_52_OUTPUT_scope_53
COMBINE_INPUT_RECORDS=99
COMBINE_OUTPUT_RECORDS=457
org.apache.hadoop.mapreduce.TaskCounter_scope_53_INPUT_scope_52
COMBINE_INPUT_RECORDS=0
COMBINE_OUTPUT_RECORDS=0
2016-11-29 15:48:54,128 [main] INFO  org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
       HadoopVersion: 2.7.3.2.5.0.0-1245                                                                                  
          PigVersion: 0.16.0.2.5.0.0-1245                                                                                
          TezVersion: 0.7.0.2.5.0.0-1245                                                                                  
              UserId: yarn                                                                                                
            FileName: script.pig                                                                                          
           StartedAt: 2016-11-29 15:48:31                                                                                
          FinishedAt: 2016-11-29 15:48:54                                                                                
            Features: HASH_JOIN,GROUP_BY,FILTER                                                                          
Success!
DAG 0:
                                    Name: PigLatin:script.pig-0_scope-0                                                                      
                           ApplicationId: job_1480244541051_0025                                                                              
                      TotalLaunchedTasks: 4                                                                                                  
                           FileBytesRead: 3482                                                                                                
                        FileBytesWritten: 2396                                                                                                
                           HdfsBytesRead: 26083                                                                                              
                        HdfsBytesWritten: 1532                                                                                                
      SpillableMemoryManager spill count: 0                                                                                                  
                Bags proactively spilled: 0                                                                                                  
             Records proactively spilled: 0                                                                                                  
DAG Plan:
Tez vertex scope-52->Tez vertex scope-53,
Tez vertex scope-53->Tez vertex scope-55,
Tez vertex scope-54->Tez vertex scope-55,
Tez vertex scope-55
Vertex Stats:
VertexId  Parallelism  TotalTasks  InputRecords  ReduceInputRecords  OutputRecords  FileBytesRead  FileBytesWritten  HdfsBytesRead  HdfsBytesWritten  Alias         Feature    Outputs
scope-52  1            1           8000          0                   457            32             645               24641          0                 a,b,c,d,e
scope-53  1            1           0             99                  99             701            739               0              0                 e,h           GROUP_BY
scope-54  1            1           100           0                   100            56             1012              1442           0                 g,h
scope-55  2            1           0             199                 99             2693           0                 0              1532              final_data,h  HASH_JOIN  riskfactor,
Input(s):
Successfully read 100 records (1442 bytes) from: "drivermileage"
Successfully read 8000 records (24641 bytes) from: "geolocation"
Output(s):
Successfully stored 99 records (1532 bytes) in: "riskfactor"
2016-11-29 15:48:54,163 [main] INFO  org.apache.pig.Main - Pig script completed in 29 seconds and 51 milliseconds (29051 ms)
2016-11-29 15:48:54,163 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool
2016-11-29 15:48:54,183 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Shutting down Tez session org.apache.tez.client.TezClient@df073ec
2016-11-29 15:48:54,206 [pool-1-thread-1] INFO  org.apache.tez.client.TezClient - Shutting down Tez Session, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025

Any idea? Thanks

Super Collaborator

@Xavier VAN AUSLOOS, where is the issue? I cannot see any errors. Moreover, if you scroll down, you will see this:

Input(s):
Successfully read 100 records (1442 bytes) from: "drivermileage"
Successfully read 8000 records (24641 bytes) from: "geolocation"
Output(s):
Successfully stored 99 records (1532 bytes) in: "riskfactor"
2016-11-29 15:48:54,163 [main] INFO org.apache.pig.Main - Pig script completed in 29 seconds and 51 milliseconds (29051 ms)

Check your riskfactor table; you should see your data.
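For example, a quick check from the Hive view (assuming the default database):

select count(*) from riskfactor;
-- should return 99, matching the "Successfully stored 99 records" line above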


Thanks... it finally worked. I do not know why or how... anyway 🙂