Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

I need help with the riskfactor Pig script from the HDP 2.5 tutorial.

Hello,

I am stepping through this part of the HDP 2.5 tutorial:

https://github.com/hortonworks/tutorials/blob/hdp-2.5/tutorials/hortonworks/hello-hdp-an-introductio...

I have executed this statement in the Hive view in Ambari under maria_dev:

CREATE TABLE riskfactor (driverid string,events bigint,totmiles bigint,riskfactor float) STORED AS ORC;

I have checked that the table is present in the default database, and it is.

When I execute the following Pig script:

a = LOAD 'geolocation' using org.apache.hive.hcatalog.pig.HCatLoader();

b = filter a by event != 'normal';

c = foreach b generate driverid, event, (int) '1' as occurance;

d = group c by driverid;

e = foreach d generate group as driverid, SUM(c.occurance) as t_occ;

g = LOAD 'drivermileage' using org.apache.hive.hcatalog.pig.HCatLoader();

h = join e by driverid, g by driverid;

final_data = foreach h generate $0 as driverid, $1 as events, $3 as totmiles, (float) $3/$1 as riskfactor;

store final_data into 'riskfactor' using org.apache.hive.hcatalog.pig.HCatStorer();

I get the following errors:

ls: cannot access /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory
WARNING: Use "yarn jar" to launch YARN applications.
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/09/27 11:51:21 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-09-27 11:51:21,605 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2016-09-27 11:51:21,605 [main] INFO org.apache.pig.Main - Logging error messages to: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:23,260 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/yarn/.pigbootup not found
2016-09-27 11:51:23,453 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2016-09-27 11:51:24,818 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-script.pig-8ca435c7-920a-4f44-953e-454a42973ab8
2016-09-27 11:51:25,478 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-09-27 11:51:25,671 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-09-27 11:51:27,037 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:27,107 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:27,170 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:27,904 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:27,906 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:27,909 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:28,140 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
2016-09-27 11:51:28,237 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-09-27 11:51:28,317 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-09-27 11:51:28,325 [main] INFO hive.metastore - Connected to metastore.
2016-09-27 11:51:28,723 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: <file script.pig, line 9, column 0> Output Location Validation Failed for: 'riskfactor More info to follow: Pig 'double' type in column 2(0-based) cannot map to HCat 'BIGINT'type.
Target filed must be of HCat type {DOUBLE}
Details at logfile: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:28,746 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 330 milliseconds (7330 ms)

When I executed the script for the very first time, I did not see any errors, but the riskfactor table was still empty when it should have been populated.

Is there somebody who can help?

1 ACCEPTED SOLUTION

Super Guru

@Robbert Naastepad

It looks like there is a data type mismatch according to the error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: Output Location Validation Failed for: 'riskfactor More info to follow: Pig 'double' type in column 2(0-based) cannot map to HCat 'BIGINT'type. Target filed must be of HCat type {DOUBLE}
Details at logfile: /hadoop/yarn/local/usercache/maria_dev/appcache/application_1474973150203_0003/container_1474973150203_0003_01_000002/pig_1474977081603.log
2016-09-27 11:51:28,746 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 330 milliseconds (7330 ms)

The log indicates that Pig is trying to store a DOUBLE into a target column that is defined as BIGINT but needs to be DOUBLE. The error says "in column 2 (0-based)", so the problem is with totmiles.
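
If you want to see the mismatch for yourself, a quick check (just a sketch; DESCRIBE is a standard Pig statement, and the exact inferred types depend on your source tables) is to print the schema of the relation right before the store:

DESCRIBE final_data;
-- assuming drivermileage.totmiles is double, as in the tutorial, this prints something like:
-- final_data: {driverid: chararray, events: long, totmiles: double, riskfactor: float}
-- a double totmiles cannot be stored into a riskfactor column declared as bigint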


10 REPLIES

Super Collaborator

Hi @Robbert Naastepad, as spotted by @Michael Young, you can try changing the data type of the totmiles column to double. Drop the riskfactor table from Hive and create it again with:

drop table riskfactor;

CREATE TABLE riskfactor (driverid string,events bigint,totmiles double,riskfactor float) STORED AS ORC;

Let us know if this works.
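
Before re-running the Pig script, you can also confirm the new column types from the Hive view (a minimal sanity check, not part of the tutorial steps):

DESCRIBE riskfactor;
-- totmiles should now be listed as double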

New Member

This worked for me in resolving the same error. Thank you @mrizvi and @Michael Young.

@Michael Young and @mrizvi, thanks for your answers. This works like a charm. I promise next time I will take a closer look at the errors. I am an Oracle guy, used to errors that tell me exactly what went wrong, and I have to get used to sifting through log messages from the whole stack.

Visitor

@Robbert Naastepad

No worries!

New Member

I was getting the error below:

2016-10-25 05:19:47,348 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: 
<file script.pig, line 9, column 0> Output Location Validation Failed for: 'riskfactor More info to follow:
Pig 'long' type in column 2(0-based) cannot map to HCat 'DOUBLE'type.  Target filed must be of HCat type {BIGINT}
Details at logfile:

So I changed line #8 by casting $3 as double. This worked just fine for me:

final_data = foreach h generate $0 as driverid, $1 as events, (double) $3 as totmiles, (float) $3/$1 as riskfactor;

Hi,

It does not work for me. I dropped the table RISKFACTOR and created a new one with TOTMILES as double...

But I still see this:

ls: cannot access /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/hive.tar.gz/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/hive.tar.gz/hive/hcatalog/lib/*hbase-storage-handler-*.jar: No such file or directory


WARNING: Use "yarn jar" to launch YARN applications.
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
16/11/29 15:48:25 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2016-11-29 15:48:25,281 [main] INFO  org.apache.pig.Main - Apache Pig version 0.16.0.2.5.0.0-1245 (rexported) compiled Aug 26 2016, 02:07:35
2016-11-29 15:48:25,281 [main] INFO  org.apache.pig.Main - Logging error messages to: /hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/pig_1480434505279.log
2016-11-29 15:48:26,196 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/yarn/.pigbootup not found
2016-11-29 15:48:26,372 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020
2016-11-29 15:48:27,388 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-script.pig-e1155078-b7bf-4f84-b9e7-b3f427858f9b
2016-11-29 15:48:27,787 [main] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:27,910 [main] INFO  org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2016-11-29 15:48:28,659 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:28,709 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:28,771 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:29,451 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:29,453 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:29,455 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:29,586 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 1 time(s).
2016-11-29 15:48:29,664 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:29,700 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:29,705 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:30,169 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:30,189 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:30,337 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:30,496 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY,FILTER
2016-11-29 15:48:30,555 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-11-29 15:48:30,601 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-11-29 15:48:30,675 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 174587904 to monitor. collectionUsageThreshold = 122211528, usageThreshold = 122211528
2016-11-29 15:48:30,746 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for a: $0, $3, $4, $5, $6, $7, $8, $9
2016-11-29 15:48:30,889 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/yarn/staging and resources directory is /tmp/temp-293241078
2016-11-29 15:48:30,937 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2016-11-29 15:48:30,985 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2016-11-29 15:48:31,099 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:31,112 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:31,114 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:31,280 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,293 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - ORC pushdown predicate: null
2016-11-29 15:48:31,313 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcUtils - Using schema evolution configuration variables schema.evolution.columns [truckid, driverid, event, latitude, longitude, city, state, velocity, event_ind, idling_ind] / schema.evolution.columns.types [string, string, string, double, double, string, string, int, int, int] (isAcid false)
2016-11-29 15:48:31,775 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - FooterCacheHitRatio: 0/2
2016-11-29 15:48:31,775 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=OrcGetSplits start=1480434511280 end=1480434511775 duration=495 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,779 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-29 15:48:31,894 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:31,898 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:31,903 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - ORC pushdown predicate: null
2016-11-29 15:48:31,989 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcUtils - Using schema evolution configuration variables schema.evolution.columns [driverid, totmiles] / schema.evolution.columns.types [string, double] (isAcid false)
2016-11-29 15:48:32,017 [main] INFO  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat - FooterCacheHitRatio: 0/2
2016-11-29 15:48:32,018 [main] INFO  org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=OrcGetSplits start=1480434511989 end=1480434512017 duration=28 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-11-29 15:48:32,018 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-metastore-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: jdo-api-3.0.1.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hcatalog-core-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hcatalog-pig-adapter-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.16.0.2.5.0.0-1245-core-h2.jar
2016-11-29 15:48:33,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: libfb303-0.9.3.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-exec-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hive-hbase-handler-1.2.1000.2.5.0.0-1245.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: libthrift-0.9.3.jar
2016-11-29 15:48:33,078 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2016-11-29 15:48:33,520 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-52: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: a,b,c,d,e
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: a[1,4],b[2,4],c[3,4],e[5,4],d[4,4]
2016-11-29 15:48:33,521 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: 
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Set auto parallelism for vertex scope-53
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-53: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: e,h
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: e[5,4],h[7,4]
2016-11-29 15:48:33,628 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: GROUP_BY
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-54: parallelism=1, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: g,h
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: g[6,4],h[7,4]
2016-11-29 15:48:33,791 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: 
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Set auto parallelism for vertex scope-55
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - For vertex - scope-55: parallelism=2, memory=256, java opts=-Xmx256m
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Processing aliases: final_data,h
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Detailed locations: h[7,4],final_data[8,13]
2016-11-29 15:48:33,863 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezDagBuilder - Pig features in the vertex: HASH_JOIN
2016-11-29 15:48:33,973 [main] WARN  org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.local does not exist
2016-11-29 15:48:33,987 [main] INFO  hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-11-29 15:48:33,989 [main] INFO  hive.metastore - Connected to metastore.
2016-11-29 15:48:34,040 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Total estimated parallelism is 5
2016-11-29 15:48:34,121 [PigTezLauncher-0] INFO  org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Increasing tez.am.resource.memory.mb from 256 to 1024 as total estimated tasks = 5, total vertices = 4, max outputs = 1
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Increasing Tez AM Heap Size from 0M to 512M as total estimated tasks = 5, total vertices = 4, max outputs = 1
2016-11-29 15:48:34,122 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Value of tez.am.launch.cmd-opts is now -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx512M
2016-11-29 15:48:34,153 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.5.0.0-1245, revision=c98dc048175afd3f56a44f05a1c18c6813f0b9a4, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2016-08-26T01:23:50Z ]
2016-11-29 15:48:34,367 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:34,382 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2016-11-29 15:48:34,531 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2016-11-29 15:48:34,541 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager to manage Timeline ACLs
2016-11-29 15:48:34,673 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:34,679 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Session mode. Starting session.
2016-11-29 15:48:34,682 [PigTezLauncher-0] INFO  org.apache.tez.common.security.TokenCache - Merging additional tokens from binary file, binaryFileName=/hadoop/yarn/local/usercache/admin/appcache/application_1480244541051_0024/container_1480244541051_0024_01_000002/container_tokens
2016-11-29 15:48:34,683 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.5.0.0-1245/tez/tez.tar.gz
2016-11-29 15:48:34,754 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Stage directory /tmp/yarn/staging doesn't exist and is created
2016-11-29 15:48:34,780 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/yarn/staging/.tez/application_1480244541051_0025 doesn't exist and is created
2016-11-29 15:48:34,821 [PigTezLauncher-0] INFO  org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1480244541051_0025
2016-11-29 15:48:34,975 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1480244541051_0025
2016-11-29 15:48:34,981 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1480244541051_0025/
2016-11-29 15:48:41,690 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:script.pig-0_scope-0
2016-11-29 15:48:41,690 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025, dagName=PigLatin:script.pig-0_scope-0, callerContext={ context=PIG, callerType=PIG_SCRIPT_ID, callerId=PIG-script.pig-e1155078-b7bf-4f84-b9e7-b3f427858f9b }
2016-11-29 15:48:42,244 [PigTezLauncher-0] INFO  org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025, dagName=PigLatin:script.pig-0_scope-0
2016-11-29 15:48:42,512 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-11-29 15:48:42,513 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/172.17.0.2:8050
2016-11-29 15:48:42,514 [PigTezLauncher-0] INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at sandbox.hortonworks.com/172.17.0.2:10200
2016-11-29 15:48:42,531 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:script.pig-0_scope-0. Application id: application_1480244541051_0025
2016-11-29 15:48:43,083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1480244541051_0025
2016-11-29 15:48:43,537 [Timer-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 5 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2016-11-29 15:48:53,253 [PigTezLauncher-0] INFO  org.apache.tez.common.counters.Limits - Counter limits initialized with parameters:  GROUP_NAME_MAX=256, MAX_GROUPS=3000, COUNTER_NAME_MAX=64, MAX_COUNTERS=10000
2016-11-29 15:48:53,262 [PigTezLauncher-0] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=SUCCEEDED, progress=TotalTasks: 4 Succeeded: 4 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=Counters: 174
org.apache.tez.common.counters.DAGCounter
NUM_SUCCEEDED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
DATA_LOCAL_TASKS=2
AM_CPU_MILLISECONDS=3370
AM_GC_TIME_MILLIS=31
File System Counters
FILE_BYTES_READ=3482
FILE_BYTES_WRITTEN=2396
HDFS_BYTES_READ=26083
HDFS_BYTES_WRITTEN=1532
HDFS_READ_OPS=7
HDFS_WRITE_OPS=2
HDFS_OP_CREATE=1
HDFS_OP_GET_FILE_STATUS=3
HDFS_OP_OPEN=4
HDFS_OP_RENAME=1
org.apache.tez.common.counters.TaskCounter
REDUCE_INPUT_GROUPS=298
REDUCE_INPUT_RECORDS=298
COMBINE_INPUT_RECORDS=0
SPILLED_RECORDS=596
NUM_SHUFFLED_INPUTS=5
NUM_SKIPPED_INPUTS=0
NUM_FAILED_SHUFFLE_INPUTS=0
MERGED_MAP_OUTPUTS=5
GC_TIME_MILLIS=285
CPU_MILLISECONDS=5000
PHYSICAL_MEMORY_BYTES=823132160
VIRTUAL_MEMORY_BYTES=3589931008
COMMITTED_HEAP_BYTES=823132160
INPUT_RECORDS_PROCESSED=8100
INPUT_SPLIT_LENGTH_BYTES=52879
OUTPUT_RECORDS=755
OUTPUT_BYTES=7858
OUTPUT_BYTES_WITH_OVERHEAD=4665
OUTPUT_BYTES_PHYSICAL=2252
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILLS_BYTES_READ=2252
ADDITIONAL_SPILL_COUNT=0
SHUFFLE_CHUNK_COUNT=3
SHUFFLE_BYTES=2252
SHUFFLE_BYTES_DECOMPRESSED=4665
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_DISK_DIRECT=2252
NUM_MEM_TO_DISK_MERGES=0
NUM_DISK_TO_DISK_MERGES=0
SHUFFLE_PHASE_TIME=71
MERGE_PHASE_TIME=104
FIRST_EVENT_RECEIVED=36
LAST_EVENT_RECEIVED=40
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_53_INPUT_scope_52
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_55_INPUT_scope_53
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Shuffle Errors_scope_55_INPUT_scope_54
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
TaskCounter_scope_52_INPUT_scope_2
INPUT_RECORDS_PROCESSED=8000
INPUT_SPLIT_LENGTH_BYTES=52215
TaskCounter_scope_52_OUTPUT_scope_53
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=4992
OUTPUT_BYTES_PHYSICAL=613
OUTPUT_BYTES_WITH_OVERHEAD=1377
OUTPUT_RECORDS=457
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=99
TaskCounter_scope_53_INPUT_scope_52
ADDITIONAL_SPILLS_BYTES_READ=613
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=21
LAST_EVENT_RECEIVED=21
MERGED_MAP_OUTPUTS=1
MERGE_PHASE_TIME=50
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=1
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=99
REDUCE_INPUT_RECORDS=99
SHUFFLE_BYTES=613
SHUFFLE_BYTES_DECOMPRESSED=1377
SHUFFLE_BYTES_DISK_DIRECT=613
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=31
SPILLED_RECORDS=99
TaskCounter_scope_53_OUTPUT_scope_55
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=1074
OUTPUT_BYTES_PHYSICAL=683
OUTPUT_BYTES_WITH_OVERHEAD=1284
OUTPUT_RECORDS=99
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=99
TaskCounter_scope_54_INPUT_scope_24
INPUT_RECORDS_PROCESSED=100
INPUT_SPLIT_LENGTH_BYTES=664
TaskCounter_scope_54_OUTPUT_scope_55
ADDITIONAL_SPILLS_BYTES_READ=0
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
ADDITIONAL_SPILL_COUNT=0
OUTPUT_BYTES=1792
OUTPUT_BYTES_PHYSICAL=956
OUTPUT_BYTES_WITH_OVERHEAD=2004
OUTPUT_RECORDS=100
SHUFFLE_CHUNK_COUNT=1
SPILLED_RECORDS=100
TaskCounter_scope_55_INPUT_scope_53
ADDITIONAL_SPILLS_BYTES_READ=683
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=7
LAST_EVENT_RECEIVED=10
MERGED_MAP_OUTPUTS=2
MERGE_PHASE_TIME=24
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=2
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=99
REDUCE_INPUT_RECORDS=99
SHUFFLE_BYTES=683
SHUFFLE_BYTES_DECOMPRESSED=1284
SHUFFLE_BYTES_DISK_DIRECT=683
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=19
SPILLED_RECORDS=99
TaskCounter_scope_55_INPUT_scope_54
ADDITIONAL_SPILLS_BYTES_READ=956
ADDITIONAL_SPILLS_BYTES_WRITTEN=0
COMBINE_INPUT_RECORDS=0
FIRST_EVENT_RECEIVED=8
LAST_EVENT_RECEIVED=9
MERGED_MAP_OUTPUTS=2
MERGE_PHASE_TIME=30
NUM_DISK_TO_DISK_MERGES=0
NUM_FAILED_SHUFFLE_INPUTS=0
NUM_MEM_TO_DISK_MERGES=0
NUM_SHUFFLED_INPUTS=2
NUM_SKIPPED_INPUTS=0
REDUCE_INPUT_GROUPS=100
REDUCE_INPUT_RECORDS=100
SHUFFLE_BYTES=956
SHUFFLE_BYTES_DECOMPRESSED=2004
SHUFFLE_BYTES_DISK_DIRECT=956
SHUFFLE_BYTES_TO_DISK=0
SHUFFLE_BYTES_TO_MEM=0
SHUFFLE_PHASE_TIME=21
SPILLED_RECORDS=100
TaskCounter_scope_55_OUTPUT_scope_51
OUTPUT_RECORDS=99
org.apache.hadoop.mapreduce.TaskCounter
COMBINE_INPUT_RECORDS=99
COMBINE_OUTPUT_RECORDS=457
org.apache.hadoop.mapreduce.TaskCounter_scope_52_OUTPUT_scope_53
COMBINE_INPUT_RECORDS=99
COMBINE_OUTPUT_RECORDS=457
org.apache.hadoop.mapreduce.TaskCounter_scope_53_INPUT_scope_52
COMBINE_INPUT_RECORDS=0
COMBINE_OUTPUT_RECORDS=0
2016-11-29 15:48:54,128 [main] INFO  org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
       HadoopVersion: 2.7.3.2.5.0.0-1245                                                                                  
          PigVersion: 0.16.0.2.5.0.0-1245                                                                                
          TezVersion: 0.7.0.2.5.0.0-1245                                                                                  
              UserId: yarn                                                                                                
            FileName: script.pig                                                                                          
           StartedAt: 2016-11-29 15:48:31                                                                                
          FinishedAt: 2016-11-29 15:48:54                                                                                
            Features: HASH_JOIN,GROUP_BY,FILTER                                                                          
Success!
DAG 0:
                                    Name: PigLatin:script.pig-0_scope-0                                                                      
                           ApplicationId: job_1480244541051_0025                                                                              
                      TotalLaunchedTasks: 4                                                                                                  
                           FileBytesRead: 3482                                                                                                
                        FileBytesWritten: 2396                                                                                                
                           HdfsBytesRead: 26083                                                                                              
                        HdfsBytesWritten: 1532                                                                                                
      SpillableMemoryManager spill count: 0                                                                                                  
                Bags proactively spilled: 0                                                                                                  
             Records proactively spilled: 0                                                                                                  
DAG Plan:
Tez vertex scope-52->Tez vertex scope-53,
Tez vertex scope-53->Tez vertex scope-55,
Tez vertex scope-54->Tez vertex scope-55,
Tez vertex scope-55
Vertex Stats:
VertexId Parallelism TotalTasks   InputRecords   ReduceInputRecords  OutputRecords  FileBytesRead FileBytesWritten  HdfsBytesRead HdfsBytesWritten  Alias  Feature  Outputs
scope-52           1          1           8000                    0            457             32              645          24641                0  a,b,c,d,e
scope-53           1          1              0                   99             99            701              739              0                0  e,h  GROUP_BY
scope-54           1          1            100                    0            100             56             1012           1442                0  g,h
scope-55           2          1              0                  199             99           2693                0              0             1532  final_data,h  HASH_JOIN  riskfactor,
Input(s):
Successfully read 100 records (1442 bytes) from: "drivermileage"
Successfully read 8000 records (24641 bytes) from: "geolocation"
Output(s):
Successfully stored 99 records (1532 bytes) in: "riskfactor"
2016-11-29 15:48:54,163 [main] INFO  org.apache.pig.Main - Pig script completed in 29 seconds and 51 milliseconds (29051 ms)
2016-11-29 15:48:54,163 [main] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool
2016-11-29 15:48:54,183 [pool-1-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Shutting down Tez session org.apache.tez.client.TezClient@df073ec
2016-11-29 15:48:54,206 [pool-1-thread-1] INFO  org.apache.tez.client.TezClient - Shutting down Tez Session, sessionName=PigLatin:script.pig, applicationId=application_1480244541051_0025

Any ideas? Thanks

Super Collaborator

@Xavier VAN AUSLOOS, where is the issue? I cannot see any errors. Moreover, if you scroll down, you will see this:

Input(s):
Successfully read 100 records (1442 bytes) from: "drivermileage"
Successfully read 8000 records (24641 bytes) from: "geolocation"
Output(s):
Successfully stored 99 records (1532 bytes) in: "riskfactor"
2016-11-29 15:48:54,163 [main] INFO org.apache.pig.Main - Pig script completed in 29 seconds and 51 milliseconds (29051 ms)

Check your riskfactor table; you should see your data.
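
For example, from the Hive view (a simple check, assuming the table is still in the default database):

SELECT * FROM riskfactor LIMIT 10;
-- the run above reported "Successfully stored 99 records", so this should return rows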

Thanks, it finally worked. I do not know why or how... anyway 🙂