Pig and Hive store

Master Collaborator

Hi,

After loading data from Hive with Pig, I am now inserting it with this command:

F = STORE E INTO 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatStorer();

F has these records:

(STR03CON,3190,2015-12-06 00,9992,2015,12,1)
(STS01OON,3081,2015-12-06 00,9154,2015,12,1)
(VAO13MOU,3076,2015-12-06 00,9554,2015,12,1)
(VMP71MOU,9998,2015-12-06 00,0001,2015,12,11)

And the error is:

2016-02-15 17:22:42,483 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.VisitorException: ERROR 1115:
<file store_journey.pig, line 36, column 4> Output Location Validation Failed for: 'journey_pig More info to follow:
Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:64)
        at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
        at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
        at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
        at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
        at org.apache.pig.PigServer.execute(PigServer.java:1356)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:749)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:502)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateAlias(HCatBaseStorer.java:612)
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateSchema(HCatBaseStorer.java:514)
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.doSchemaValidations(HCatBaseStorer.java:495)
        at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:201)
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:57)
        ... 29 more


Where do I need to put the column names?

Thanks

Accepted solution

Master Collaborator

Hi,

It finally worked like this:

e = load 'hdfs://localhost:8020/tmp/jofi_pig_temp' using PigStorage(',') AS (codtf : chararray,codnrbeenf : chararray, fechaoprcnf : chararray, codinternouof : chararray, year : chararray, month : chararray, frecuencia : int);

Many thanks
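The LOAD above works because its AS clause gives every field a name (and a type), which is exactly what HCatStorer was missing. The Python below is only a hypothetical illustration of that idea, not Pig's implementation: `SCHEMA` copies the names and types from the accepted LOAD, and `load_with_schema` is an invented helper that splits comma-delimited lines and attaches those names. The two sample lines come from the journey_pig_temp file posted later in the thread.

```python
# Hypothetical sketch: emulate what LOAD ... USING PigStorage(',') AS (...)
# gives Pig, i.e. a declared name and type for every field.
# This is an illustration only, not how Pig is implemented.
SCHEMA = [
    ("codtf", str),          # chararray
    ("codnrbeenf", str),     # chararray (keeps leading zeros such as "0198")
    ("fechaoprcnf", str),    # chararray
    ("codinternouof", str),  # chararray
    ("year", str),           # chararray
    ("month", str),          # chararray
    ("frecuencia", int),     # int
]


def load_with_schema(lines, delimiter=","):
    """Split each line on the delimiter and attach the declared names and types.

    Stricter than Pig, which is more lenient about malformed rows: here a row
    with the wrong field count or a non-integer frecuencia raises an error.
    """
    records = []
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        if len(values) != len(SCHEMA):
            raise ValueError(f"expected {len(SCHEMA)} fields, got {len(values)}: {line!r}")
        # Pair each declared (name, type) with the value in the same position.
        records.append({name: cast(value) for (name, cast), value in zip(SCHEMA, values)})
    return records


sample = [
    "BDP00SMU,1491,2015-12-06 00,9901,2015,12,1",
    "BDPPM1KK,0198,2015-12-06 00,8002,2015,12,1",
]
for record in load_with_schema(sample):
    print(record)
```

Declaring the code columns as chararray rather than int is what keeps values like 0198 intact.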

Master Collaborator

Sorry, E actually contains this:

(STR03CON,3190,2015-12-0600,9992,2015,12,1)
(STS01OON,3081,2015-12-0600,9154,2015,12,1)
(VAO13MOU,3076,2015-12-0600,9554,2015,12,1)
(VMP71MOU,9998,2015-12-0600,0001,2015,12,11)

Master Mentor
@Roberto Sancho

You have to specify the schema in your LOAD statement. See this example.

Master Mentor

@Roberto Sancho I am taking this from your previous questions:

F = LOAD 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatLoader();

The above statement needs a schema.

For example:

F = LOAD 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatLoader() as (name : chararray, number : int .....)

Expert Contributor

You don't need to assign the STORE to a relation 'F'. Just issue the STORE command.

Also, please run DESCRIBE on 'E' and check the list of column names that you are storing into the Hive table. Make sure the column names match the table's columns and are written in lower case in your relation.

If you don't see field names in the relation, you can define them as follows:

F = FOREACH E GENERATE (chararray)$0 as (col1:chararray), (chararray)$1 as (col2:chararray), .... etc...

and then STORE F INTO 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatStorer();

Hope this helps!

Thanks!
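The advice above amounts to a few checks you can do by eye after running DESCRIBE in Pig and describe in Hive. The Python below is a hypothetical model of those checks, not HCatStorer's actual code: `check_pig_schema` is an invented helper that takes the Pig field names (None for an unnamed field) and the Hive column names, and reports an unnamed field (the condition behind ERROR 1115), a name that is not lower case, and a name that matches no Hive column.

```python
from typing import List, Optional


def check_pig_schema(pig_aliases: List[Optional[str]], hive_columns: List[str]) -> List[str]:
    """Hypothetical model of the schema checks discussed in this thread.

    pig_aliases: field names as Pig's DESCRIBE would list them (None = unnamed).
    hive_columns: column names from Hive's describe <table>.
    Returns a list of human-readable problems; an empty list means none found.
    """
    problems = []
    for position, alias in enumerate(pig_aliases):
        if alias is None:
            # An unnamed field is what "Column name for a field is not specified" reports.
            problems.append(f"field ${position}: column name not specified")
        elif alias != alias.lower():
            problems.append(f"field ${position}: '{alias}' is not lower case")
        elif alias not in hive_columns:
            problems.append(f"field ${position}: '{alias}' is not a column of the Hive table")
    return problems


# A relation produced by GENERATE without AS clauses has unnamed fields.
print(check_pig_schema([None, None], ["col1", "col2"]))
# After FOREACH E GENERATE ... AS col1, ... AS col2 the names line up.
print(check_pig_schema(["col1", "col2"], ["col1", "col2"]))
```

Comparing the two DESCRIBE outputs this way usually points straight at the field that needs an alias.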

Master Mentor

@Roberto Sancho it is complaining about the STORE statement at line 36, column 4 of your script; check your table schema and specify the column schema explicitly. Please run

describe table; 

in Hive and

describe E; 

in Pig, so we can compare them. Also, you don't need to write "F = STORE E"; just issue STORE E INTO 'path' USING ...;

Master Collaborator

Hi,

My problem is that I created E like this:

E = FOREACH D GENERATE
    FLATTEN(group),
    COUNT(C);


and I don't know where to put the column names in the FLATTEN.

Many thanks again
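The names can go in the GENERATE step itself: in Pig an AS clause can follow each expression, for example FLATTEN(group) AS (codtf, codnrbeenf, ...) and COUNT(C) AS frecuencia. The Python below only mimics that GROUP / FLATTEN / COUNT pipeline to show where each output name comes from. It is a hypothetical sketch: `group_and_count` and `KEY_NAMES` are invented, and it assumes D was grouped on the six key columns used in the accepted LOAD schema, which the thread does not confirm.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed grouping keys, borrowed from the accepted LOAD schema (hypothetical).
KEY_NAMES = ("codtf", "codnrbeenf", "fechaoprcnf", "codinternouof", "year", "month")


def group_and_count(rows: Iterable[Tuple[str, ...]]) -> List[Dict[str, object]]:
    """Count identical key tuples and emit one named record per group."""
    counts = Counter(rows)  # like GROUP ... BY the key columns, then COUNT per group
    result = []
    for key, n in sorted(counts.items()):
        record: Dict[str, object] = dict(zip(KEY_NAMES, key))  # like FLATTEN(group) AS (...)
        record["frecuencia"] = n  # like COUNT(C) AS frecuencia
        result.append(record)
    return result


rows = [
    ("BDPPM1KK", "3008", "2015-12-06 00", "9523", "2015", "12"),
    ("BDPPM1KK", "3008", "2015-12-06 00", "9523", "2015", "12"),
    ("BDP00SMU", "1491", "2015-12-06 00", "9901", "2015", "12"),
]
for record in group_and_count(rows):
    print(record)
```

Without the names, the flattened keys and the count arrive as positional fields only, which is what HCatStorer rejects.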

Master Mentor

@Roberto Sancho please post your script, and let's close this if you found a solution.

Master Collaborator

Hi,

STORE E INTO 'hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp' using PigStorage(',');
F = load 'hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp/pig_temp.out' using PigStorage(',');
G = foreach A generate $0, $1, $2, $3, $4, $5, $6, $7;
dump G;
STORE G INTO 'default.journey_pig' USING  org.apache.hive.hcatalog.pig.HCatStorer();

This is the content of the hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp file:

BDP00SMU,1491,2015-12-06 00,9901,2015,12,1
BDP00SMU,3113,2015-12-06 00,8004,2015,12,1
BDP00SMU,3187,2015-12-06 00,0913,2015,12,1
BDP00SMU,3190,2015-12-06 00,9992,2015,12,1
BDPPM1GP,3008,2015-12-06 00,9521,2015,12,17
BDPPM1HC,3128,2015-12-06 00,8110,2015,12,32
BDPPM1KK,0198,2015-12-06 00,8002,2015,12,1
BDPPM1KK,3008,2015-12-06 00,9521,2015,12,3
BDPPM1KK,3008,2015-12-06 00,9523,2015,12,6

The output of dump G is:

([COD-NRBE-EN-F#9998,NOMBRE-REGLA-F#SAI_TIP_INC_TRN,FECHA-OPRCN-F#2015-12-06 00:00:01,COD-TX-DI-F#TUX,VALOR-IMP-F#0.00,ID-INTERNO-TERM-TN-F#A0299989,COD-NRBE-EN-FSC-F#9998,COD-CSB-OF-F#0001,COD-TX-F#SAI01COU,COD-INTERNO-UO-F#0001,CANAL#01,COD-INTERNO-UO-FSC-F#0001,COD-IDENTIFICACION-F#,IDENTIFICACION-F#,ID-INTERNO-EMPL-EP-F#99999989,FECHA-CTBLE-F#2015-12-07,NUM-SEC-F#764,ID-EMPL-AUT-F#U028765,COD-CENT-UO-F#,NUM-PARTICION-F#001],,,,,,,)
([COD-NRBE-EN-F#9998,NOMBRE-REGLA-F#TR_IMPUTAC_MPAGO_TRN,FECHA-OPRCN-F#2015-12-06 00:00:06,COD-TX-DI-F#TUX,VALOR-IMP-F#0.00,ID-INTERNO-TERM-TN-F#A0299997,COD-NRBE-EN-FSC-F#9998,COD-CSB-OF-F#0001,COD-TX-F#DVI82OOU,COD-INTERNO-UO-F#0001,CANAL#01,COD-INTERNO-UO-FSC-F#0001,COD-IDENTIFICACION-F#,IDENTIFICACION-F#,ID-INTERNO-EMPL-EP-F#99999998,FECHA-CTBLE-F#2015-12-07,NUM-SEC-F#0,ID-EMPL-AUT-F#,COD-CENT-UO-F#,NUM-PARTICION-F#001],,,,,,,)

The error happens when I store G into the Hive table. Above are the file on HDFS and the G relation; I don't know what the error means:

2016-02-15 18:59:29,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
2016-02-15 18:59:29,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.VisitorException: ERROR 1115:
<file store_journey.pig, line 48, column 0> Output Location Validation Failed for: 'default.journey_pig More info to follow:
Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:64)
        at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
        at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
        at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
        at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
        at org.apache.pig.PigServer.execute(PigServer.java:1356)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:502)
        at org.apache.pig.Main.main(Main.java:177)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateAlias(HCatBaseStorer.java:612)
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateSchema(HCatBaseStorer.java:514)
        at org.apache.hive.hcatalog.pig.HCatBaseStorer.doSchemaValidations(HCatBaseStorer.java:495)
        at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:201)
        at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:57)
        ... 24 more


Master Mentor

@Roberto Sancho can you run use default; describe journey_pig; in Hive? In Pig, please run describe G;