Created 02-15-2016 04:25 PM
Hi:
After reading data from Hive with Pig, I am now inserting it with this command:
F = STORE E INTO 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatStorer();
F has these records:
(STR03CON,3190,2015-12-06 00,9992,2015,12,1)
(STS01OON,3081,2015-12-06 00,9154,2015,12,1)
(VAO13MOU,3076,2015-12-06 00,9554,2015,12,1)
(VMP71MOU,9998,2015-12-06 00,0001,2015,12,11)
and the error is:
2016-02-15 17:22:42,483 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.VisitorException: ERROR 1115: <file store_journey.pig, line 36, column 4> Output Location Validation Failed for: 'journey_pig More info to follow: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
    at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:64)
    at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
    at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
    at org.apache.pig.PigServer.execute(PigServer.java:1356)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:749)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:502)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateAlias(HCatBaseStorer.java:612)
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateSchema(HCatBaseStorer.java:514)
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.doSchemaValidations(HCatBaseStorer.java:495)
    at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:201)
    at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:57)
    ... 29 more
Where do I need to put the column name?
Thanks
Created 02-16-2016 07:53 PM
Hi:
Finally it worked like this:
e = load 'hdfs://localhost:8020/tmp/jofi_pig_temp' using PigStorage(',') AS (codtf : chararray,codnrbeenf : chararray, fechaoprcnf : chararray, codinternouof : chararray, year : chararray, month : chararray, frecuencia : int);
Many thanks
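For readers landing here, the load above can be paired with the store roughly as follows. This is only a sketch: it assumes the Hive table journey_pig already exists with columns of the same lower-case names and compatible types (the thread never shows the table definition), and that the script is started with pig -useHCatalog so the HCatalog jars are on the classpath.

```pig
-- Load the temporary file with an explicit schema so that every field has a name.
e = LOAD 'hdfs://localhost:8020/tmp/jofi_pig_temp' USING PigStorage(',')
    AS (codtf:chararray, codnrbeenf:chararray, fechaoprcnf:chararray,
        codinternouof:chararray, year:chararray, month:chararray, frecuencia:int);

-- Check the field names before storing; they should match the Hive columns.
DESCRIBE e;

-- STORE is a statement, not an expression, so there is no alias on the left-hand side.
STORE e INTO 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

HCatStorer validates the relation's field names against the table, which is why the unnamed fields ($0, $1, ...) triggered ERROR 1115.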
Created 02-15-2016 04:27 PM
Sorry, it is E (not F) that contains those records.
Created 02-15-2016 04:29 PM
You have to specify the schema in your load statement. See this example
Created 02-15-2016 04:32 PM
@Roberto Sancho I am taking this from previous questions
F = LOAD 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatLoader();
The above statement needs a schema.
For example:
F = LOAD 'journey_pig' USING org.apache.hive.hcatalog.pig.HCatLoader() as (name : chararray, number : int .....)
Created 02-15-2016 04:32 PM
You don't need to assign the result of STORE to 'F'. Just issue the STORE command.
Also, please describe 'E' and check the list of column names that you are storing into the Hive table. Make sure the column names match the table's columns and are written in lower case in your relation.
If you don't see the field names in the relation, you can define them as follows:
F = FOREACH E GENERATE (chararray)$0 as (col1:chararray), (chararray)$1 as (col2:chararray), .... etc...
and STORE F INTO 'journey_pig' using HCatStorer();
Hope this helps!
Thanks!
Created 02-15-2016 04:33 PM
@Roberto Sancho it is complaining about column #4. Check your table schema and explicitly specify the column schema. Please run
describe table;
in Hive and
describe E;
in pig. We can compare. Also you don't need to say "F = STORE E", just issue STORE E INTO 'path' using ...;
Created 02-15-2016 04:40 PM
Hi:
My problem is that I created E like this:
E = FOREACH D GENERATE FLATTEN(group), COUNT(C);
and I don't know where to put the column names in the FLATTEN.
Many thanks again
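For the FOREACH above, one way to attach names is an AS clause on each generated item. The sketch below assumes D was grouped by six fields (the sample records have six values plus a count) and borrows the field names from the load statement posted elsewhere in this thread; neither assumption is confirmed by the script shown here.

```pig
-- FLATTEN(group) produces one field per grouping key, so AS takes a parenthesised list.
-- COUNT returns a long; the cast to int is only needed if the Hive column is an int.
E = FOREACH D GENERATE
        FLATTEN(group) AS (codtf, codnrbeenf, fechaoprcnf, codinternouof, year, month),
        (int)COUNT(C) AS frecuencia;

-- Every field should now show a name instead of being blank.
DESCRIBE E;
```

With named fields, E can be passed straight to HCatStorer without the intermediate PigStorage file.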
Created 02-15-2016 05:31 PM
@Roberto Sancho please post your script, and let's close this if you found a solution.
Created 02-15-2016 06:02 PM
Hi:
STORE E INTO 'hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp' using PigStorage(',');
F = load 'hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp/pig_temp.out' using PigStorage(',');
G = foreach A generate $0, $1, $2, $3, $4, $5, $6, $7;
dump G;
STORE G INTO 'default.journey_pig' USING org.apache.hive.hcatalog.pig.HCatStorer();
This is the content of the hdfs://lnxbig05.cajarural.gcr:8020/tmp/journey_pig_temp file:
BDP00SMU,1491,2015-12-06 00,9901,2015,12,1
BDP00SMU,3113,2015-12-06 00,8004,2015,12,1
BDP00SMU,3187,2015-12-06 00,0913,2015,12,1
BDP00SMU,3190,2015-12-06 00,9992,2015,12,1
BDPPM1GP,3008,2015-12-06 00,9521,2015,12,17
BDPPM1HC,3128,2015-12-06 00,8110,2015,12,32
BDPPM1KK,0198,2015-12-06 00,8002,2015,12,1
BDPPM1KK,3008,2015-12-06 00,9521,2015,12,3
BDPPM1KK,3008,2015-12-06 00,9523,2015,12,6
The dump G is:
([COD-NRBE-EN-F#9998,NOMBRE-REGLA-F#SAI_TIP_INC_TRN,FECHA-OPRCN-F#2015-12-06 00:00:01,COD-TX-DI-F#TUX,VALOR-IMP-F#0.00,ID-INTERNO-TERM-TN-F#A0299989,COD-NRBE-EN-FSC-F#9998,COD-CSB-OF-F#0001,COD-TX-F#SAI01COU,COD-INTERNO-UO-F#0001,CANAL#01,COD-INTERNO-UO-FSC-F#0001,COD-IDENTIFICACION-F#,IDENTIFICACION-F#,ID-INTERNO-EMPL-EP-F#99999989,FECHA-CTBLE-F#2015-12-07,NUM-SEC-F#764,ID-EMPL-AUT-F#U028765,COD-CENT-UO-F#,NUM-PARTICION-F#001],,,,,,,)
([COD-NRBE-EN-F#9998,NOMBRE-REGLA-F#TR_IMPUTAC_MPAGO_TRN,FECHA-OPRCN-F#2015-12-06 00:00:06,COD-TX-DI-F#TUX,VALOR-IMP-F#0.00,ID-INTERNO-TERM-TN-F#A0299997,COD-NRBE-EN-FSC-F#9998,COD-CSB-OF-F#0001,COD-TX-F#DVI82OOU,COD-INTERNO-UO-F#0001,CANAL#01,COD-INTERNO-UO-FSC-F#0001,COD-IDENTIFICACION-F#,IDENTIFICACION-F#,ID-INTERNO-EMPL-EP-F#99999998,FECHA-CTBLE-F#2015-12-07,NUM-SEC-F#0,ID-EMPL-AUT-F#,COD-CENT-UO-F#,NUM-PARTICION-F#001],,,,,,,)
The error occurs when I try to store G into the Hive table. Above are the file on HDFS and the contents of G. I don't understand what the error means:
2016-02-15 18:59:29,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
2016-02-15 18:59:29,529 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.VisitorException: ERROR 1115: <file store_journey.pig, line 48, column 0> Output Location Validation Failed for: 'default.journey_pig More info to follow: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
    at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:64)
    at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
    at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
    at org.apache.pig.PigServer$Graph.compile(PigServer.java:1767)
    at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1443)
    at org.apache.pig.PigServer.execute(PigServer.java:1356)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:502)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1115: Column name for a field is not specified. Please provide the full schema as an argument to HCatStorer.
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateAlias(HCatBaseStorer.java:612)
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateSchema(HCatBaseStorer.java:514)
    at org.apache.hive.hcatalog.pig.HCatBaseStorer.doSchemaValidations(HCatBaseStorer.java:495)
    at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:201)
    at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:57)
    ... 24 more
Created 02-15-2016 07:25 PM
@Roberto Sancho can you run use default; describe journey_pig; in Hive? In Pig, I want you to run describe G;