Member since: 09-20-2016
Posts: 15
Kudos Received: 1
Solutions: 0
03-15-2017
06:44 PM
Hi - You need to write a shell script to accomplish this requirement. 1. Find the max value of your delta/incremental field and assign it to a variable. 2. In this scenario, use the Sqoop import statement instead of creation. 3. In the same script, use Sqoop import with --query, and in the WHERE clause put the condition that reads your variable. This will solve your problem. A minimal sketch follows.
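A hedged sketch of such a wrapper script (the connection string, the table orders, the incremental column last_upd_ts, and all paths are hypothetical, not from this thread; note that a free-form --query must contain the literal $CONDITIONS token and needs --split-by or -m 1):

#!/bin/bash
# Step 1: capture the max value of the delta/incremental column in a variable
# (read here from the Hive target table; all names are hypothetical).
MAX_VAL=$(hive -S -e "SELECT MAX(last_upd_ts) FROM target_db.orders")

# Steps 2-3: Sqoop import with --query; the WHERE clause reads the variable.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/source_db \
  --username etl_user \
  --password-file /user/etl_user/sqoop.pw \
  --query "SELECT * FROM orders WHERE last_upd_ts > '${MAX_VAL}' AND \$CONDITIONS" \
  --split-by order_id \
  --target-dir /data/staging/orders_delta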
03-14-2017
04:16 PM
Hi - could you share your entire UNIX script that uses the Sqoop statement?
10-13-2016
06:32 AM
It is producing the output as expected. Unfortunately, I can't load multiple files; I need to run multiple LOAD statements.
10-07-2016
03:28 AM
Hi, you can use window functions in Hive, for example OVER (PARTITION BY code ORDER BY code DESC), for your reference. I am not sure about the performance, but you can tweak that as well. A short sketch follows.
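As an illustration, a hedged sketch using that window clause (the table txn and the column amt are hypothetical):

-- Number rows within each code partition; swap ROW_NUMBER() for RANK(),
-- or the ORDER BY column for whatever ordering you actually need.
SELECT code,
       amt,
       ROW_NUMBER() OVER (PARTITION BY code ORDER BY code DESC) AS rn
FROM txn;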
10-03-2016
03:57 AM
Any help on this issue is much appreciated...
09-27-2016
02:44 AM
I need to use the glob syntax with OrcStorage(). It worked for me when I was using PigStorage(); unfortunately, the same is not working with OrcStorage().
09-26-2016
07:12 AM
I am facing the same error in either case (using the /app/hive .... path and the complete path).

Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://XXXX/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}
--------------------------------------------------------------------------------------------------------------------------------------------
grunt> FILE_DATA = LOAD '/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{01,02}}' using OrcStorage();
grunt> U = limit FILE_DATA 10;
grunt> dump U;
2016-09-26 00:15:25,708 [main] INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 298835 for x186366 on ha-hdfs:XXXX
2016-09-26 00:15:25,729 [main] INFO org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://XXXX; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:XXXX, Ident: (HDFS_DELEGATION_TOKEN token 298835 for x186366)
2016-09-26 00:15:25,732 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: LIMIT
2016-09-26 00:15:25,760 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-09-26 00:15:25,787 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-09-26 00:15:25,921 [main] INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 298836 for x186366 on ha-hdfs:XXXX
2016-09-26 00:15:25,921 [main] INFO org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://XXXX; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:XXXX, Ident: (HDFS_DELEGATION_TOKEN token 298836 for x186366)
2016-09-26 00:15:25,948 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-09-26 00:15:26,015 [main] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-09-26 00:15:26,076 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2081: Unable to setup the load function.
Details at logfile: /home/x186366/PIG/pig_1474863256378.log
grunt> FILE_DATA = LOAD '/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}' using OrcStorage();
grunt> dump FILE_DATA;
2016-09-26 00:18:38,271 [main] INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 298842 for x186366 on ha-hdfs:XXXX
2016-09-26 00:18:38,271 [main] INFO org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://XXXX; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:XXXX, Ident: (HDFS_DELEGATION_TOKEN token 298842 for x186366)
2016-09-26 00:18:38,272 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2016-09-26 00:18:38,291 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-09-26 00:18:38,292 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-09-26 00:18:38,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2016-09-26 00:18:38,320 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-09-26 00:18:38,320 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2016-09-26 00:18:38,653 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://qcr-hadoop-m003.oss.ads:8188/ws/v1/timeline/
2016-09-26 00:18:38,998 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2016-09-26 00:18:39,002 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2016-09-26 00:18:39,002 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2016-09-26 00:18:39,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/pig-0.14.0.2.2.4.2-2-core-h2.jar to DistributedCache through /tmp/temp-706185898/tmp-1416785850/pig-0.14.0.2.2.4.2-2-core-h2.jar
2016-09-26 00:18:39,117 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/hive-common-0.14.0.2.2.4.2-2.jar to DistributedCache through /tmp/temp-706185898/tmp1473636815/hive-common-0.14.0.2.2.4.2-2.jar
2016-09-26 00:18:39,133 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/kryo-2.22.jar to DistributedCache through /tmp/temp-706185898/tmp1197094010/kryo-2.22.jar
2016-09-26 00:18:39,152 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/hive-serde-0.14.0.2.2.4.2-2.jar to DistributedCache through /tmp/temp-706185898/tmp-91002594/hive-serde-0.14.0.2.2.4.2-2.jar
2016-09-26 00:18:39,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/hive-exec-0.14.0.2.2.4.2-2-core.jar to DistributedCache through /tmp/temp-706185898/tmp-169373252/hive-exec-0.14.0.2.2.4.2-2-core.jar
2016-09-26 00:18:39,202 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/h2/hive-shims-0.23-0.14.0.2.2.4.2-2.jar to DistributedCache through /tmp/temp-706185898/tmp-278707777/hive-shims-0.23-0.14.0.2.2.4.2-2.jar
2016-09-26 00:18:39,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/hive-shims-common-secure-0.14.0.2.2.4.2-2.jar to DistributedCache through /tmp/temp-706185898/tmp-909903736/hive-shims-common-secure-0.14.0.2.2.4.2-2.jar
2016-09-26 00:18:39,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/hive-shims-common-0.14.0.2.2.4.2-2.jar to DistributedCache through /tmp/temp-706185898/tmp815975405/hive-shims-common-0.14.0.2.2.4.2-2.jar
2016-09-26 00:18:39,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-706185898/tmp-808702139/automaton-1.11-8.jar
2016-09-26 00:18:39,257 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-706185898/tmp780911667/antlr-runtime-3.4.jar
2016-09-26 00:18:39,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/hadoop/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp-706185898/tmp-718792826/guava-11.0.2.jar
2016-09-26 00:18:39,296 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/hdp/2.2.4.2-2/hadoop-mapreduce/joda-time-2.7.jar to DistributedCache through /tmp/temp-706185898/tmp-36993951/joda-time-2.7.jar
2016-09-26 00:18:39,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2016-09-26 00:18:39,372 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2016-09-26 00:18:39,493 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://qcr-hadoop-m003.oss.ads:8188/ws/v1/timeline/
2016-09-26 00:18:39,515 [JobControl] INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 298843 for x186366 on ha-hdfs:XXXX
2016-09-26 00:18:39,515 [JobControl] INFO org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://XXXX; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:XXXX, Ident: (HDFS_DELEGATION_TOKEN token 298843 for x186366)
2016-09-26 00:18:39,807 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-26 00:18:39,861 [JobControl] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2016-09-26 00:18:39,863 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /user/x186366/.staging/job_1474117246017_13074
2016-09-26 00:18:39,868 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: File does not exist: hdfs://XXXX/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://XXXX/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:961)
at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
2016-09-26 00:18:39,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1474117246017_13074
2016-09-26 00:18:39,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases FILE_DATA
2016-09-26 00:18:39,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: FILE_DATA[5,12] C: R:
2016-09-26 00:18:39,882 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-09-26 00:18:44,895 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-09-26 00:18:44,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1474117246017_13074 has failed! Stop running all dependent jobs
2016-09-26 00:18:44,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-09-26 00:18:44,991 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://qcr-hadoop-m003.oss.ads:8188/ws/v1/timeline/
2016-09-26 00:18:45,051 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1474117246017_13074. Redirecting to job history server.
2016-09-26 00:18:45,115 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2016-09-26 00:18:45,115 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-09-26 00:18:45,116 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0.2.2.4.2-2 0.14.0.2.2.4.2-2 x186366 2016-09-26 00:18:38 2016-09-26 00:18:45 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1474117246017_13074 FILE_DATA MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: File does not exist: hdfs://XXXX/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
at java.lang.Thread.run(Thread.java:745)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://XXXX/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:961)
at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
hdfs://XXXX/tmp/temp-706185898/tmp-1241830479,
Input(s):
Failed to read data from "/apps/hive/warehouse/us_rat.db/int_sr_pr_dtl_orc/pa_int_sr_dtl_start_dt={2016-09-{0[1-5]}}"
Output(s):
Failed to produce result in "hdfs://XXXX/tmp/temp-706185898/tmp-1241830479"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1474117246017_13074
2016-09-26 00:18:45,116 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-09-26 00:18:45,120 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias FILE_DATA. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
09-23-2016
09:22 AM
Code:
grunt> FILE_DATA = LOAD 'hdfs://XXXXXX/apps/hive/warehouse/us_ra.db/int_detail_orc/pa_srvc_prov_data_detail_start_dt={2016-09-{01,02}}' using OrcStorage();
grunt> SAMPLE_DATA = LIMIT FILE_DATA 10;
grunt> DUMP SAMPLE_DATA;
Complete Error Information: Pig Stack Trace
---------------
ERROR 2081: Unable to setup the load function.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias SAMPLE_DATA
at org.apache.pig.PigServer.openIterator(PigServer.java:935)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias SAMPLE_DATA
at org.apache.pig.PigServer.storeEx(PigServer.java:1038)
at org.apache.pig.PigServer.store(PigServer.java:997)
at org.apache.pig.PigServer.openIterator(PigServer.java:910)
... 13 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: SAMPLE_DATA: Store (hdfs://XXXXXX/tmp/temp-1995198811/tmp-1809220655:org.apache.pig.impl.io.InterStorage) - scope-2 Operator Key: scope-2): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: SAMPLE_DATA: Limit - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:159)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:157)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:278)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
... 15 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: SAMPLE_DATA: Limit - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:316)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNextTuple(POLimit.java:122)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 22 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:131)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
... 24 more
Caused by: org.apache.hadoop.mapred.InvalidInputException: File does not exist: hdfs://XXXXXX/apps/hive/warehouse/us_ra.db/int_detail_orc/pa_srvc_prov_data_detail_start_dt={2016-09-{01,02}}
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:961)
at org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat.getSplits(OrcNewInputFormat.java:121)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:99)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
... 25 more
================================================================================
09-23-2016
05:49 AM
I need to process the files from 2016-09-01 to 2016-09-20.
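A Hadoop-style glob covering that range would look like the sketch below (the base path is illustrative, reusing the partition layout quoted elsewhere in this thread; whether OrcStorage() accepts such a glob is exactly the open question here):

FILE_DATA = LOAD '/apps/hive/warehouse/us_rat.db/aa/pa_int_dtl_start_dt=2016-09-{0[1-9],1[0-9],20}' using OrcStorage();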
09-23-2016
05:47 AM
1 Kudo
Hi - help needed as soon as possible. I have a LOAD statement using glob syntax that is throwing an error:

FILE_DATA = LOAD 'hdfs://XXXXX/apps/hive/warehouse/us_rat.db/aa/pa_int_dtl_start_dt={2016-09{0[1-9]}}' using OrcStorage();

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 13, column 7> Syntax error, unexpected symbol at or near '='

*** the '=' sign exists in the path; the complete path is: hdfs://XXXXX/apps/hive/warehouse/us_rat.db/aa/pa_int_dtl_start_dt=2016-09-01

Please share your inputs.
Tags: Data Processing, Pig
09-22-2016
07:00 AM
Hi, I have a use case where I need to bring all of yesterday's files into a Pig relation. All of these files are placed in a DATETIMESTAMP (yesterday's date timestamp) partition folder. I tried defining a parameter and assigning the calculated yesterday date in my script:

%declare PREV_DATE ToString(SubtractDuration(CurrentTime(),'P1D'),'YYYY-MM-dd');

When I use this parameter ($PREV_DATE) in my LOAD statement, the parameter value prints as ToString(SubtractDuration(CurrentTime(),'P1D'),'YYYY-MM-dd') instead of the date (e.g., 2016-08-12). How can we accomplish this? Please share your thoughts.
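One hedged workaround: %declare does plain text substitution and never evaluates Pig expressions, but it can execute a shell command given in backquotes, so the date can be computed outside Pig. A sketch (the path and relation name are hypothetical; date -d assumes GNU date):

%declare PREV_DATE `date -d "yesterday" +%Y-%m-%d`;
-- $PREV_DATE now substitutes as a literal date such as 2016-08-12
DAY_DATA = LOAD '/data/events/dt=$PREV_DATE' USING PigStorage();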
09-22-2016
03:57 AM
Thank you for the detailed information to debug the code. It's a weird situation for me: yesterday the same code did not execute, and then the same code (without any modification) successfully executed and published the data.
09-20-2016
11:42 AM
Hi, I am using the below code to perform the ORDER operation, but it is throwing an "Invalid field projection" error, whereas the relation has the column.

grunt> byts = ORDER B BY JB_DLT::job_id DESC;
2016-09-20 07:32:56,815 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 99, column 18> Invalid field projection. Projected field [JB_DLT::job_id] does not exist in schema: group:tuple(JB_ASGNMNT::JB_DLT::job_id:bigdecimal,JOB_ASNMNT_STS_DLT::job_assignment_status_cd:chararray),a:bag{:tuple(JB_ASGNMNT::JB_DLT::job_id:bigdecimal,JOB_ASNMNT_STS_DLT::job_assignment_status_cd:chararray,JOB_ASNMNT_STS_DLT::job_assignment_status_id:bigdecimal,JOB_ASNMNT_STS_DLT::actual_status_dt:datetime)}.

grunt> describe B;
B: {group: (JB_ASGNMNT::JB_DLT::job_id: bigdecimal,JOB_ASNMNT_STS_DLT::job_assignment_status_cd: chararray),a: {(JB_ASGNMNT::JB_DLT::job_id: bigdecimal,JOB_ASNMNT_STS_DLT::job_assignment_status_cd: chararray,JOB_ASNMNT_STS_DLT::job_assignment_status_id: bigdecimal,JOB_ASNMNT_STS_DLT::actual_status_dt: datetime)}}

How do I read this column's data in the ORDER BY clause? Please share your thoughts.

========================= Here is the entire code and sample data =====================
J_B = LOAD 'ABCD' USING org.apache.hive.hcatalog.pig.HCatLoader();
J_B_SQ = FOREACH J_B GENERATE wo_id,jb_id,ls_upd_ts;
JB_DLT = FILTER J_B_SQ BY ls_upd_ts >= SubtractDuration(CurrentTime(),'P1D');
J_A_TBL = LOAD 'EFGH' USING org.apache.hive.hcatalog.pig.HCatLoader();
J_A_SQ = FOREACH J_A_TBL GENERATE jb_id,J_A_ID,ToString(eff_ts,'YYYY-MM-dd') AS DLT_CLMN;
J_A_DLT = FILTER J_A_SQ BY DLT_CLMN == '9999-12-31';
J_A_STS_TBL = LOAD 'IJKL' USING org.apache.hive.hcatalog.pig.HCatLoader();
J_A_STS_SQ = FOREACH J_A_STS_TBL GENERATE J_A_ID,j_ast_sts_id,J_A_S_CD,wk_cd,e_st_ts,ToString(eff_ts,'YYYY-MM-dd') AS DLT_CLMN;
J_A_STS_DLT = FILTER J_A_STS_SQ BY DLT_CLMN == '9999-12-31';
J_ASGN = JOIN JB_DLT by job_id, J_A_DLT by job_id;
J_ASGN_STS = JOIN J_ASGN by J_A_DLT::J_A_ID,J_A_STS_DLT by J_A_ID;
a = FOREACH J_ASGN_STS GENERATE JB_DLT::job_id,(J_A_STS_DLT::J_A_S_CD is null ? 'XXXX' : J_A_STS_DLT::J_A_S_CD) AS J_A_S_CD,J_A_STS_DLT::j_a_s_id, J_A_STS_DLT::actual_status_dt;
B = GROUP a BY (JB_DLT::job_id,J_A_STS_DLT::J_A_S_CD);
C = FOREACH B {byts = ORDER B BY JB_DLT::job_id DESC;
newest = LIMIT byts 1;
GENERATE FLATTEN(newest);
};
Sample Data:

jb_id   J_A_ID   J_A_S_CD
123     421      DIS
123     421      REJ
123     421      TEN
536     386      ACC
536     386      COM
536     386      DIS

The output will be the same.
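For reference, a hedged sketch of the usual fix for the projection error above (not a confirmed solution from this thread): inside the nested FOREACH, order the inner bag a rather than the grouped relation B, and use a positional reference to sidestep the fully qualified field names:

C = FOREACH B {
    -- $3 is the bag's fourth field (actual_status_dt per the DESCRIBE above);
    -- use $0 instead to order by job_id.
    byts = ORDER a BY $3 DESC;
    newest = LIMIT byts 1;
    GENERATE FLATTEN(newest);
};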
Tags: Data Processing, Pig
09-20-2016
10:14 AM
I am using Pig version 0.14. My script has 20 lines of code, and it performs data extraction, a null check, a join (the condition columns have the same data type), a group by (on 2 columns, enclosed in parentheses), and also an order by to see the last record. But the error points at line 60, which is not in my script, and I have no knowledge of that error. Someone please help me fix this issue. Thank you for your time.