Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3365 | 05-03-2017 05:13 PM |
| | 2797 | 05-02-2017 08:38 AM |
| | 3076 | 05-02-2017 08:13 AM |
| | 3006 | 04-10-2017 10:51 PM |
| | 1518 | 03-28-2017 02:27 AM |
01-04-2017
12:55 AM
1 Kudo
I am on mobile and can't comment on your workflows right now, but I have examples of Python 2 and Python 3 workflows in my repo: https://github.com/dbist/oozie. Browse to oozie/apps/ and you will see their respective directories. Use them as you wish.
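In the meantime, here is a minimal sketch of the general shape of such a workflow: an Oozie shell action that runs a Python script. All names and paths below are illustrative; see the repo above for the real python2/python3 examples.

```
<workflow-app name="python-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="python-node"/>
    <action name="python-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- script.py is a placeholder; ship it alongside the workflow -->
            <exec>script.py</exec>
            <file>script.py#script.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```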
12-29-2016
09:00 PM
@Praveen PentaReddy To close the loop on this: it turns out append is the default behavior, and if you read the comments in https://issues.apache.org/jira/browse/HIVE-6897 you will see that it is not advisable to force an overwrite of a table via HCatalog. So, to turn the feature off completely and not promote "bad" behavior, I did the following:

```
grunt> sql alter table codeZ set TBLPROPERTIES ('immutable' = 'true');
2016-12-29 20:49:51,924 [main] INFO org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: alter table codeZ set TBLPROPERTIES ('immutable' = 'true');
OK
Time taken: 2.041 seconds
grunt> a = load 'sample_07' using org.apache.hive.hcatalog.pig.HCatLoader();
2016-12-29 20:51:00,125 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
2016-12-29 20:51:00,163 [main] INFO hive.metastore - Connected to metastore.
grunt> b = load 'sample_08' using org.apache.hive.hcatalog.pig.HCatLoader();
grunt> c = join b by code, a by code;
grunt> d = foreach c generate $0 as code, $1 as description, $2 as total_emp, $3 as salary;
grunt> store d into 'codeZ' using org.apache.hive.hcatalog.pig.HCatStorer();
2016-12-29 20:52:26,894 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
Details at logfile: /home/guest/pig_1483044536026.log
grunt> quit
2016-12-29 20:56:25,336 [main] INFO org.apache.pig.Main - Pig script completed in 7 minutes, 29 seconds and 397 milliseconds (449397 ms)
[guest@sandbox ~]$ less /home/guest/pig_1483044536026.log
```

And a snippet from the log:

```
ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias d
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1778)
at org.apache.pig.PigServer.registerQuery(PigServer.java:707)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1075)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:231)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:206)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:566)
at org.apache.pig.Main.main(Main.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 6000:
<line 5, column 0> Output Location Validation Failed for: 'codeZ More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:95)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:212)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1851)
at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1527)
at org.apache.pig.PigServer.execute(PigServer.java:1440)
at org.apache.pig.PigServer.access$500(PigServer.java:118)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1773)
... 14 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.codez
at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.handleDuplicatePublish(FileOutputFormatContainer.java:206)
at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.checkOutputSpecs(FileOutputFormatContainer.java:121)
at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:65)
at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:69)
```

In other words, it is not advisable to overwrite via HCatStorer, since Hive handles the append/overwrite semantics. The only workaround here is to use a temporary table, as I suggested earlier.
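For reference, here is a minimal sketch of that temporary-table workaround. The staging table name codez_staging is an assumption; any table with the same schema as codez works:

```
-- store the join result into a staging table instead of the target
a = LOAD 'sample_07' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD 'sample_08' USING org.apache.hive.hcatalog.pig.HCatLoader();
c = JOIN b BY code, a BY code;
d = FOREACH c GENERATE $0 AS code, $1 AS description, $2 AS total_emp, $3 AS salary;
STORE d INTO 'codez_staging' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Then let Hive perform the overwrite, since it owns that behavior:

```
-- run in Hive, not Pig
INSERT OVERWRITE TABLE codez SELECT * FROM codez_staging;
```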
12-29-2016
05:23 PM
@sudarshan kumar Did that answer your question, or do you need further clarification?
12-29-2016
02:42 PM
@rathna mohan I have not tested this, but it may be possible to use the HBase API to achieve what you're asking: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.html. I am just guessing, and I don't think it will be a trivial effort, as you're going to have to access the Phoenix-managed table through the HBase API. Again, I hesitate to suggest this route because the effort to develop this functionality can be quite involved.
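For what it's worth, a minimal sketch of that filter through the plain HBase Java API. The table name and the limit/offset values are assumptions, and this does not account for how Phoenix encodes its columns and row keys:

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;

public class ColumnPageScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("MY_TABLE"))) { // hypothetical table
            Scan scan = new Scan();
            // return at most 10 columns per row, skipping the first 20 columns
            scan.setFilter(new ColumnPaginationFilter(10, 20));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(result);
                }
            }
        }
    }
}
```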
12-28-2016
01:49 PM
OFFSET functionality is available in Phoenix as of 4.8; there are no plans to backport it to 4.4. What is preventing you from upgrading to the latest version?
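For context, once you are on 4.8+ the OFFSET clause works directly in a query. A minimal JDBC sketch follows; the connection URL and the table/column names are illustrative:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixOffsetDemo {
    public static void main(String[] args) throws Exception {
        // illustrative URL; point it at your ZooKeeper quorum
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement();
             // LIMIT/OFFSET paging requires Phoenix 4.8 or later
             ResultSet rs = stmt.executeQuery(
                 "SELECT code, description FROM SAMPLE_07 ORDER BY code LIMIT 10 OFFSET 20")) {
            while (rs.next()) {
                System.out.println(rs.getString("code") + " " + rs.getString("description"));
            }
        }
    }
}
```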
12-26-2016
03:12 PM
Capacity issues indicate you may be sending too much data, or your destination cannot keep up with that much data. That's one of the bigger reasons Apache NiFi is a superior tool: it has backpressure built in, as well as visual cues that show when you have some kind of capacity issue. Try throttling how much data you're sending to the memory channel, or see if you can expire stale data. In NiFi, you can expire data too.
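If you stay on Flume, the memory channel's capacity and transaction size are the knobs to tune. A minimal sketch; the agent and channel names are placeholders:

```
# agent1/mem1 are illustrative names
agent1.channels = mem1
agent1.channels.mem1.type = memory
# maximum number of events held in the channel
agent1.channels.mem1.capacity = 10000
# maximum events per transaction between source/sink and the channel
agent1.channels.mem1.transactionCapacity = 1000
```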
12-26-2016
02:13 PM
2 Kudos
HDC:
1. HDC is backed by S3; that's the primary way it offers HA. There are no options to deploy an HA NameNode, RegionServers, etc., as clusters are meant to be terminated once used.
2. Plans for enterprise support will be announced in Q1.

Cloudbreak:
1. Yes, you can set up alerts to trigger upscaling or downscaling based on utilization.
2. Yes, you can deploy HA across all HA-compatible components.
3. I'd say Cloudbreak makes things easier overall, but one could argue you lose some control versus a manual install. There may also be a short learning curve, but everything is relative: once you learn Cloudbreak, you have the choice of deploying identical infrastructure on another cloud provider. It's all about choice.

I recommend you reach out to a local Hortonworks representative who can pull together the right team of experts to work through your pain points and advise on a solution.
12-25-2016
01:00 PM
3 Kudos
There are no options to install new services with HDC. It serves only prescriptive use cases and is not meant to be extended with additional services. For long-running clusters in AWS with configurable components and options, please consider Cloudbreak; HDC was built on Cloudbreak technology. Finally, you can also stand up HDP on AWS by launching an AMI and doing a manual or blueprint-based install; this option allows for all of the HDP components we offer.
12-23-2016
01:33 AM
Was I able to answer your question or do you need further clarification?
12-22-2016
04:20 PM
@Anbu Eswaran Please select the best answer, as otherwise we don't know which answer helped you most.