About gkeys

gkeys · ‎07-10-2017

Thank you @Vani, you are correct: the ZooKeeper namespace distinguishes the URLs (which is not visible in the Ambari UI until you copy and paste the URL somewhere). Do you know if this is different for TP (HDP 2.5) vs. GA (HDP 2.6) Hive LLAP? I seem to remember the ports being different.
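To make the namespace difference easier to spot, here is a minimal Python sketch that pulls the zooKeeperNamespace parameter out of a HiveServer2 JDBC URL. The hostnames and the two namespace values in the usage note are illustrative placeholders, not values taken from this thread; read the real ones from the URLs Ambari shows for your cluster.

```python
import re
from typing import Optional


def zk_namespace(jdbc_url: str) -> Optional[str]:
    """Return the zooKeeperNamespace session parameter of a Hive JDBC URL, if present."""
    # Session parameters follow the first ';' and are themselves ';'-separated.
    _, _, params = jdbc_url.partition(";")
    # Drop any trailing '?hive_conf' or '#hive_var' sections.
    params = re.split(r"[?#]", params)[0]
    for param in params.split(";"):
        key, _, value = param.partition("=")
        if key == "zooKeeperNamespace":
            return value
    return None


# Hypothetical URLs: same ZooKeeper quorum and port, different namespaces.
hs2_url = ("jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181/"
           ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
llap_url = ("jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181/"
            ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
```

Calling zk_namespace(hs2_url) and zk_namespace(llap_url) returns two different strings even though both URLs point at port 2181, which is the ZooKeeper port rather than a HiveServer2 port.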

gkeys · ‎07-10-2017

When I implement Hive LLAP, the JDBC URLs are identical for HiveServer2 and HiveServer2 Interactive (both have port 2181). Why is this? What am I doing wrong?

gkeys · ‎06-30-2017

There are a few, including InvokeHTTP and SplitJson.

gkeys · ‎06-30-2017

When emptying queues in NiFi, the following message is presented: "Waiting for destination component to stop". The queues have not emptied after more than 10 minutes. Any insight into what is happening under the covers? Is this how to recover: https://community.hortonworks.com/questions/71525/unable-to-clear-nifi-queue.html ?

gkeys · ‎05-30-2017

@Satish Sarapuri You can use globs anywhere in the path (not just the filename). There are quite a few glob operators (similar to Linux), as shown in the link in my previous answer, so if the paths have enough in common you should be able to use globs for the parts that differ. If none of that works, you can still use globs with full paths: Source = LOAD '/{path1,path2}' USING PigStorage(',')... where path1 and path2 can be any file path.
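As a rough illustration of what the {a,b} alternation in such a LOAD path matches, here is a minimal Python sketch that expands non-nested brace groups into concrete paths. It only mimics this one operator and is not Hadoop's or Pig's actual glob implementation; the function name expand_braces is made up for the example.

```python
import itertools
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand non-nested {a,b,...} groups into every concrete combination."""
    # Split into literal text and brace groups, e.g.
    # '/data/input{1,2}.csv' -> ['/data/input', '{1,2}', '.csv'].
    parts = re.split(r"(\{[^{}]*\})", pattern)
    choices = [
        part[1:-1].split(",") if part.startswith("{") and part.endswith("}") else [part]
        for part in parts
    ]
    # The Cartesian product of the alternatives yields every matching path.
    return ["".join(combo) for combo in itertools.product(*choices)]
```

For example, expand_braces('/data/input{1,2}.csv') gives the two CSV paths from the filename example, and expand_braces('/{path1,path2}') gives '/path1' and '/path2', showing that the braces can stand in for whole path segments.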

gkeys · ‎05-27-2017

@Satish Sarapuri Yes, you can glob the filename pattern. This will work: Source = LOAD '/data/input{1,2}.csv' USING PigStorage(',')... You can use other glob patterns. See https://books.google.com/books?id=Nff49D7vnJcC&pg=PA60&lpg=PA60&dq=hdfs+glob&source=bl&ots=IjkvXt9zUn&sig=AKjzNQ77C9BaRgZyqvkJ4YFI7gU&hl=en&sa=X&ved=0ahUKEwirt5_O_I_UAhUE1CYKHTtCDqIQ6AEITzAH#v=onepage&q=hdfs%20glob&f=false

gkeys · ‎05-27-2017

If these two sources have the same schema, it is a simple matter of using the UNION operator in these three steps: Source_1 = LOAD '/data/input1.csv' USING PigStorage(',') ... Source_2 = LOAD '/data/input2.csv' USING PigStorage(',') ... Source = UNION Source_1, Source_2; See these references for elaboration: https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#UNION https://www.tutorialspoint.com/apache_pig/apache_pig_union_operator.htm https://stackoverflow.com/questions/10954883/storing-results-of-union-in-pig-in-a-single-file

gkeys · ‎05-23-2017

The script should not be moved to HDFS. Run it from the command line on the edge node. If the edge node has been set up properly, pig will be on the PATH and you can run it from anywhere (using an absolute path to the piggybank location), though it is most convenient to run it from the script's location. See this for running Pig: http://pig.apache.org/docs/r0.16.0/start.html#run

gkeys · ‎05-23-2017

The following link shows that Insert is an action supported by the Atlas Hive hook (see Limitations at the bottom): http://atlas.incubator.apache.org/Bridge-Hive.html Note that CTAS, Load, and Import are also supported. Perhaps you can try these approaches.

gkeys · ‎05-22-2017

Please see answer by @Artem Ervits in https://community.hortonworks.com/questions/23816/piggybank-jar-file-does-not-exist.html

Last Visited	‎06-11-2019 01:24 AM

Member Since	‎06-20-2016 01:29 PM
Posts	488
Kudos received	430

Cloudera Community
