Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3106 | 08-25-2017 03:09 PM
 | 1965 | 08-22-2017 06:52 PM
 | 3393 | 08-09-2017 01:10 PM
 | 8065 | 08-04-2017 02:34 PM
 | 8115 | 08-01-2017 11:35 AM
09-13-2016
04:04 PM
@Timothy Spann Very effective answer, but it covers similar ground to @Randy Gelhausen's, and he was first in.
09-13-2016
10:53 AM
1 Kudo
This article shows how to use a list of URLs in an external file to drive InvokeHTTP iteratively: https://community.hortonworks.com/content/kbentry/48816/nifi-to-ingest-and-transform-rss-feeds-to-hdfs-usi.html
You can schedule GetFile to run once per day, week, etc. (see the sketch below). If errors occur at the end of the flow when inserting into the database, you can configure the flow to ignore those failures.
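For illustration, a minimal sketch of the two moving parts (file name, URLs, and schedule are hypothetical): the external file is just one URL per line, and GetFile's Scheduling Strategy can be set to CRON driven for a daily run.

```
# urls.txt -- one URL per line for InvokeHTTP to iterate over (contents hypothetical)
http://example.com/feeds/news.rss
http://example.com/feeds/sports.rss

# GetFile scheduling: Scheduling Strategy = CRON driven
# e.g. every day at 6:00 AM (NiFi uses Quartz cron syntax):
#   0 0 6 * * ?
```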
09-12-2016
10:01 PM
1 Kudo
I am trying to create a Phoenix interpreter using %jdbc in Zeppelin on HDP 2.5 and am not succeeding. Steps are:
1. Log into Zeppelin (sandbox 2.5).
2. Create a new interpreter (settings along the lines sketched below).
3. Restart it (just to be paranoid).
4. Go to my notebook and bind the interpreter.
When I run with %jdbc(phoenix) I get "Prefix not found." When I run it with %jdbc.phoenix I get "jdbc.phoenix interpreter not found." What am I missing?
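For reference, a sketch of the kind of properties Zeppelin's %jdbc interpreter expects for a "phoenix" prefix (the driver class, URL, and user below are assumptions based on sandbox defaults, not confirmed settings):

```
# jdbc interpreter properties for a "phoenix" prefix (values are assumptions)
phoenix.driver     org.apache.phoenix.jdbc.PhoenixDriver
phoenix.url        jdbc:phoenix:localhost:2181:/hbase-unsecure
phoenix.user       phoenixuser
phoenix.password
```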
Labels:
- Apache Phoenix
- Apache Zeppelin
09-12-2016
06:46 PM
1 Kudo
Agree with @mqureshi and @Constantin Stanca. I would like to add the theme that compression is a strategy, and usually not a universal yes or no, or this codec versus that one. Important questions to ask about your data are:
- Will it be processed frequently, rarely, or never (cold storage)?
- How critical is performance when it is processed?
Which leads to: which file format and compression codec, if any, for each dataset? The following are good references for compression and file format strategies (it takes some thinking and evaluating):
- http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2
- http://comphadoop.weebly.com/
- http://www.dummies.com/programming/big-data/hadoop/hadoop-for-dummies/
After formulating a strategy, think about dividing your HDFS filepaths into zones in accordance with that strategy, as sketched below.
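One hypothetical zoning layout (paths and codec choices here are illustrative assumptions, not a recommendation for any specific workload):

```
/data/raw/...       # landing zone: as-delivered format; splittable codec (e.g. bzip2) or none
/data/staging/...   # working zone: fast codec (e.g. Snappy) for frequent intermediate processing
/data/curated/...   # query zone: columnar format (ORC/Parquet) with its built-in compression
/data/archive/...   # cold storage: high-ratio codec (e.g. gzip), rarely or never read
```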
09-12-2016
05:48 PM
@Saumitra Buragohain could you help out here? Could use your expertise 🙂
09-12-2016
03:37 PM
2 Kudos
I have heard that full-dev-platform is being deprecated and that the multinode vagrant deployment should be used instead: https://github.com/apache/incubator-metron/tree/master/metron-deployment/vagrant/multinode-vagrant However, that option is very resource intensive, so for a development environment the best option is quick-dev-platform: https://github.com/apache/incubator-metron/tree/master/metron-deployment/vagrant/quick-dev-platform Also, if you installed the latest version of Ansible, it should be downgraded as described here: https://cwiki.apache.org/confluence/display/METRON/Downgrade+Ansible
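The downgrade itself is a pip operation; a minimal sketch (the pinned version below is an assumption; use whatever the wiki page above specifies):

```
# remove the current Ansible and pin an older release (version number is an assumption)
sudo pip uninstall -y ansible
sudo pip install ansible==2.0.0.2
```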
09-12-2016
12:43 PM
2 Kudos
Please be sure to follow these instructions: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4-Win/bk_HDP_Install_Win/content/LZOCompression.html You can do step 3 from the Ambari web UI (sketched below). Also, note that steps 1 and 2 need to be done on each node in the cluster.
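If step 3 is the codec registration in core-site.xml (as is typical for LZO setup; confirm against the doc above), it amounts to editing properties that Ambari exposes under HDFS > Configs. A sketch using the standard hadoop-lzo class names:

```
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```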
09-11-2016
02:28 PM
2 Kudos
@Mohan V This is an issue of jar version incompatibility. You need to use the following newer versions of elephant-bird (not the older versions):
REGISTER elephant-bird-core-4.1.jar
REGISTER elephant-bird-pig-4.1.jar
REGISTER elephant-bird-hadoop-compat-4.1.jar
I tested it with your code and sample, and it works. You can get the jars at:
http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-core-4.1.jar.zip
http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-pig-4.1.jar.zip
http://www.java2s.com/Code/JarDownload/elephant/elephant-bird-hadoop-compat-4.1.jar.zip
Regarding DESCRIBE working but DUMP causing the issue: DUMP actually launches the MapReduce job (which is where the incompatible jar surfaces), while DESCRIBE does not.
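For context, a minimal sketch of how the registered jars get exercised (the input path and loader option are hypothetical; JsonLoader is elephant-bird's Pig loader):

```
REGISTER elephant-bird-core-4.1.jar;
REGISTER elephant-bird-pig-4.1.jar;
REGISTER elephant-bird-hadoop-compat-4.1.jar;

-- load JSON records as maps (input path is hypothetical)
raw = LOAD '/tmp/sample.json'
      USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');

DESCRIBE raw;  -- schema only: no MapReduce job runs, so a bad jar stays hidden
DUMP raw;      -- launches the MapReduce job, which is where the version clash appears
```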
09-11-2016
12:48 PM
@Mohan V Very glad to see you solved it yourself by debugging -- it is the best way to learn and improve your skills 🙂
09-10-2016
12:31 PM
1 Kudo
There is a lot going on here. When writing a complex script like this, the following approach is useful for building and debugging (a sketch follows below):
- Run locally against a small subset of records (pig -x local -f <scriptOnLocalFileSystem>.pig). This makes each run of the script much faster.
- Build the script statement by statement until you reach the failing one (run the first statement, add the second and run, and so on until it fails). When it fails, you can focus on the last statement and fix it. These steps are good for finding grammar issues (which it looks like you have, based on the error message).
- If you also want to make sure your data is being processed correctly, put a DUMP statement after each line during each iteration. That way you can inspect the results of each statement.
- If using inline statements like your grouped = statement, separate them out at first until they work. This makes the issue easier to isolate.
Let me know how that goes.
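To make the iteration concrete, a sketch of what the passes might look like (script name, paths, schema, and relation names are all hypothetical):

```
-- run with: pig -x local -f debug.pig   (against a small local sample)
records = LOAD 'sample.txt' USING PigStorage('\t') AS (user:chararray, score:int);
DUMP records;                                    -- pass 1: verify the load

by_user = GROUP records BY user;                 -- pass 2: add the next statement
DUMP by_user;

-- pass 3: the inline expression, added only after the above works
totals = FOREACH by_user GENERATE group, SUM(records.score);
DUMP totals;
```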