Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7115 | 06-03-2019 09:31 PM
 | 1725 | 05-22-2019 02:38 AM
 | 2176 | 05-22-2019 02:21 AM
 | 1360 | 05-04-2019 08:17 PM
 | 1673 | 04-14-2019 12:06 AM
08-08-2016
02:07 AM
I think there is an HCC article on this very topic, but https://martin.atlassian.net/wiki/x/EoC3Ag is a blog post I wrote back in mid-2015 on the same subject, in case it helps. Good luck!
08-04-2016
06:23 PM
Thanks for the additional details. It doesn't look like I'm much help with this one. 😞 If nobody else chimes in, and if you've tried removing and re-adding with the same results, it may be time to open a formal support case. Again, hopefully this jogs someone else's memory/experience and helps out here. Good luck!
08-04-2016
02:13 PM
1 Kudo
As always, your specific situation, hardware, risk profile, etc. are unique, but let's revisit a couple of things first. HDFS metadata is physically persisted in "image" and "edits" files. In an HA configuration, the two NameNode (NN) processes write out the image files, and the JournalNode (JN) processes persist the edits files. Even without any kind of HBA/RAID configuration (not a bad place to be, but those aren't necessarily bad/wrong either), we get a pretty good spread of this information across multiple places: 2x copies of the image files and 3+ copies of the edits files. The historical rule of thumb (especially before HA) was to write the NN data to two local disks and one soft-mounted NFS directory, as we simply never wanted to have a "bunker scene". Again, this is my strong personal belief, but I would still suggest that you record to at least two local disks (maybe you decide your RAID approach satisfies this) as well as an NFS directory. I'd even follow this up with periodic backups of both the image and edits data files. My thinking on this topic is captured in a blog posting at https://martin.atlassian.net/wiki/x/EoC3Ag. Good luck!
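For what it's worth, here is a minimal sketch of what that multi-directory redundancy looks like in hdfs-site.xml. The mount points below are hypothetical placeholders; the key point is that dfs.namenode.name.dir takes a comma-separated list and the NN writes its image files to every directory listed.

  <!-- two local disks plus a soft-mounted NFS directory (paths are examples only) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data1/hdfs/namenode,file:///data2/hdfs/namenode,file:///mnt/nfs/hdfs/namenode</value>
  </property>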
07-26-2016
04:54 PM
Sorry, no, I wasn't suggesting you abandon Pig, just that you might need to wrap it with a script or a program that discretely calls your generalized Pig script, since Pig does not inherently have general-purpose looping constructs like we have in other languages. That said, check out my NEW answer and related link, which should be able to do what you want dynamically -- and in ONE line of Pig code!! Good luck!
07-26-2016
04:52 PM
2 Kudos
Better answer: check out my simple example of using MultiStorage at https://martin.atlassian.net/wiki/x/AgCHB. Assuming that the "Date" field from your original question is the first one in the record format of "a", the following should get you taken care of:

  STORE a INTO '/path'
    USING org.apache.pig.piggybank.storage.MultiStorage('/path', '0', 'none', '\\t');

This would create folders like /path/2016-07-01, each of which will contain the 1+ "part files" for that given date. You could then use that directory location as the input path for another job. Good luck!!
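If it helps, here is a slightly fuller sketch of how that might fit together end to end. The input path, field names, and piggybank jar location are hypothetical placeholders; MultiStorage ships in the piggybank jar, which has to be registered first.

  -- register the jar that contains MultiStorage (location varies by install)
  REGISTER /usr/hdp/current/pig-client/piggybank.jar;

  -- load tab-delimited records whose first field (index 0) is the date
  a = LOAD '/input/events' USING PigStorage('\t')
      AS (event_date:chararray, user:chararray, amount:double);

  -- write one subdirectory per distinct value of field 0, e.g. /path/2016-07-01
  STORE a INTO '/path'
    USING org.apache.pig.piggybank.storage.MultiStorage('/path', '0', 'none', '\\t');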
07-26-2016
03:34 PM
I'm not 100% sure of the sequence of events that got you to this point. If it is easily reproducible, please share the steps here and others may be able to take a look for you. Good luck.
07-26-2016
03:32 PM
The project's wiki, with key top-level pages like https://cwiki.apache.org/confluence/display/Hive/Home and https://cwiki.apache.org/confluence/display/Hive/LanguageManual, has been a life-saver for me and is often the target of many Google search results. I also find many of my answers in individual blog postings returned by Google searches. And for my part, although I'm now realizing it needs some love, I maintain a "cheat sheet" of links at https://martin.atlassian.net/wiki/x/QIAoAQ that I'd appreciate update suggestions on as well. Good luck!
07-26-2016
03:23 PM
5 Kudos
That sounds like all is working as designed/implemented, since Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark reads Hive tables it isn't really going through the "front door" of Hive to run queries (it reads the files from HDFS directly). That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place.
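If you do want to lock those direct reads down, a rough sketch of the HDFS-side approach would look like the following. The warehouse path and user name are hypothetical, and the setfacl commands assume dfs.namenode.acls.enabled=true.

  # Spark bypasses HiveServer2 and reads the backing files directly,
  # so restrict the files themselves rather than relying on Hive-level policies.
  hdfs dfs -chmod -R 750 /apps/hive/warehouse/sales.db/orders
  hdfs dfs -setfacl -R -m user:etl_svc:r-x /apps/hive/warehouse/sales.db/orders
  hdfs dfs -getfacl /apps/hive/warehouse/sales.db/orders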
07-24-2016
04:32 PM
As you already know, Pig really isn't a general-purpose programming language that accounts for things like this. The "Control Structures" page at http://pig.apache.org/docs/r0.15.0/cont.html gives you the project's recommendations on the subject. Generally speaking, a custom script that fires off a generic Pig script, or maybe a Java program, might be your best friend; see the sketch below. Good luck!
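For example, a minimal sketch of that wrapper-script idea (the script name, parameter name, and date list are all hypothetical), using Pig's -param option to feed a value into one generic script per iteration:

  #!/bin/bash
  # Fire off the same generic Pig script once per date,
  # passing the value in as a Pig parameter ($RUN_DATE inside the script).
  for dt in 2016-07-01 2016-07-02 2016-07-03; do
    pig -param RUN_DATE="$dt" daily_report.pig
  done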
07-21-2016
06:35 PM
Good to go then, @Johnny Fugers? No further help needed, right?