Member since
05-02-2019
319
Posts
145
Kudos Received
59
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7201 | 06-03-2019 09:31 PM |
| | 1747 | 05-22-2019 02:38 AM |
| | 2209 | 05-22-2019 02:21 AM |
| | 1386 | 05-04-2019 08:17 PM |
| | 1687 | 04-14-2019 12:06 AM |
08-08-2016
02:07 AM
I think there is an HCC article on this very topic, but https://martin.atlassian.net/wiki/x/EoC3Ag is a blog post I wrote back in mid-2015 on this subject as well in case it helps any. Good luck!
08-04-2016
06:23 PM
Thx for the additional details. Doesn't look like I'm any help with this one. 😞 If nobody else chimes in and if you've tried to remove and re-add with same results, maybe it is time for opening up a formal support case. Again, hopefully this jogs someone else's memory/experience and helps out here. Good luck!
08-04-2016
02:13 PM
1 Kudo
As always, your specific situation, hardware, risk profile, etc. are unique to you, but let's revisit a couple of things first. HDFS metadata is physically persisted in "image" and "edits" files. In an HA configuration, the two NN processes are the ones that write out the image files, and the JournalNodes (JN) are the processes that persist the edits files. Even without any kind of HBA/RAID configuration (not a bad place to be, but they aren't necessarily bad/wrong either) we get a pretty good spread of this information recorded in multiple places (2x copies of the image files and 3+x copies of the edits files). The historical rule of thumb (especially before HA) was to have the NN write its data to two local disks and one soft-mounted NFS directory, as we simply did not want to ever have a "bunker scene". I would still suggest, and again this is my strong personal belief, that you record to at least two local disks (maybe you decide your RAID approach satisfies this) as well as an NFS directory. I'd even follow this up with periodic backups of both the image and edits data files. My thinking on this topic is captured in a blog posting at https://martin.atlassian.net/wiki/x/EoC3Ag. Good luck!
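As a rough illustration of the periodic image backup idea, here is a minimal Python sketch. It assumes the `hdfs` client is on the PATH and uses `hdfs dfsadmin -fetchImage`, which downloads the most recent fsimage from the NameNode into a local directory. The helper names and the dated-folder layout are hypothetical, and the sketch does not cover the edits files.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List


def fetch_image_command(backup_root: str, day: date) -> List[str]:
    """Build the command that pulls the latest fsimage into a dated folder."""
    target = Path(backup_root) / day.isoformat()
    return ["hdfs", "dfsadmin", "-fetchImage", str(target)]


def backup_fsimage(backup_root: str, day: date) -> None:
    """Create the dated folder, then run the fetch (needs an HDFS client)."""
    cmd = fetch_image_command(backup_root, day)
    # The last element of the command is the dated target directory.
    Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the fetch fails.
    subprocess.run(cmd, check=True)
```

Calling something like `backup_fsimage("/backups/nn", date.today())` from cron (the path is only a placeholder) would give one folder per day that can then be copied off the host.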
07-26-2016
04:54 PM
Sorry, no, I wasn't suggesting you abandon Pig, just that you might need to wrap it with a script or a program to discretely call your generalized Pig script, since Pig does not inherently have general-purpose looping constructs like we do in other languages. That said, check out my NEW answer and related link, which should be able to dynamically do what you want -- and in ONE line of Pig code!! Good luck!
07-26-2016
04:52 PM
2 Kudos
Better answer: check out my simple example of using MultiStorage at https://martin.atlassian.net/wiki/x/AgCHB. Assuming that the "Date" field in your original question is the first one in the record format of "a", the following should take care of it:

STORE a INTO '/path'
    USING org.apache.pig.piggybank.storage.MultiStorage(
        '/path', '0', 'none', '\\t');

This would create folders like /path/2016-07-01, each of which will hold the 1+ "part files" for that given date. You could then use that directory location as your input path for another job. Good luck!!
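To picture what splitting on field 0 produces, here is a small plain-Python imitation of the folder layout. It is only a sketch of the idea, not MultiStorage itself; the function name and the sample records are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, ...]


def split_by_field(records: Iterable[Record], index: int,
                   parent: str) -> Dict[str, List[Record]]:
    """Group records under parent/<value of field index>, one folder per value."""
    folders: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        folders[f"{parent}/{record[index]}"].append(record)
    return dict(folders)


# Hypothetical sample rows with the date as the first field.
rows = [("2016-07-01", "a"), ("2016-07-02", "b"), ("2016-07-01", "c")]
layout = split_by_field(rows, 0, "/path")
```

Here `layout` has one key per distinct date, which mirrors the one-directory-per-date result described above.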
07-26-2016
03:34 PM
I'm not 100% sure of the sequence of events that got you to this point. If they are easily reproducible, please share the steps here and others may be able to look at it for you. Good luck.
07-26-2016
03:32 PM
The project's wiki pages, especially key top-level pages like https://cwiki.apache.org/confluence/display/Hive/Home and https://cwiki.apache.org/confluence/display/Hive/LanguageManual, have been life-savers for me and are often the target of many Google search results. I also find many of my answers in individual blog postings that are returned for Google searches. And for myself, although I'm now realizing it needs some love, I maintain a "cheat sheet" of links at https://martin.atlassian.net/wiki/x/QIAoAQ that I'd also appreciate update suggestions on. Good luck!
07-26-2016
03:23 PM
5 Kudos
That sounds like all is working as designed/implemented, since Ranger does not currently (as of HDP 2.4) have a supported plug-in for Spark, and when Spark is reading Hive tables it really isn't going through the "front door" of Hive to actually run queries (it is reading the files from HDFS directly). That said, the underlying HDFS authorization policies (with or without Ranger) will be honored if they are in place.
07-24-2016
04:32 PM
As you already know, Pig really isn't a general-purpose programming language that accounts for such things as this. The "Control Structures" page at http://pig.apache.org/docs/r0.15.0/cont.html gives you the project's recommendations on such things. Generally speaking, a custom script that fires off a generic Pig script, or maybe a Java program, might be your best friend. Good luck!
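One way such a wrapper might look, as a minimal Python sketch: it loops over a date range and calls a generic Pig script once per day using Pig's `-param` and `-f` command-line options. The script name, the `DATE` parameter name, and the helper functions are assumptions for illustration, and the Pig script would need to reference `$DATE` itself.

```python
import subprocess
from datetime import date, timedelta
from typing import Iterator, List


def date_range(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def pig_command(script: str, day: date) -> List[str]:
    """Build one Pig invocation, passing the day as the DATE parameter."""
    return ["pig", "-param", f"DATE={day.isoformat()}", "-f", script]


def run_for_each_day(script: str, start: date, end: date) -> None:
    """Call the generic Pig script once per day; stop on the first failure."""
    for day in date_range(start, end):
        subprocess.run(pig_command(script, day), check=True)
```

For example, `run_for_each_day("by_date.pig", date(2016, 7, 1), date(2016, 7, 3))` would launch three separate Pig runs, one per day (the script name is hypothetical).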
07-21-2016
06:35 PM
Good to go then, @Johnny Fugers? No further help needed, right?