Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7563 | 08-12-2016 01:02 PM |
| | 2763 | 08-08-2016 10:00 AM |
| | 3776 | 08-03-2016 04:44 PM |
| | 7353 | 08-03-2016 02:53 PM |
| | 1903 | 08-01-2016 02:38 PM |
02-29-2016
01:19 PM
1 Kudo
It is almost impossible to get a steadily increasing, uninterrupted sequence number in a parallel data warehouse like Hive. There is the following UDF, but you would have to restrict the number of mappers to 1 to make it work: https://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java There are tricks like accumulators in Spark, for example, that can do something similar (assign unique numbers at least, if not steadily increasing ones), but not in Hive as far as I know.
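The partition-offset trick behind Spark's `zipWithIndex` can be sketched in plain Python (a toy model of partitioned data, not the actual Spark API): each partition only needs its local row count, and prefix sums of those counts give non-overlapping ID ranges without a single shared counter.

```python
def assign_ids(partitions):
    """Assign globally unique, ordered IDs across parallel partitions.

    Each worker only needs its partition's row count; start offsets are
    computed in one cheap pass over the counts (a prefix sum).
    """
    counts = [len(p) for p in partitions]      # one count per partition
    offsets, total = [], 0
    for c in counts:                           # prefix sums -> start offsets
        offsets.append(total)
        total += c
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
result = assign_ids(parts)
# IDs are unique and ordered across partitions: 0..5
```

Note that this yields unique, ordered IDs, but gaps appear as soon as rows are filtered out later, which is exactly why a gap-free sequence is so hard to guarantee in a parallel engine.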
02-28-2016
07:09 PM
1 Kudo
It's configurable: run pig -x tez and it will use Tez. Tez is a processing engine, not unlike Spark, that can run MapReduce tasks but many other kinds of tasks as well. For Pig, Tez executes the MapReduce code that Pig generates, but it handles the shuffle etc. in its own way.
02-27-2016
10:19 PM
1 Kudo
These questions are almost always impossible to answer remotely. I would add a couple of System.out calls to your mapper and reducer to see whether data goes in or out. You can then see these messages in the ResourceManager UI (port 8088): click on your application, click through the attempt -> mapper and reducer -> then logs.
02-26-2016
02:58 PM
2 Kudos
Perhaps change the thread number parameter mentioned in the link? It is still strange that it doesn't move at all.
02-26-2016
01:14 PM
1 Kudo
http://stackoverflow.com/questions/25222633/hadoop-balancer-command-warn-messages-threads-quota-is-exceeded Which version of Hadoop are you using? The answer there seems pretty complete.
02-26-2016
10:55 AM
1 Kudo
@Artem Ervits I tried to reproduce it. I took the sample_07 table and CTASed it both as a TEXT and as an ORC table, and I get the same error message for both. It sounds weird, but my guess would be that this syntax has not worked for a long time. The code checks whether the class name equals CombineHiveInputFormat, and since all input format classes now merely extend CombineFileInputFormat, I am not sure how that check could still be true. My guess would be that in the good old times CombineFileInputFormat was the actual class being used, and now the classes just extend it, so the check no longer works. But that is just a guess.
02-26-2016
09:57 AM
2 Kudos
Very weird; someone from the dev team might have a better idea, but OrcInputFormat actually implements CombineHiveInputFormat. CombineHiveInputFormat is the layer between much of Hive and the most commonly used underlying input formats. So OrcInputFormat should be a CombineHiveInputFormat, and this should not be happening. Update: I think TABLESAMPLE ( PERCENT ) is broken in general. I just tried it with two tables, one in text format and one in ORC, and neither works; both fail with the error message you got. Now that is strange, because a CombineHiveInputFormat is also a HiveInputFormat (which the error says I have): public class OrcInputFormat implements .... CombineHiveInputFormat
02-25-2016
02:19 PM
1 Kudo
No, from the hive.*.hooks parameters in hive-site.xml. They define hooks. If Atlas is not added there, Hive doesn't even look for the class (because the hook is never called).
02-25-2016
11:58 AM
1 Kudo
One quick fix is to remove the Atlas hook from hive-site.xml. Find all the hook parameters (below is the post-execution hook) and remove the Atlas class from them: hive.exec.post.hooks. Or do you want to use Atlas? In that case, adding the Atlas libraries to the Oozie sharelib should help. Did you restart Oozie after adding the libraries to the shared folder? You can also check in the output of the Hive action whether it actually put the Atlas jars into the execution directory of the action.
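The hive-site.xml entry to look for has roughly this shape (the hook class name shown is the usual Atlas Hive hook, included here for illustration; if other hooks are configured, remove only the Atlas class from the comma-separated list rather than deleting the whole property):

```xml
<property>
  <name>hive.exec.post.hooks</name>
  <!-- remove the Atlas hook class from this comma-separated list -->
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```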
02-24-2016
01:43 PM
2 Kudos
That would be the point where I would start writing some Python magic that parses the timestamps from the output of the hadoop fs -ls command. Or, to be faster, a small Java program doing the same with the FileSystem API. Someone apparently already did the first approach as a shell script: replace the echo with a hadoop fs -rm -r -f and you might be good. But I didn't test it, obviously ... http://stackoverflow.com/questions/12613848/finding-directories-older-than-n-days-in-hdfs
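The Python approach mentioned above could look like the sketch below. It only does the parsing step, assuming the standard `hadoop fs -ls` line layout (permissions, replication, owner, group, size, date, time, path); in practice you would feed it real output via `subprocess` and pass each stale path to `hadoop fs -rm -r -f`. The sample lines and the `old_paths` helper name are illustrative, not from any Hadoop API.

```python
from datetime import datetime, timedelta

def old_paths(ls_lines, days, now=None):
    """Return paths from `hadoop fs -ls` output lines older than `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    paths = []
    for line in ls_lines:
        fields = line.split()
        if len(fields) < 8:            # skip the "Found N items" header line
            continue
        # fields[5] is the date (YYYY-MM-DD), fields[6] the time (HH:MM)
        try:
            ts = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue                   # not a timestamped entry; ignore it
        if ts < cutoff:
            paths.append(fields[7])
    return paths

sample = [
    "Found 2 items",
    "drwxr-xr-x   - hdfs hdfs          0 2016-01-01 10:00 /tmp/old_dir",
    "drwxr-xr-x   - hdfs hdfs          0 2016-02-20 10:00 /tmp/new_dir",
]
stale = old_paths(sample, days=30, now=datetime(2016, 2, 24))
# stale now holds only the directory older than 30 days
```

A follow-up hadoop fs -rm -r -f on each returned path would complete the cleanup, just as the linked shell-script answer suggests.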