Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7563 | 08-12-2016 01:02 PM |
| | 2763 | 08-08-2016 10:00 AM |
| | 3776 | 08-03-2016 04:44 PM |
| | 7353 | 08-03-2016 02:53 PM |
| | 1903 | 08-01-2016 02:38 PM |
02-29-2016
01:19 PM
1 Kudo
It is almost impossible to get a steadily increasing, uninterrupted sequence number in a parallel data warehouse like Hive. There is the following UDF, but you would have to restrict the number of mappers to 1 to make it work: https://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java There are tricks like accumulators in Spark, for example, that can do something similar (assign unique numbers at least, if not steadily increasing ones), but not in Hive as far as I know.
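The partition-offset trick behind Spark's `zipWithIndex` can be sketched in plain Python (a toy model of partitioned data, not the actual Spark API): each partition only needs its local row count, and prefix sums of those counts give non-overlapping ID ranges without a single shared counter.

```python
def assign_ids(partitions):
    """Assign globally unique, ordered IDs across parallel partitions.

    Each worker only needs its partition's row count; start offsets are
    computed in one cheap pass over the counts (a prefix sum).
    """
    counts = [len(p) for p in partitions]      # one count per partition
    offsets, total = [], 0
    for c in counts:                           # prefix sums -> start offsets
        offsets.append(total)
        total += c
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]

parts = [["a", "b"], ["c"], ["d", "e", "f"]]
result = assign_ids(parts)
# IDs are unique and ordered across partitions: 0..5
```

Note that this yields unique, ordered IDs, but gaps appear as soon as rows are filtered out later, which is exactly why a gap-free sequence is so hard to guarantee in a parallel engine.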
02-28-2016
07:09 PM
1 Kudo
It's configurable: run pig -x tez and it will use Tez. Tez is a processing engine, not unlike Spark, that can run MapReduce tasks but many other kinds of tasks as well. For Pig, Tez executes the MapReduce code that Pig generates, but it handles the shuffle etc. in its own way.
02-27-2016
10:19 PM
1 Kudo
These questions are almost always impossible to answer remotely. I would add a couple of System.out calls to your mapper and reducer to see whether data goes in or out. You can then see these messages in the ResourceManager UI (port 8088): click on your application, click through the attempt -> mapper and reducer -> then logs.
02-26-2016
02:58 PM
2 Kudos
Perhaps change the thread number parameter mentioned in the link? It is still strange that it doesn't move at all.
02-26-2016
01:14 PM
1 Kudo
http://stackoverflow.com/questions/25222633/hadoop-balancer-command-warn-messages-threads-quota-is-exceeded Which version of Hadoop are you using? The answer there seems pretty complete.
02-26-2016
10:55 AM
1 Kudo
@Artem Ervits I tried to reproduce it. I took the sample_07 table and CTASed it both as a TEXT and as an ORC table, and I get the same error message for both. It sounds weird, but my guess would be that this syntax has not worked for a long time. The code checks whether the class name equals CombineHiveInputFormat, and since all input format classes now merely extend CombineFileInputFormat, I am not sure how that check could still be true. My guess would be that in the good old times CombineFileInputFormat was the actual class being used, and now the classes just extend it, so the check no longer works. But that is just a guess.
02-26-2016
09:57 AM
2 Kudos
Very weird; someone from the dev team might have a better idea, but OrcInputFormat actually implements CombineHiveInputFormat. CombineHiveInputFormat is the layer between much of Hive and the most commonly used underlying input formats. So OrcInputFormat should be a CombineHiveInputFormat, and this should not be happening. Update: I think TABLESAMPLE ( PERCENT ) is broken in general. I just tried it with two tables, one in text format and one in ORC, and neither works; both fail with the error message you got. Now that is strange, because a CombineHiveInputFormat is also a HiveInputFormat (which the error says I have): public class OrcInputFormat implements .... CombineHiveInputFormat
02-25-2016
02:19 PM
1 Kudo
No, from the hive.*.hooks parameters in hive-site.xml. They define hooks. If Atlas is not added there, Hive doesn't even look for the class (because the hook is never called).
02-25-2016
11:58 AM
1 Kudo
One quick fix is to remove the Atlas hook from hive-site.xml. Find all the hook parameters (below is the post-execution hook) and remove the Atlas class from them: hive.exec.post.hooks. Or do you want to use Atlas? In that case, adding the Atlas libraries to the Oozie sharelib should help. Did you restart Oozie after adding the libraries to the shared folder? You can also check in the output of the Hive action whether it actually put the Atlas jars into the execution directory of the action.
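The hive-site.xml entry to look for has roughly this shape (the hook class name shown is the usual Atlas Hive hook, included here for illustration; if other hooks are configured, remove only the Atlas class from the comma-separated list rather than deleting the whole property):

```xml
<property>
  <name>hive.exec.post.hooks</name>
  <!-- remove the Atlas hook class from this comma-separated list -->
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```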
02-24-2016
01:43 PM
2 Kudos
That would be the point where I would start writing some Python magic that parses the timestamps from the output of the hadoop fs -ls command. Or, to be faster, a small Java program doing the same with the FileSystem API. Someone apparently already did the first approach as a shell script: replace the echo with a hadoop fs -rm -r -f and you might be good. But I didn't test it, obviously ... http://stackoverflow.com/questions/12613848/finding-directories-older-than-n-days-in-hdfs
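The Python approach mentioned above could look like the sketch below. It only does the parsing step, assuming the standard `hadoop fs -ls` line layout (permissions, replication, owner, group, size, date, time, path); in practice you would feed it real output via `subprocess` and pass each stale path to `hadoop fs -rm -r -f`. The sample lines and the `old_paths` helper name are illustrative, not from any Hadoop API.

```python
from datetime import datetime, timedelta

def old_paths(ls_lines, days, now=None):
    """Return paths from `hadoop fs -ls` output lines older than `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    paths = []
    for line in ls_lines:
        fields = line.split()
        if len(fields) < 8:            # skip the "Found N items" header line
            continue
        # fields[5] is the date (YYYY-MM-DD), fields[6] the time (HH:MM)
        try:
            ts = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        except ValueError:
            continue                   # not a timestamped entry; ignore it
        if ts < cutoff:
            paths.append(fields[7])
    return paths

sample = [
    "Found 2 items",
    "drwxr-xr-x   - hdfs hdfs          0 2016-01-01 10:00 /tmp/old_dir",
    "drwxr-xr-x   - hdfs hdfs          0 2016-02-20 10:00 /tmp/new_dir",
]
stale = old_paths(sample, days=30, now=datetime(2016, 2, 24))
# stale now holds only the directory older than 30 days
```

A follow-up hadoop fs -rm -r -f on each returned path would complete the cleanup, just as the linked shell-script answer suggests.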