Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7351 | 08-12-2016 01:02 PM |
| | 2708 | 08-08-2016 10:00 AM |
| | 3666 | 08-03-2016 04:44 PM |
| | 7208 | 08-03-2016 02:53 PM |
| | 1863 | 08-01-2016 02:38 PM |
04-26-2016
03:57 PM
6 Kudos
The main options are (I know it has been answered before, but it's a bit hard finding things):

a) WANdisco

This essentially duplicates HDFS across two clusters by hooking into the NameNode and mirroring every command and block change to the DR cluster. It is the only solution here that actually gives you a transactionally duplicate DR system, i.e. no data loss between the two clusters for all committed transactions.

+ Immediate replication for HDFS
+ Only transactionally safe approach
- Additional cost
- Additional servers
- May have some compatibility issues with HDFS
- Doesn't really fix anything above HDFS (Hive tables, Oozie jobs, ...); for example, the Hive data on its own, without the Metastore metadata, doesn't give you a working system

b) DistCp in Oozie

A relatively manual approach, but DistCp provides some nice features to mirror folders. Hadoop is often built on batch jobs with timestamped folders, so you can simply add a DistCp action after the ingestion. DistCp also provides a delta load for a folder (going by file size, so there is a small danger of data loss if you modified a file without changing its byte size). If you want to move a consistent snapshot over, you can use the HDFS snapshot feature.

+ Robust and good enough if the operator knows what he is doing
- Kerberos setup can be a bit of a pain (same REALM helps)
- Relatively manual
- No transactional duplication

c) Falcon DR

Built on Oozie, but does a couple of things out of the box that would need to be configured by hand in Oozie. For example, it can mirror Hive tables (it looks for partitions being added). It also keeps track of different servers, so it has built-in concepts that you would need to hard-code in Oozie.

+ Can mirror Hive and HDFS
+ Some nice multi-cluster concepts
- Same downsides as Oozie, really

d) Duplicate your data ingestion in both clusters (have two hot clusters)

This approach is perhaps the safest one, since once you have it going, both clusters keep working, so in case of a failover you KNOW that your DR cluster will actually be up for the job.

+ Covers everything on top of HDFS, like Hive and Oozie
+ Continuously tests the DR cluster
- You might end up with non-identical clusters if your code promotion is faulty
- Additional, otherwise unnecessary work (data transformations in the DR cluster)

e) Kafka MirrorMaker

If you have mostly realtime data sources, you will normally have a Kafka ingestion layer. You can use MirrorMaker to duplicate Kafka topics across two clusters.
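For option b), a minimal sketch of what the DistCp commands could look like; the cluster names and paths here are illustrative assumptions, not from the thread:

```shell
# Mirror a timestamped ingestion folder to the DR cluster.
# -update copies only files whose size differs; -delete removes files
# on the target that no longer exist on the source.
hadoop distcp -update -delete \
  hdfs://prod-nn:8020/data/events/2016-04-26 \
  hdfs://dr-nn:8020/data/events/2016-04-26

# For a consistent point-in-time copy, snapshot first, then copy the snapshot
# (the folder must have been made snapshottable by an admin beforehand
# with "hdfs dfsadmin -allowSnapshot /data/events"):
hdfs dfs -createSnapshot /data/events s20160426
hadoop distcp \
  hdfs://prod-nn:8020/data/events/.snapshot/s20160426 \
  hdfs://dr-nn:8020/data/events
```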
04-26-2016
08:35 AM
We can argue about that till the sun goes down, but YARN allocates containers as multiples of the minimum allocation size. And it doesn't spread this among containers: each single container gets 1x, 2x, 3x of the minimum allocation size. That's just how it is. (How else do you think YARN can distribute containers across nodes? Each NodeManager has a minimum and maximum allocation size, i.e. a number of slots.) So in this case it allocated 59904 MB. That's exactly 9 slots. Assuming this is still running and stable, my guess would be that 1) the driver has one container and 2) you have 4 executors that take two slots each: 1 + 4 * 2 = 9 slots = 59904 MB. As I said, change the settings for executor memory AND memoryOverhead so that together they go below 6656 MB and you will see them use fewer slots.
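The rounding above can be sketched in a few lines of shell; the 6656 MB slot size is inferred from the 59904 / 9 arithmetic in this thread:

```shell
# Sketch of YARN's allocation rounding: a container request is rounded up
# to the next multiple of yarn.scheduler.minimum-allocation-mb.
round_to_slot() {
  local request=$1 min=$2
  echo $(( (request + min - 1) / min * min ))
}

# With a 6656 MB minimum allocation (9 x 6656 = 59904):
round_to_slot 6700 6656    # a request just over one slot is rounded up to 13312
round_to_slot 59904 6656   # nine full slots stay at 59904
```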
04-26-2016
01:02 AM
To be honest, no idea. Are you sure that it's not just a temporary anomaly (i.e. that it was 4 containers a second ago and one stopped)? The ResourceManager UI sometimes does that. The total is necessarily a multiple of the slot size: 6.5/13/19.5/... GB. If 3 containers show 26 GB, you either have one container that got two slots, or, in this case much more likely, the ResourceManager didn't update the total memory consumption after one of the containers stopped.
04-26-2016
12:34 AM
5 Kudos
First let me explain in one or two sentences how YARN works, then we can go through the questions. I think people sometimes get confused by it all. Essentially, YARN provides a set of memory "slots" on the cluster, defined by "yarn.scheduler.minimum-allocation-mb". So if we have 100 GB of YARN memory and a 1 GB minimum-allocation-mb, we have 100 "slots"; if we set the minimum allocation to 4 GB, we have 25 slots. Now this minimum allocation doesn't change application memory settings per se. Each application gets the memory it asks for, rounded up to the next slot size. So if the minimum is 4 GB and you ask for 4.5 GB, you will get 8 GB. If you want less rounding error, make the minimum allocation size smaller, or better, ask for a clean slot size. Applications ask for container memory, usually defined as a parameter, AND they also set the memory they actually use. YARN per se doesn't care what you do inside your container UNLESS the task spawned off by the container launcher is bigger than the allocated container size, in which case it will shoot down the container.

So let's go through this step by step, shall we?

"I observed during busy hour, all Yarn cluster memory is used up while lots of cores are free --- which leads me to believe that we should decrease minimum container size."

What do you mean by "cores are free", the ResourceManager settings? Normally cgroups are disabled anyway, so the best way to figure this out is a top command on some of the datanodes during busy season. If you see CPU utilization below 80% when the cluster is running at full tilt, you should reduce container sizes (assuming your tasks don't need more).

"lots of applications specify their memory, -Xms=4GB etc; considering container overhead memory, would not it require container size bigger than 4G?"

Which applications? Normally you need the heap of the Java process PLUS around 10-25% overhead (the JVM has some off-heap overhead that still counts towards the memory consumption of the Linux process), otherwise YARN will simply shoot down the containers when it sees that the Linux process spawned off by the container launcher is bigger than its memory limit for the container. That is the reason pretty much everything running on YARN has a container size in MB (for example 4096) and a Java command-line setting, for example -Xmx3600m.

"If I set yarn.scheduler.minimum-allocation-mb = 4GB,"

The minimum allocation by itself doesn't give you anything. As the name says, it just provides the slot size for containers. Containers could be 4, 8, 12, 16, ... GB in size with this setting, whatever an application requests; if it requests anything smaller or in between, it gets the next bigger size. So if you request 5 GB, you get 8 GB in your case.

"Spark executor size is 4GB as well, would it actually assign 2 containers/8GB to the Spark executor or yarn is smart enough to allocate around 5GB?"

Two things here:

a) If you want YARN to be able to allocate 5 GB, just set "yarn.scheduler.minimum-allocation-mb=1024". In this case YARN could give out 5 GB and there would be no other effect on other applications; after all, they all have to ask for a specific amount of memory (you can find the sizes they ask for in the MapReduce2 and Hive settings in Ambari).

b) Spark is a bit different here from all the others. Instead of providing a YARN size plus JVM settings, it has something called memoryOverhead, for example spark.yarn.executor.memoryOverhead for the executors. You essentially should set the executor memory AND the memory overhead so that together they fit into one container. Normally the default for the overhead is 384 MB, I think, so if you require 4096 MB for the executor, Spark will ask for around 4500 MB and get an 8 GB container, which, as you have noted, is hardly ideal. To optimize that you have two options: a) lower the YARN minimum container size so YARN CAN round up to 5 GB, or better, b) set Spark memory AND memoryOverhead so that together they fit exactly into one container slot, similar to the way Tez and MapReduce do it.

"yarn.scheduler.minimum-allocation-mb = 4GB, hive.tez.container.size=5GB, would not Yarn smart enough to assign 5GB to a container to satisfy tez needs?"

As written above, no, and it has nothing to do with being smart. It's just that the minimum allocation name is a bit misleading: it is effectively the slot size. So if you want YARN to be able to deal out 5 GB containers, set the minimum allocation size to 1 GB. There is no real downside to this; it does NOT change container sizes for applications (since they all have their own request settings anyway). The only downside is that some users in a shared cluster might reduce their container sizes heavily and take more CPU cycles than they should get. But the best way is to simply ask for clean container sizes, in the case of Spark by correctly setting the memoryOverhead and memory settings for drivers and executors.
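A sketch of option b) as a spark-submit invocation; the 6656 MB slot size, the jar name, and the class name are illustrative assumptions:

```shell
# Size the executor so that memory + memoryOverhead fit exactly into
# one assumed 6656 MB YARN slot: 6272 + 384 = 6656.
spark-submit \
  --master yarn \
  --executor-memory 6272m \
  --conf spark.yarn.executor.memoryOverhead=384 \
  --class com.example.MyApp \
  my-app.jar
```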
04-25-2016
05:18 PM
Very curious. The partitions are defined by minPartitions or the default splits in your InputFormat; for a text file this should be TextInputFormat. Below is code from HadoopRDD.scala, so I don't think this is Spark doing anything wrongly. Can you try to read the same file in Pig and see how many tasks it creates in that case? Might be some WASB issue?

```scala
override def getPartitions: Array[Partition] = {
  val jobConf = getJobConf()
  // add the credentials here as this can be called before SparkContext initialized
  SparkHadoopUtil.get.addCredentials(jobConf)
  val inputFormat = getInputFormat(jobConf)
  val inputSplits = inputFormat.getSplits(jobConf, minPartitions)
  val array = new Array[Partition](inputSplits.size)
  for (i <- 0 until inputSplits.size) {
    array(i) = new HadoopPartition(id, i, inputSplits(i))
  }
  array
}
```
04-25-2016
12:15 PM
1 Kudo
I think this thread gives a good overview. There are four separate things that need to be done: you need to distinguish Kerberos principals from normal users (LDAP, local), for both service users and application users. https://community.hortonworks.com/questions/26894/hadoop-security-1.html#answer-26922

"There is an option to use existing AD as KDC. So does this mean it is using AD authentication? I would be very grateful, if some one could help me on this."

Basically yes. You need to create a service-user OU in AD, then provide Ambari with an admin user for that OU. Ambari will then create the service user principals, extract the keytabs and distribute them in the cluster. Separately, you need to configure the Linux/AD connection for local users (normally SSSD).

"Does AD(KDC) has to be present in same machine I am enabling Kerberos ?"

No, but you need to be able to access the AD server, you need to have SSSD configured on the Linux machines, and your cluster needs to join the KDC REALM (the latter will be done by Ambari).
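For the SSSD side, a minimal configuration sketch; the domain name and the specific options are illustrative assumptions, not from this thread:

```ini
; /etc/sssd/sssd.conf -- minimal AD-backed setup; EXAMPLE.COM is a placeholder
[sssd]
services = nss, pam
domains = EXAMPLE.COM

[domain/EXAMPLE.COM]
id_provider = ad
access_provider = ad
; allow logins to keep working when the AD server is briefly unreachable
cache_credentials = true
```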
04-25-2016
10:19 AM
That is very weird. After all, a minor compaction sometimes gets elevated to a major compaction; it would be pretty catastrophic if that made HBase inaccessible, and it is also never mentioned anywhere. I totally agree that there will be a performance impact, of course. http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
during flushes & compactions, HBase keeps processing put and get requests, always giving a consistent view of the data
04-25-2016
09:59 AM
I would suppose so. It is also working with the timeline server, communicating with the ResourceManager and NodeManagers, doing logging, etc.
04-25-2016
09:40 AM
2. Your only chance is a CTAS, i.e. create a new table "as" the old one, compressed with ZLIB, then rename the tables. You can do that with external tables as well. However, this is only true for newer Hive versions and ORC/Tez; for Parquet, Snappy may still be better.
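A sketch of the CTAS-and-rename approach via Beeline; the table names, the JDBC URL and the ZLIB codec choice are illustrative assumptions:

```shell
# Rewrite a table as ZLIB-compressed ORC with CTAS, then swap the names.
beeline -u "jdbc:hive2://hiveserver:10000" -e "
  CREATE TABLE mytable_orc
    STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB')
    AS SELECT * FROM mytable;
  ALTER TABLE mytable RENAME TO mytable_old;
  ALTER TABLE mytable_orc RENAME TO mytable;
"
```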
04-25-2016
09:38 AM
1 Kudo
The settings are a bit weird; I never had luck getting the JAR_PATH settings to work correctly. What worked for me: I always create an auxlib folder in the Hive installation and put all the jars I need in there. These are then available to the server and the executed Tez jobs. I.e. as the hive user, mkdir /usr/hdp/<version>/hive/auxlib and then just copy your jars in there. You only need to add these jars on the HiveServer machine (not all nodes). If you use the old Hive client, you also need them on the client machines (but you should use Beeline anyhow).
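The steps above as shell commands; <version> is kept as a placeholder for your HDP version, and the jar name is illustrative:

```shell
# Create the auxlib folder in the Hive installation and drop the jars in.
sudo -u hive mkdir -p /usr/hdp/<version>/hive/auxlib
sudo -u hive cp my-udfs.jar /usr/hdp/<version>/hive/auxlib/
# Then restart HiveServer2 (e.g. via Ambari) so it picks up the jars.
```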