Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5410 | 08-12-2016 01:02 PM |
| | 2203 | 08-08-2016 10:00 AM |
| | 2612 | 08-03-2016 04:44 PM |
| | 5501 | 08-03-2016 02:53 PM |
| | 1424 | 08-01-2016 02:38 PM |
07-15-2016
11:06 AM
There is a control in Ambari under HDFS, right at the top, for the NameNode memory. You should not set the same heap size for all components, since most need much less memory than the NameNode. For the NameNode a good rule of thumb is 1 GB of heap per 100 TB of data in HDFS (plus a couple of GB as a base, so 4-8 GB minimum), but it needs to be tuned based on workload. If you suspect your memory settings are insufficient, look at the memory and GC behaviour of the JVM.
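To make the rule of thumb concrete, here is a minimal sketch that derives a heap suggestion from the raw HDFS usage (the 1 GB per 100 TB ratio, the 2 GB base and the 4 GB floor are just the rough numbers above, and the output format of hdfs dfs -du is an assumption):

```bash
# Rough sketch: suggest a NameNode heap size from raw HDFS usage.
# Assumptions: ~1 GB heap per 100 TB of data, ~2 GB base, 4 GB floor;
# 'hdfs dfs -du -s /' prints the used bytes in its first column.
used_tb=$(hdfs dfs -du -s / | awk '{printf "%d", $1/1024/1024/1024/1024}')
heap_gb=$(( used_tb / 100 + 2 ))
[ "$heap_gb" -lt 4 ] && heap_gb=4
echo "Suggested NameNode heap: ${heap_gb} GB"
```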
07-15-2016
10:09 AM
3 Kudos
Ambari applies some heuristics, but it never hurts to double-check.

Datanode: 1 GB works, but for normal nodes (16 cores, 12-14 drives, 128-256 GB RAM) I normally set it to 4 GB.

Namenode: Depends on the HDFS size. A good rule of thumb is that 100 TB of data in HDFS needs 1 GB of heap.

Spark: This depends entirely on your Spark needs (not the history server, but the defaults for executors). The more power you need, the more executors and the more RAM in them (up to 32 GB per executor is apparently good).

Yarn: Ambari's heuristics are decent, but I normally tune them anyway. In the end you should adjust the sizes until your cluster has good CPU utilization, but that is more elaborate and depends on your use cases.
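Purely as an illustration of where those numbers end up, a sketch with the rough values from above (starting points, not recommendations; adjust to your hardware):

```bash
# hadoop-env.sh (normally edited through Ambari's HDFS configs) -- illustrative values
export HADOOP_NAMENODE_OPTS="-Xmx8g ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xmx4g ${HADOOP_DATANODE_OPTS}"

# spark-defaults.conf -- executor memory, scale with your Spark workload
# spark.executor.memory  8g

# yarn-site.xml -- memory a NodeManager offers to containers
# yarn.nodemanager.resource.memory-mb  <RAM reserved for YARN on each node>
```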
07-15-2016
09:55 AM
@mike harding hmmm, I think your cluster has bigger problems. It shows that only 2 of the 700 tasks are actually running. How many slots do you have in your cluster, i.e. what are the YARN min/max container settings and the total memory in the cluster? The tasks still shouldn't take that long, because each should only see 1/700 of the data, so something is just weird. I think you might have to call support.
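If it helps, a quick way to eyeball the available capacity from a shell (the config path is the usual /etc/hadoop/conf default, adjust to your layout):

```bash
# List the NodeManagers and their state/resources
yarn node -list -all

# Show the container min/max allocation and per-node memory settings
grep -A1 -E 'yarn.scheduler.(minimum|maximum)-allocation-mb|yarn.nodemanager.resource.memory-mb' \
  /etc/hadoop/conf/yarn-site.xml
```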
07-13-2016
11:20 AM
@mike harding yeah, the problem is that Tez tries to create an appropriate number of map tasks. Too many is wasteful since there is a lot of overhead; too few and tasks are slow. Since your data volumes are tiny it essentially creates one task. However, since you use a cartesian join and blow up your 50k rows into 2.5 billion computations, that is not appropriate in your case. You need at least as many tasks as you have task slots in your cluster to properly utilize your processing capacity. To increase the number of tasks you have multiple options:

a) My option with the grouping parameters. You can tell Tez to use more mappers by changing the grouping parameters (essentially telling Tez how many bytes each map task should process). However, this is a bit trial and error, since you need to specify a byte size per mapper, and at these small data volumes it gets a bit iffy.

b) Gopal's approach using a reducer. This is more elegant: instead of doing the join in the mappers, you add a shuffle before it and set the number of reducers to the number you want (Gopal used 4096, but your cluster is small, so I would use less). You then add a sub-select with a SORT BY, essentially forcing a reduce step before your computation. In the end you will have Map (1) -> Shuffle -> Reducer (SORT BY across 4096 tasks) -> your spatial computations -> Reduce (GROUP BY aggregate). See the sketch below. Makes sense?
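A minimal sketch of option b) (table and column names, the distance function, and the reducer count are placeholders, not values from the original thread):

```bash
# Sketch only: placeholder tables/columns/UDF; 24 reducers is an arbitrary example.
# The SORT BY sub-select forces a shuffle + reduce stage before the join,
# spreading the downstream work across the reducer count set below.
hive -e "
SET mapred.reduce.tasks=24;
SELECT a.id, b.id
FROM (SELECT * FROM points_a SORT BY id) a, points_b b
WHERE my_distance(a.lat, a.lon, b.lat, b.lon) < 1000;
"
```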
07-13-2016
11:07 AM
1 Kudo
My first question would be: why do you want to do that? If you want to manage your cluster, you would normally install something like pssh, Ansible or Puppet and use that to manage it. You put it on one control node, define a list of servers, and can then move data or execute scripts on all of them at the same time. You can do something very simple like that with a one-line ssh loop (a slightly fleshed-out sketch follows below).

To execute a script on all nodes: for i in server1 server2; do echo $i; ssh $i $1; done

To copy files to all nodes: for i in server1 server2; do scp $1 $i:$2; done

(Both need passwordless ssh from the control node to the cluster nodes.)

If on the other hand you want to ship job dependencies, something like the distributed MapReduce cache is normally a good idea; Oozie provides the <file> tag to upload files from HDFS to the execution directory of the job. So honestly, if you go into more detail about what you ACTUALLY want, we might be able to help more.
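The one-liners above, wrapped as small functions (a sketch; the host list and example arguments are illustrative, and passwordless ssh from the control node is assumed):

```bash
#!/bin/bash
# Host list is illustrative -- replace with your cluster nodes.
HOSTS="server1 server2"

run_all() {    # run_all "<command>"  -- run a command on every node
  for h in $HOSTS; do echo "== $h =="; ssh "$h" "$1"; done
}

copy_all() {   # copy_all <local_file> <remote_path>  -- copy a file to every node
  for h in $HOSTS; do scp "$1" "$h:$2"; done
}

run_all "uptime"
copy_all my.conf /etc/myapp/my.conf
```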
07-13-2016
08:04 AM
2 Kudos
In general the normal (thick) client is faster, unless your client server is heavily CPU constrained. The PQS is essentially a proxy in front of the thick client, so you add an extra hop and latency. (Especially with larger result sets, which the thick client reads in parallel from all regions but which have to be pulled through a single pipe from the PQS.) If you can connect the clients directly to the RegionServers and the clients are not heavily CPU constrained, I would go with the normal client.
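For reference, the two ways to connect (a sketch; the install paths, hosts, ports and the ZooKeeper znode are the usual HDP/Phoenix defaults and are placeholders here):

```bash
# Thick client -- talks to ZooKeeper and the RegionServers directly
/usr/hdp/current/phoenix-client/bin/sqlline.py zk1,zk2,zk3:2181:/hbase-unsecure

# Thin client -- everything goes through the Phoenix Query Server
/usr/hdp/current/phoenix-client/bin/sqlline-thin.py http://pqs-host:8765
```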
07-12-2016
11:57 PM
Good to have a higher power double-check 😉. The SORT BY is a nice trick, I will use that from now on. I didn't actually see the sum; I thought he returned a list of close coordinates.
07-12-2016
11:13 AM
5 Kudos
"from a, b" Unless I am mistaken ( I didn't look at it in too much detail ) You are doing a classic mistake. A cartesian join. There is no join condition in your join. So you are creating: 45,000 * 54000 = 2000000000 rows. This is supposed to take forever. If you want to make this faster you need to join the two tables by a join condition like a key or at the very least restrict the combinations a bit by joining regions with each other. If you HAVE to compare every row with every row you need to somehow get him to distribute data more. You can decrease the split size for example to make this work on more mappers. Tez has the min and max group size parameters to say how much data there will be in a mapper. You could reduce that to something VERY small to force him to split up the base data set into a couple rows each. But by and large cartesian joins are just mathematically terrible. You seem to be doing some geodetic stuff so perhaps you could generate a set of overlapping regions in your data set and make sure that you only compare values that are in the same regions there are a couple tricks here. Edit: a) If you HAVE to compare all rows with all rows. 2b comparisons are not too much for a cluster. Use the following tez parameters to split up your computations into a lot of small tasks. Per default this is up to a GB but you could force it to 100 kb or so and make sure you get at least 24 mappers or so out of it: tez.grouping.max-size tez.grouping.min-size b) Put a distance function in a join condition where distance < 1000 or so. So you discard all unwanted rows before the shuffle. He still needs to do 2b computations but at least he doesn't need to shuffle 2b rows around. c) if all of that fails you need to become creative. One thing I have seen for distance functions is to add quadratic regions of double your max. distance length to your data set ( overlapping ) assign them to the regions and only join them if the two points at least belong to one of the regions.
07-11-2016
05:16 PM
3 Kudos
Hmmm, access issue? If it's in the root folder, only root can access it. Hive error messages can be pretty generic here and don't distinguish between missing access rights and a missing file. If you run beeline, the read is executed by the Hive server, which runs as the hive user, so that user is the one that has to be able to read the data. Try it from /tmp; if it works there, you know the cause.
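A quick way to test that theory (file name, table name and JDBC URL are placeholders):

```bash
# Put the file somewhere the hive user can read it, then load it again.
cp /root/mydata.csv /tmp/mydata.csv
chmod 644 /tmp/mydata.csv
beeline -u jdbc:hive2://localhost:10000 \
  -e "LOAD DATA LOCAL INPATH '/tmp/mydata.csv' INTO TABLE mytable;"
```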
07-11-2016
04:28 PM
1 Kudo
I think the majority of people do not use ssh fencing at all. The reason is that NameNode HA works fine without it. The only issue is that during a network partition, old connections to the previously active NameNode might still exist and serve stale data on read-only operations.
- It cannot perform any write transactions, since the JournalNode majority prohibits that.
- Normally, if ZKFC works correctly, an active NameNode will not turn into a zombie; it is either dead or it isn't.
So the chances of a split brain are low and the impact is pretty limited. If you do use ssh fencing, the important part is that your script cannot block, otherwise the failover is stopped; every script needs to return in a sensible amount of time even if the node is unreachable. Fencing is by definition always a best-effort attempt, since most of the time the node is simply down, and the scripts need to return success in the end. So you need to fork with a timeout and then return true. https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Verifying_automatic_failover
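If you do want a fence method that can never block the failover, a rough sketch of the idea (registered via dfs.ha.fencing.methods as shell(/path/to/script); the kill command and pid-file path are placeholders, and $target_host is the variable the shell fencer exposes for the node to fence):

```bash
#!/bin/bash
# Best-effort fence: try to kill the old NameNode, but never block or fail the failover.
# $target_host is set by the HDFS shell fencer; command and pid path are placeholders.
timeout 10 ssh "$target_host" \
  'kill -9 "$(cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid)"' || true
# Always report success so the failover proceeds even if the node is unreachable.
exit 0
```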