Member since
10-01-2020
2
Posts
0
Kudos Received
0
Solutions
10-01-2020
04:35 AM
My best practice:- Keep the number of executors equal to the number of spark clients that your cluster is configured for. Go with 2 cores per executor. You will fetch optimum performance with consistent distribution of batches for processing as confirmed from Spark UI.
... View more
10-01-2020
04:29 AM
Hbase stores data as a sorted map by keys. HBase is considered a persistent, multidimensional, sorted map, where each cell is indexed by a row key and column key (family and qualifier). A rowkey, which is immutable and uniquely defines a row, usually spans multiple HFiles. Rowkeys are treated as byte arrays (byte[]) and are stored in a sorted order in the multi-dimensional sorted map. If you look for a row_key, Hbase is able to identify the node where this data is present. Hadoop runs its computation on the same node where the key is present and hence the performance with technologies like Spark is really good. This is called data localization.
... View more