Member since
09-17-2015
70
Posts
79
Kudos Received
20
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1549 | 02-27-2018 08:03 AM
 | 1566 | 02-27-2018 08:00 AM
 | 1201 | 10-09-2016 07:59 PM
 | 400 | 10-03-2016 07:27 AM
 | 500 | 06-17-2016 03:30 PM
03-06-2018
10:41 AM
1 Kudo
Hello. XML has a specific structure which will probably change a little in the way you model it in Hbase, for example when picking the rowkey or deciding how xml fields get projected into column families. That is true unless you want to store the whole XML document as a raw blob with no work done on it, which is another option. With the first approach, you would usually use a parsing engine or ETL to load the data into Hbase with the right data model. Popular choices would be Spark parsing and loading into Hbase, or a java job; this github project may give you some ideas: https://github.com/sreejithpillai/HBaseBulkImportXML
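If you go down the raw-blob route, a minimal hbase shell sketch (table, row and column names are invented for illustration) would be:

# store each whole XML document as an opaque value under one column family
hbase(main):001:0> create 'xml_docs', 'raw'
hbase(main):002:0> put 'xml_docs', 'doc-0001', 'raw:body', '<order id="1">...</order>'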
03-06-2018
10:06 AM
Hello. I suppose your standard container size is about 4 GB. Unless you are using cgroups, Yarn only allocates based on memory settings; in your scenario, 119 containers for 476 GB available works out to 4 GB per container. If you want fine-grained control over CPU scheduling you will need to configure Yarn to use cgroups. https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_yarn-resource-management/content/ch_cgroups.html
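For reference, the cgroups side of yarn-site.xml looks roughly like the sketch below; the property names come from the Yarn cgroups documentation linked above, the hierarchy value is a placeholder, and the linked doc lists the additional mount and group settings you will also need:

<!-- sketch only: switch to the Linux container executor with the cgroups resource handler -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/yarn</value>  <!-- placeholder hierarchy -->
</property>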
02-27-2018
08:03 AM
1 Kudo
Adding columns to the end of the table works from Hive 1.2 (via HDP 2.5.4). In Hive 2.1 you get additional abilities to change column types, and the eventual Hive 2.2 will add the ability to delete and reorder columns. Hive 0.13 is a little early for those features.
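For illustration, with hypothetical table and column names:

-- Hive 1.2+: append a column at the end of the table
ALTER TABLE my_table ADD COLUMNS (new_col STRING COMMENT 'added at the end');
-- Hive 2.1+: change a column's type
ALTER TABLE my_table CHANGE COLUMN old_col old_col BIGINT;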
02-27-2018
08:00 AM
Hello Mithun. Having a merge step is definitely the more foolproof approach. Otherwise you will need to know more about your data and its distribution and set things yourself. A first step would be hive.merge.smallfiles.avgsize, which adds the extra merge step only if the average file size falls below the threshold. You can also set the number of reducers yourself, either statically or dynamically based on the volume of data coming in; if you know your workload this will let you estimate the output file size roughly. It comes down to a trade-off between a more generic approach with a merge step and a more granular approach in which you know your workload. Hope this helps.
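A sketch of the knobs mentioned above; the values are placeholders to adapt to your block size and workload:

-- merge small output files when the average file size falls below the threshold
set hive.merge.mapredfiles=true;
set hive.merge.tezfiles=true;                        -- if running on Tez
set hive.merge.smallfiles.avgsize=128000000;         -- placeholder: ~128 MB average
set hive.merge.size.per.task=256000000;              -- placeholder: target merged size
-- or control the reducer count yourself
set mapred.reduce.tasks=20;                          -- static, placeholder value
set hive.exec.reducers.bytes.per.reducer=256000000;  -- dynamic, based on input volume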
02-15-2018
05:25 PM
Hello Pedro. I think Hbase region server groups may be of help for your need. This presentation has information on the feature: https://www.slideshare.net/Hadoop_Summit/achieving-hbase-multitenancy-with-regionserver-groups-and-favored-nodes-77157377
09-07-2017
08:21 AM
1 Kudo
Hello. The explanation might be a little too high level to help efficiently. I understand this is for a specific Region Server, not all of them or a random one. A couple of things can make a Region Server go down. The usual culprit is skew, by which I mean this Region Server gets a lot of traffic, for example writes. It will then be flushing the memstore very often and running a lot of GCs to clean out memory, and if those last too long it may not be able to heartbeat to zookeeper within the predefined time window. Zookeeper will then take it out. You can look in the logs for the memstore flushes and GC cleanups; you should also see Zookeeper timeout warnings.
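If you want to confirm that pattern, a hedged place to start on the affected node (the log location is the usual HDP default and the grep patterns are only starting points, not exact message strings):

# long JVM pauses reported by the pause monitor
grep -i "Detected pause in JVM" /var/log/hbase/hbase-hbase-regionserver-*.log
# frequent memstore flushes and zookeeper session trouble
grep -iE "memstore flush|session expired|timed out" /var/log/hbase/hbase-hbase-regionserver-*.log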
04-23-2017
01:44 PM
Hello Sami. Without the logs and/or error message this is left up to guesswork. A quick look seems to show that you are allocating more than the total amount of resources, and that you do not seem to have defined the support queue you listed.
04-23-2017
01:32 PM
Hello Mathi. I am not sure I understand exactly what end purpose you are pursuing, so I really can't give insight on the overall architecture. The first thing I would like to point out is that Hbase is a NoSQL store and not well suited for ad hoc random analytical queries; queries that have nothing to do with the Hbase model and keys will suffer on the performance side. This being said, there are multiple ways to query Hbase with a SQL interface: Hive is one, Phoenix would be another. I would recommend having a look at Phoenix if applicable, you would probably get better performance there. On the Hive handler side multiple tuning elements could help, while probably never really giving low latency. From a very high-level perspective, the way the storage handler works is that it queries Hbase online, brings the data back to Hive and then applies your query logic. Of course, if your query makes use of the Hbase model and key it will do much better. Hive and Tez being batch in nature, querying a snapshot of your table would shave off a lot of the online overhead: set hive.hbase.snapshot.name and select on that snapshot; this presentation should explain more: https://fr.slideshare.net/HBaseCon/ecosystem-session-3a Multiple other configs could help, but a closer look at your query patterns and usage would be needed. Hope any of this helps.
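A rough sketch of the snapshot route; table and snapshot names are invented for illustration, only hive.hbase.snapshot.name comes from the text above, and depending on your version you may also need a companion snapshot restore directory setting:

# hbase shell: snapshot the online table
hbase(main):001:0> snapshot 'my_hbase_table', 'my_hbase_table_snap'

-- hive: point the storage-handler table at the snapshot instead of the live table
set hive.hbase.snapshot.name=my_hbase_table_snap;
select count(*) from my_hive_on_hbase_table;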
02-09-2017
10:02 PM
1 Kudo
Hello Daniel. There is nothing mandatory to do for Hive or Hbase after an HDFS rebalance. As Sergey mentioned, in the Hbase case your Hbase files might not be completely co-located with the corresponding region servers anymore and a major compaction could help there. This being said, it will not block Hbase from working, and over time Hbase will "re-localize" so to speak; you will only incur mild performance degradation depending on your usage pattern. On the Hive front the Metastore, Yarn and Tez will still work together to find your files and start compute as locally as possible, so nothing to do there either. I will let more knowledgeable experts like Sergey or others weigh in on detailed technical specifics if I missed something, but HDFS rebalance is an operation that happens continuously and should be as transparent as possible to your daily work.
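If you do want to nudge locality back after a big rebalance, a major compaction can be kicked off from the hbase shell (the table name is a placeholder):

hbase(main):001:0> major_compact 'my_table'   # rewrites the HFiles locally on the hosting region servers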
12-20-2016
03:20 PM
7 Kudos
This article will go over the concepts of security in an Hbase cluster. More specifically we will concentrate on ACL-based security and how to apply it at the different levels of granularity of an Hbase model. From an overall security perspective, an access control list, or ACL, is a list of permissions associated with an object; ACLs focus on the access rules pattern.

ACL logic

Hbase access control lists are granted on different levels of data abstraction and cover types of operations.

Hbase data layout

Before we go further, let us clear up the hierarchical elements that compose Hbase data storage.

CELL: All values written to Hbase are stored in what is known as a CELL (a cell can also be referred to as a KeyValue). Cells are identified by a multidimensional key {row, column family, qualifier, timestamp}. For example: CELL => Rowkey1, CF1, Q11, TS1

COLUMN FAMILY: A column family groups together arbitrary cells.

TABLE: All cells belong to a column family and are organized into a table.

NAMESPACE: Tables in turn belong to namespaces. This can be thought of as a database-to-table logic. With this in mind, a table's fully qualified name is Table => Namespace:Table (the default namespace can be omitted).

Hbase scopes

Permissions are evaluated starting at the widest scope and working down to the narrowest scope: Global, Namespace, Table, Column Family (CF), Column Qualifier (Q), Cell. For example, a permission granted at the table level dominates grants done at the column family level.

Permissions

Hbase can give granular access rights depending on each scope. Permissions are zero or more letters from the set RWXCA.

Superuser: a special user that has unlimited access
Read (R): read right on the given scope
Write (W): write right on the given scope
Execute (X): coprocessor execution on the given scope
Create (C): can create and delete tables on the given scope
Admin (A): right to perform cluster admin operations, for example granting rights

Combining access rights and scopes creates a complete matrix of access patterns and roles. In order to avoid complex conflicting rules it can often be useful to build access patterns up from roles and responsibilities.
Role | Responsibilities
---|---
Superuser | Usually this role should be reserved solely to the Hbase user
Admin | (A) Operational role: performs cluster-wide operations like balancing and assigning regions. (C) DBA-type role: creates and drops tables and namespaces
Namespace Admin | (A) Manages a specific namespace from an operations perspective: can take snapshots, do splits, etc. (C) From a DBA perspective: can create tables and give access
Table Admin | (A) Operational role: can manage splits, compactions, etc. (C) Can create snapshots, restore a table, etc.
Power User | (RWX) Can use the table by writing or reading data and possibly use coprocessors
Consumer | (R) Can only read and consume data
Some actions need a mix of these permissions to be performed:

CheckAndPut / CheckAndDelete: these actions need RW permissions
Increment / Append: only require W permissions

The full ACL matrix can be found here: http://hbase.apache.org/book.html#appendix_acl_matrix

Setting up

In order to set up Hbase ACLs you will need to modify hbase-site.xml with the following properties:

<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.security.exec.permission.checks</name>
  <value>true</value>
</property>

In Ambari this is much easier: just enable security and Ambari will automatically set all these configurations for you.

Applying ACLs

Now that we have restarted our Hbase cluster and set up the ACL feature, we can start setting up rules. For simplicity we will use two users: hbase and testuser. Hbase is the superuser for our cluster and will let us set the rights accordingly.

Namespace

As the Hbase user we create an 'acl' namespace:

hbase(main):001:0> create_namespace 'acl'
0 row(s) in 0.3180 seconds

As testuser we try to create a table in this new namespace:

hbase(main):001:0> create 'atest','cf'
ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=testuser, scope=default, params=[namespace=default,table=default:atest,family=cf],action=CREATE)

We are not allowed to create a table in this namespace. The superuser Hbase will give the rights to testuser:

hbase(main):001:0> grant 'testuser','C','@acl'
0 row(s) in 0.3360 seconds

We can now run the previous command as testuser:

hbase(main):002:0> create 'atest','cf'
0 row(s) in 2.3360 seconds

We will now open this table to another user, testuser2:

hbase(main):002:0> grant 'testuser2','R','@acl'
ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=testuser, scope=acl, params=[namespace=acl],action=ADMIN)

Notice we can't grant rights to other users as we are missing Admin permissions. We can fix this with our Hbase superuser:

hbase(main):002:0> grant 'testuser','A','@acl'
0 row(s) in 0.460 seconds
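The same grant syntax also goes down to table, column family and qualifier scope; as a quick hedged illustration on the table created above (the second user and the qualifier name are invented):

hbase(main):003:0> grant 'testuser2', 'R', 'atest', 'cf'          # read only on one column family
hbase(main):004:0> grant 'testuser2', 'RW', 'atest', 'cf', 'q1'   # read/write on a single qualifier
hbase(main):005:0> user_permission 'atest'                        # list who holds what on the table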
Labels: Data Processing, HBase, hbase-namespace, How-To/Tutorial
12-12-2016
12:48 PM
Hello. You can definitely upload data into HDFS and then into Hbase through Hive. You can also query Hbase through Hive using the Hbase storage handler; please refer here for a more detailed explanation: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration If the data is derived from a Hive table it has a schema, so I would also consider the Hive / Phoenix storage handler: https://phoenix.apache.org/hive_storage_handler.html From a performance standpoint, querying Hbase through Hive will generally be less performant than querying ORC tables. This being said, it depends on the query pattern and what the use case is. Regards
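For reference, a minimal mapping sketch with the Hbase storage handler; the table, family and column names are placeholders, see the wiki link above for the full syntax:

CREATE EXTERNAL TABLE hbase_orders (rowkey STRING, amount DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:amount")
TBLPROPERTIES ("hbase.table.name" = "orders");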
10-09-2016
08:07 PM
1 Kudo
Hello. This thread might help: https://community.hortonworks.com/questions/24961/how-to-configure-ha-for-knox-gateway-using-any-loa.html and the Knox documentation as well: http://knox.apache.org/books/knox-0-6-0/user-guide.html#High+Availability As far as Ambari is concerned there are plans, but you can always create your own Ambari stack to deploy a second Knox and do the work to make it HA.
10-09-2016
07:59 PM
1 Kudo
Hello Pan. This question is about node resources and data per region. I am not really sure what your other configurations like handlers, GC, cache or region replicas are, so I am a little in the dark. The usual formula is (RS memory)*(total memstore fraction)/((memstore size)*(# column families)). This calculation is really a guideline, not a hard truth, because it also depends on the actual load and query pattern. Your region server can very well hold many more regions, but it will by definition get many more writes since it is responsible for more regions. As such it will buffer and flush very often; under heavy load you are prone to big flush and compaction issues, and probably eventually region servers going down because they become unresponsive. Again, if out of the 2000 regions only a couple are actually active it is not as critical, but it is still not a good pattern. Same on the read side: looking at the amount of memory allocated for the cache, with that many regions, if they are often used you will end up going to disk very often, resulting in poor read performance. You could look at your cache hit/miss ratio to see how your region servers are doing. Lastly, with that kind of distribution, if one region server goes down your overall loss is probably very big, so it is not ideal for recovery purposes either. Overall 100-200 regions per RS seems a decent high ballpark; depending on resources, going too far outside of that will need some tuning and monitoring. Hope this sheds some light.
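As a purely illustrative plug-in of numbers into that formula (your actual heap and flush settings will differ):

# 32 GB region server heap, 0.4 memstore fraction, 128 MB flush size, 2 column families
(32768 MB * 0.4) / (128 MB * 2) = ~51 regions per region server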
10-03-2016
07:27 AM
2 Kudos
Hello Rahul. Your question is a little generic, so it is hard to help you out much without things like the service used, the data read, etc. This being said, since we are in the Yarn thread I suppose it is a Yarn service like Hive or Spark. In your shoes I would go to the Yarn UI and job logs to understand where the latency happens. Is it in the init phase, with Yarn waiting to get containers? In that case resources or the max AM percent per queue are possible configurations to look at. Is it in the compute phase itself, with some "mappers" taking much longer? In that case you need to look at things like container errors and restarts, IO throughput, or data spill. The Tez UI has a very good tool, the Tez swimlane, to get a high-level view of the DAG and a sense of where to look. Same thing on the Spark side with the Spark UI. Hope any of this helps.
09-26-2016
08:20 AM
When a client wants to write an HDFS file, it must obtain a lease, which is essentially a lock, to ensure the single-writer semantics. If a lease is not explicitly renewed or the client holding it dies, then it will expire. When this happens, HDFS will close the file and release the lease on behalf of the client. The lease manager maintains a soft limit (1 minute) and a hard limit (1 hour) for the expiration time. If you wait, the lease will be released and the append will work. This being a workaround, the question is how this situation came to be. Did a first process break? Do you have storage quotas enabled and are you writing to a maxed-out directory?
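If you would rather not wait for the hard limit, recent Hadoop releases also ship a small helper to force lease recovery; a hedged example (the path is a placeholder, and do check that the subcommand exists in your version):

# ask the namenode to recover the lease on the file right away
hdfs debug recoverLease -path /data/myfile -retries 3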
09-26-2016
07:39 AM
1 Kudo
@Sunile Manjee I have never seen comparative stats on these two bulk-loading calls. If you have a Phoenix table, it would require a little bit of work to get a native Hbase schema to really look enough like a Phoenix table for the comparison to mean anything; things like complex keys or column types come to mind. If it is just a Phoenix view on an Hbase table then the comparison might make more sense, but you lose a lot of the Phoenix magic. Overall the performance should not vary much from one to the other, aside from any extra work you hide in the Phoenix table, like indexes and stats. From a pure operations perspective, use the bulk load best fitted to the type of your table.
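For reference, the two bulk-load entry points look roughly like this; the column mapping, paths and the phoenix jar name are placeholders:

# native Hbase: generate HFiles from TSV, then hand them off to the region servers
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1 -Dimporttsv.bulk.output=/tmp/hfiles mytable /data/input
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable

# Phoenix: CSV bulk load tool, which also maintains indexes and stats on the way in
hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MYTABLE --input /data/input.csv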
07-12-2016
04:44 PM
Hello Pooja. From your stack trace your table seems to be bucketed. Can you share your table definition? Could you also try running the query with the setting hive.auto.convert.join.noconditionaltask=false?
06-21-2016
10:03 AM
Hello Michel. Right now the Hive plan calculation does not reach out to get Hbase's stats, so currently there is no added benefit from the Hbase stats. This being said, these are questions that are being worked on in different initiatives, so this will likely change in the future.
06-17-2016
03:30 PM
4 Kudos
Hello Timothy. There are multiple ways to integrate these 3 services. As a starting point, Nifi will probably be your ingestion flow. During this flow you could:
- put your data into Kafka and have Spark read from it
- push your Nifi data to Spark: https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
- use an execute-script processor and start a Pig job
In summary you can have a push-and-forget connection, a push to a service that gets picked up in the next flow, or even execute in a processor as a corner case. Hope this shares some insight.
06-12-2016
11:56 AM
The "DFSInputStream has been closed already" message is only a warning and is fixed in Hadoop 2.7.2: https://issues.apache.org/jira/browse/HDFS-8099 Taking 10 minutes to submit the job seems to be a different problem from your reduce issue. Do check what your available resources look like in Yarn and how long it takes to get an Application Master; it would be interesting to see in the logs whether it is waiting or what else is happening. You could also check that your timeline server is responding and not underwater, as that can have an impact.
06-12-2016
10:27 AM
Hello Venkadesh. It would be worth investigating why your reducer gets a timeout error and then gets completed. Do you have a slow node, is it a code-related error, are your reducers sitting around too long? Depending on the answers, several options are available:
- you could increase the task timeout (mapred.task.timeout)
- you could force a higher number of reducers to get better distribution (mapred.reduce.tasks=# of reducers)
- you could configure reducers to start closer to the end of the map phase (mapred.reduce.slowstart.completed.maps)
- you could use speculative execution to see if some nodes are faster than others
These are some ideas that come to mind; depending on a closer analysis there might also be other ways (the settings are sketched below). Hope any of this helps.
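With placeholder values only (the old mapred.* names still work as aliases for the newer mapreduce.* names):

# give slow tasks more room before they are declared dead (placeholder: 20 minutes, in ms)
mapred.task.timeout=1200000
# force more reducers for better distribution (placeholder value)
mapred.reduce.tasks=50
# start reducers only once most of the map phase is done (placeholder: 80%)
mapred.reduce.slowstart.completed.maps=0.80
# let speculative execution retry slow attempts on other nodes
mapred.reduce.tasks.speculative.execution=true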
05-03-2016
07:33 AM
Hello Ethan. No, Kafka is not necessary for importing data into Atlas. Atlas will actually listen in to services like Hive, Sqoop, Falcon, etc. to automatically import metadata. You can also interact with the Atlas APIs, REST or otherwise, to import your own data, say tags for example. Kafka is very useful, for example, in the communication with Ranger for security policies: as you add tags to data in Atlas you want Ranger to pick them up as soon as possible, and Kafka is that gateway. Kafka, in the not-so-distant future, will also be a service monitored by Atlas, as it too is a gateway for data inside Hadoop and as such is a source Atlas should do governance for. Hope this helps.
05-03-2016
07:22 AM
1 Kudo
Hello Ethan. The difference is mainly batch versus realtime. By this I mean the bridge will import all existing data, or rather metadata, from the Hive metastore, so all pre-existing tables and definitions, whereas the hook will listen in real time to events happening in Hive. The Atlas documentation explains this here if you want a more detailed explanation: http://atlas.incubator.apache.org/Bridge-Hive.html
05-02-2016
03:47 PM
Can you make sure that, at the top of the tutorial page, when you open the gear icon, "hive %hive..." is blue, and then click save? If not, can you share the description and config of your Hive interpreter?
05-01-2016
08:17 AM
In order to use node labels you will first have to enable them in Yarn (yarn.node-labels.enabled=true), then set up a label directory, create labels, and associate them with hosts and queues. Labels are logically accessed through the capacity queue they are associated with, so in your case it would just be a matter of running your job in the right Yarn capacity queue. The documentation has an example that can help you: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/configuring_node_labels.html
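A short hedged sketch of the CLI side once the label directory is in place; the label and host names are invented, and the queue mapping itself is done in the capacity scheduler config as per the linked doc:

# create a label and attach it to a node
yarn rmadmin -addToClusterNodeLabels "highmem"
yarn rmadmin -replaceLabelsOnNode "worker01.example.com=highmem"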
04-30-2016
09:24 AM
Hello Sumit. If your ulimit is already set to unlimited or a very high number, you could get insight into the number of open files with lsof | wc -l. You may need to increase the maximum number of file handles in the OS; check fs.file-max to see if this helps. This is to try to address the cause. An offlineMetaRepair / fix meta should help with the consequence.
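A few commands to size the problem on the affected host (the limit value below is only a placeholder):

# how many file handles are open right now
lsof | wc -l
# current kernel-wide limit
cat /proc/sys/fs/file-max
# raise it for the running kernel, then persist the value in /etc/sysctl.conf
sysctl -w fs.file-max=1000000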
04-29-2016
08:39 AM
Hello Pedro. Spark core is a general-purpose in-memory analytics engine. Adding things like SparkSQL or SparkML on top of Spark core, you can do a lot of interesting analytics or data science modelling, in a programmatic or SQL fashion. Maybe these tutorials can help you with your first steps: http://hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/ http://hortonworks.com/blog/data-science-hadoop-spark-scala-part-2/
04-14-2016
08:03 AM
Hello Nelson. I don't think you need the Hive configuration explicitly set anymore, i.e. this part: "-Djavax.jdo.option.ConnectionURL=jdbc:mysql://testip/hive?createDatabaseIfNotExist=true -Dhive.metastore.uris=thrift://testip:9083"
04-13-2016
09:10 AM
1 Kudo
Hello Nelson. Instead of putting the Hive info in different properties, could you try adding the hive-site.xml (--files=/etc/hive/conf/hive-site.xml), just to make sure everything is consistent? Without this, Spark could launch an embedded metastore, causing the out-of-memory condition. Could you also share a little bit about the app: what type of data (ORC, CSV, etc.) and the size of the table? Let's see if this helps.
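Roughly, the submit line would just gain the extra file; the class and jar names below are invented placeholders:

# ship the real hive-site.xml with the job so Spark talks to the existing metastore
spark-submit --files /etc/hive/conf/hive-site.xml --class com.example.MyApp my-app.jar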
04-13-2016
08:18 AM
3 Kudos
Hello Sumit. Increasing the zookeeper session timeout is often a quick first fix to GC pauses "killing" region servers in Hbase. In the longer run, if you have GC pauses it is because your process is trying to find memory. There can be architectural approaches to this problem: for example, does this happen during heavy write loads, in which case you could consider doing bulk loads when possible? You can also look at your Hbase configuration: what is your overall allocated memory for Hbase and how is it distributed between writes and reads? Do you flush your memstore often, and does this lead to many compactions? Lastly you can look at GC tuning. I won't dive into this one, but Lars has done a nice introductory blog post on it here: http://hadoop-hbase.blogspot.ie/2014/03/hbase-gc-tuning-observations.html Hope any of this helps.
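For the quick first fix, the property lives in hbase-site.xml; the value below is only an example and must stay within what your zookeeper tick settings allow:

<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>  <!-- example only: 120 s -->
</property>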