Member since: 01-09-2016
Posts: 11
Kudos Received: 33
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
| 1859 | 04-20-2016 10:46 PM
12-07-2016
03:16 AM
@Alberto Romero Useful, thank you! With Kerberos enabled on Ambari 2.*, I couldn't rmr /ams-hbase-secure; this solved my problem!
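In case it helps others hitting the same issue, removing that znode looks roughly like this (a sketch: the keytab and JAAS file paths below are assumptions based on a typical HDP layout, so adjust them for your cluster):

# Get a ticket from the HBase service keytab (assumed path)
kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/$(hostname -f)
# Point the ZooKeeper client at a JAAS config so it authenticates via SASL (assumed path)
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/etc/hbase/conf/hbase_client_jaas.conf"
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server <zookeeper-host>:2181
rmr /ams-hbase-secure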
04-23-2016
07:34 AM
2 Kudos
Introduction

HBase replication provides a way of replicating HBase data from one cluster to another by adding the remote ZooKeeper quorum as a remote peer.

Configuration on the cluster

First of all, it is necessary to set the hbase.replication property to true. Then add the remote peer through the hbase shell; the peer id can be any short name. For example:

add_peer '1', "hdpdstzk01.machine.domain,hdpdstzk02.machine.domain,hdpdstzk03.machine.domain:2181:/hbase-secure"

(If using Kerberos, the right JAAS configuration needs to be used, or a ticket obtained from the hbase service keytab needs to be in the credential cache, in order to authenticate correctly against ZooKeeper through SASL.)

Configuration on the tables

Replication is set at the table and column family level by setting the property REPLICATION_SCOPE to '1'. The default value tables are created with, if not specified, is '0', which means no replication. When applying this to already existing tables, they need to be disabled, the property added through alter, and then re-enabled. For example:

alter "product:user", {NAME => 'document', REPLICATION_SCOPE => '1'}

Copying existing data across

If there is already data on the source table, it can be copied over initially with the CopyTable command:

bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hdpdstzk01.machine.domain,hdpdstzk02.machine.domain,hdpdstzk03.machine.domain:2181:/hbase-secure [--new.name=mytableCopy] [--starttime=abc --endtime=xyz] mytable

- new.name is only needed when the destination table name is different from the source one.
- starttime and endtime can be used when only a specific interval of HBase timestamps should be replicated.
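Once replication is configured, a quick sanity check can be done from the hbase shell (a sketch; the peer id and table below match the examples above):

list_peers                # the remote quorum should be listed under peer id '1'
describe 'product:user'   # the 'document' family should show REPLICATION_SCOPE => '1'
status 'replication'      # per-region-server replication source/sink metrics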
05-17-2016
12:27 AM
For additional information, see recent additions to the Kafka Guide. Here's the link for HDP 2.4.2: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_kafka-user-guide/content/ch_kafka_mirrormaker.html
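For reference, a minimal MirrorMaker invocation looks roughly like this (the .properties file names are placeholders; see the guide above for the full option list):

/usr/hdp/current/kafka-broker/bin/kafka-mirror-maker.sh --consumer.config sourceClusterConsumer.properties --producer.config targetClusterProducer.properties --whitelist ".*"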
03-22-2016
07:32 PM
8 Kudos
Introduction

Security best practices when using Ranger dictate that Hive jobs should ideally run as user 'hive', so that only Ranger Hive policies apply to end-user access to data, and so that 'hive' owns the whole directory/file structure for Hive on HDFS. This is achieved by setting hive.server2.enable.doAs to 'false'. It also improves performance, since it enables container pre-warming for Tez, which is only applicable to jobs started by 'hive' and not by other end users.

Problem

The problem introduced by doAs = false is that, if YARN Capacity Scheduler queue mappings have been defined on a user/group basis, the mappings will not apply, since all jobs are started as the same user (i.e. 'hive'), making the queue definitions completely useless.

Solution

One solution is to use a Hive hook that detects the real user who started the query, so that the job can be submitted to the right queue even though it still runs as user 'hive'. The hook finds the list of groups the user belongs to and tries to match them against a group-mappings file (with the structure groupname:queuename). When it finds one of the user's groups, it automatically submits the job to the matched queue. The Hive hook can be found at: https://github.com/beto983/Hive-Utils This hook detects the user who started the Hive session, finds the groups that user belongs to, and sends the job to the corresponding queue based on that group and the mappings defined in the group-mappings file. It is based on this other hook, which submits the job to a queue named after the user's primary group: https://github.com/gbraccialli/HiveUtils

Steps to follow:
1. On all HiveServer2 servers, run (using the /raw/ path so wget downloads the jar itself rather than the GitHub HTML page):

mkdir /usr/hdp/current/hive-client/auxlib/ && wget https://github.com/beto983/Hive-Utils/raw/master/Hive-Utils-1.0-jar-with-dependencies.jar -O /usr/hdp/current/hive-client/auxlib/Hive-Utils-1.0-jar-with-dependencies.jar

2. Add the following setting to hive-site.xml (Custom hiveserver2-site in Ambari):

hive.semantic.analyzer.hook=com.github.beto983.hive.hooks.YARNQueueHook

3. Create a "group-mappings" file in /etc/hive/conf/, one mapping per line (an example follows these steps):

groupname:queuename
groupname:queuename
...

4. Restart Hive.
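As an illustration, a group-mappings file might look like this (the group and queue names are hypothetical; use your own groups and Capacity Scheduler queues):

analysts:analytics
etl:batch

With this file, a query started by a user in the 'analysts' group still runs as 'hive' but lands in the 'analytics' queue. One way to check where a job ended up (queue name from the example above):

yarn application -list | grep analytics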