Member since
11-09-2016
68
Posts
16
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2556 | 12-07-2017 06:32 PM |
 | 965 | 12-07-2017 06:29 PM |
 | 1585 | 12-01-2017 11:56 AM |
 | 9680 | 02-10-2017 08:55 AM |
 | 3099 | 01-23-2017 09:44 PM |
03-02-2017
11:40 AM
Hi @Ashok Kumar BM How many rows do you have in your table? You can add more heap to your region servers and HBase Master, and increase your BlockCache, but with 100 MB per record (and that many regions) and only two region servers, it looks like you will soon be hitting the limits of HBase when fetching "columns". You can unpivot the table to use the power of HBase fetching vertically using the row key ...
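As a sketch of the unpivot idea (the table name, column family, and record values below are all hypothetical): instead of one wide row carrying a 100 MB cell per column, store one tall row per column under a composite row key of the form `<id>#<column>`, so a prefix scan on the row key fetches a record's values vertically:

```shell
# Hypothetical wide record in the form "<id>:<col>=<val>,...". Each column
# becomes its own HBase row keyed "<id>#<col>"; we print the hbase-shell puts.
record="user123:m1=10,m2=20,m3=30"
rowid=${record%%:*}
cols=${record#*:}
old_ifs=$IFS
IFS=','
for pair in $cols; do
  # A row-key prefix scan on "user123#" now retrieves all of this record's values.
  echo "put 'metrics', '${rowid}#${pair%%=*}', 'cf:v', '${pair#*=}'"
done
IFS=$old_ifs
```

With this layout, a scan with a row-prefix filter on `user123#` replaces fetching many fat columns from a single row.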
02-24-2017
01:55 PM
3 Kudos
Authentication
1- Kerberos: Kerberos is mandatory for production environments. You can either use your AD's embedded Kerberos or install a new dedicated KDC. Kerberos must be in HA.
Risk of not doing the above: user impersonation of the service accounts (jobs can be exported to run with super-user permissions).
2- Use a firewall to block all inbound traffic to the cluster, from all sources on all ports, except from the edge node (gateway).
Risk of not doing the above: passwords in the wrong hands will systematically give access to the cluster.
3- Check the permissions of the keytabs, as detailed in this article: Script to fix permissions and ownership of hadoop keytabs.
Risk of not doing the above: use of the keytabs by other cluster users.
4- Use Knox for all API calls to the cluster. Benefits: inbound traffic only from "trusted" known machines, and authentication is required against an existing LDAP.
Network
1- The cluster must be in an isolated subnet, with no interference from other networks, for both security and throughput.
Risk of not doing the above: data interception by/from other machines in the data center.
2- Cluster machines can be linked internally in "non-routed" mode, with host resolution configured via /etc/hosts on all machines.
3- A flat network is not recommended.
Risk of not doing the above: file-inclusion attacks from other machines in the data center.
4- Having two DNS resolutions (internal and external) is acceptable if the DNS server is in HA, although you can combine /etc/hosts with the DNS configuration.
5- iptables must be disabled within the cluster. This is a prerequisite for the installation.
6- /etc/hosts must be configured with the FQDNs. The Ambari server needs the resolution of all nodes of the cluster in its /etc/hosts. This is a prerequisite for the installation.
Authorizations
1- Systematically give 000 permissions to the HDFS files and folders of the data lake (/data); only Ranger controls access, via policies.
Risk of not doing the above: users can access through ACLs and bypass Ranger policies.
2- You can use a umask: fs.permissions.umask-mode = 0022.
Risk of not doing the above: wrong permissions, which may lead to Ranger policies being bypassed.
Other best practices: do not share the passwords of the super users (hdfs, hive, spark, etc.) with all teams; only root should own them. You can disable SSH login for some super users (knox, spark, etc.).
Please feel free to comment for enhancements.
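A minimal shell sketch of three of the points above (keytab permissions, /etc/hosts entries, and the 0022 umask). Every path, hostname, and address is hypothetical, and the commands run on throwaway local files rather than a real cluster:

```shell
#!/bin/sh
# Sketch only: paths, hosts, and addresses below are made up for illustration.

# (Authentication, item 3) Keytabs should be readable only by their owning
# service account. Real clusters keep them under /etc/security/keytabs.
KEYTAB=./hive.service.keytab
touch "$KEYTAB"
chmod 400 "$KEYTAB"
keytab_mode=$(stat -c '%a' "$KEYTAB")
echo "keytab mode: $keytab_mode"      # expect 400: owner read only

# (Network, item 6) /etc/hosts entries: FQDN first, then the short name.
cat <<'EOF'
192.168.10.11 master1.cluster.example.com master1
192.168.10.12 worker1.cluster.example.com worker1
EOF

# (Authorizations, item 2) Effect of a 0022 umask on newly created files:
# new files get 666 & ~022 = 644, so group and others lose write access.
umask 0022
touch demo_file
file_mode=$(stat -c '%a' demo_file)
echo "new file mode: $file_mode"
rm -f demo_file "$KEYTAB"
```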
02-22-2017
12:15 PM
1 Kudo
Great post @Scott Shaw. Partial backup/restore of the Hive metastore could be a good new feature for Hive. In general, the Hive Metastore is shared by many projects/teams, and at the moment we are unable to restore the metadata for one team (only a few tables) without impacting the others.
02-10-2017
08:55 AM
1 Kudo
I would suggest you delete the tHiveConnection and the tHiveClose, copy them from another job which is working, then give it a try. Check also the context values used by these two components. Also, can you post a screenshot of your job? Let me know.
02-09-2017
08:40 AM
Which user are you using to run the query? Only the allowed users can do the select *, because it includes the column in the policy.
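For illustration, a column-level policy in the legacy Ranger public API has roughly this shape (the field names follow that API as best I recall, and the database, table, column, and user names are all hypothetical). A user not listed in permMapList, or a query touching a column outside columns, fails on select *:

```
{
  "policyName": "sales_columns_select",
  "repositoryName": "cluster_hive",
  "repositoryType": "hive",
  "databases": "default",
  "tables": "sales",
  "columns": "id,amount",
  "permMapList": [
    { "userList": ["alice"], "permList": ["select"] }
  ]
}
```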
02-09-2017
08:38 AM
You can delete a policy by using its policy ID; check here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_Security_Guide/content/ranger_rest_api_delete_policy.html
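A sketch of the call described in that doc. The host, port, credentials, and policy ID below are placeholders to adapt to your cluster, and the command is only assembled and echoed here rather than executed:

```shell
RANGER_HOST="ranger.example.com"   # placeholder Ranger admin host
POLICY_ID=42                       # placeholder: the ID of the policy to delete
# Legacy public REST API delete-by-ID call, per the linked documentation.
CMD="curl -u admin:admin -X DELETE http://${RANGER_HOST}:6080/service/public/api/policy/${POLICY_ID}"
echo "$CMD"
```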
02-09-2017
08:33 AM
FYI, "Multiple Forest" AD is supported, but "Cross Forest" AD is not. If you have "Cross Forest" AD, Ranger may be able to get users from the right branch but not groups, or vice versa.
01-25-2017
10:24 AM
1 Kudo
The bug (HIVE-15355) is being worked on by the Hive engineering team at Hortonworks. You can use the following workarounds: 1- Add "SORT BY 0" at the end of the query, which will force the use of one reducer; please use this only for small queries. 2- Try set hive.mv.files.thread=0; before running the query. If you have any questions regarding the above, please let me know.
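A sketch of both workarounds combined, with hypothetical table and column names; the query text is only assembled and printed here, but it is what you would pass to beeline or the Hive CLI:

```shell
# SORT BY 0 forces a single reducer (small queries only);
# hive.mv.files.thread=0 is the session-level setting from workaround 2.
QUERY="SET hive.mv.files.thread=0;
INSERT OVERWRITE TABLE target_tbl
SELECT col_a, col_b FROM source_tbl
SORT BY 0;"
echo "$QUERY"
```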
01-25-2017
10:22 AM
The bug (HIVE-15355) is being worked on by the Hive engineering team at Hortonworks. You can use the following workarounds: 1- Add "SORT BY 0" at the end of the query, which will force the use of one reducer; please use this only for small queries. 2- Try set hive.mv.files.thread=0; before running the query. If you have any questions regarding the above, please let me know.
01-23-2017
09:44 PM
2 Kudos
Confirm that mysql-connector-java.jar is in the Java share directory:
ls /usr/share/java/mysql-connector-java.jar
Under /usr/hdp/current/hive-server2/lib/, run the commands:
cp mysql-connector-java-5.1.38.jar mysql-connector-java.jar
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/hdp/current/hive-server2/lib/mysql-connector-java.jar
Then restart all Hive services.