Member since 09-29-2015 | 286 Posts | 601 Kudos Received | 60 Solutions
01-16-2016
06:12 PM
4 Kudos
Question: I am about to initiate the cluster install wizard on a new Ambari install. I reviewed the information on service users at http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_defining_service_users_and_groups_for_a_hd and I am wondering whether I should take the "Skip Group Modifications" option. The doc states "Choosing this option is typically required if your environment manages groups using LDAP and not on the local Linux machines". In our environment, users and groups are managed via Active Directory (via Centrify). We are planning to enable security on the cluster after it is installed, which will include a host of new users being created, after which many of the initial users and groups will be orphaned. What does the "Skip group modifications" option actually do? Should it be used in this case?

Answer: The answer lies in the fact that Ambari runs a groupmod hadoop statement, and either there is no local group called hadoop or the operation is not allowed in your environment. Since you will be integrating with LDAP or AD, you should use "Skip Group Modifications". Because your Linux nodes resolve groups from LDAP rather than from /etc/group, the groupmod hadoop statement would fail during install.

See http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/_customize_services.html:

"Service Account Users and Groups: The service account users and groups are available under the Misc tab. These are the operating system accounts the service components will run as. If these users do not exist on your hosts, Ambari will automatically create the users and groups locally on the hosts. If these users already exist, Ambari will use those accounts. Depending on how your environment is configured, you might not allow groupmod or usermod operations. If this is the case, you must be sure all users and groups are already created and be sure to select the "Skip group modifications" option on the Misc tab. This tells Ambari to not modify group membership for the service users."

Also see http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_troubleshooting/content/_resolving_cluster_install_and_configuration_problems.html:

"3.7. Problem: Cluster Install Fails with Groupmod Error. The cluster fails to install with an error related to running groupmod. This can occur in environments where groups are managed in LDAP, and not on local Linux machines. You may see an error message similar to the following one:
Fail: Execution of 'groupmod hadoop' returned 10. groupmod: group 'hadoop' does not exist in /etc/group
3.7.1. Solution: When installing the cluster using the Cluster Installer Wizard, at the Customize Services step, select the Misc tab and choose the Skip group modifications during install option."
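A quick way to confirm whether this applies to your hosts is to compare what the name service returns with what is actually in /etc/group. This is a minimal sketch, assuming an AD/Centrify-backed name service; the group name hadoop comes from the error above:

# Does the hadoop group resolve at all (local files OR LDAP/AD via NSS)?
getent group hadoop

# Is it defined locally? groupmod only works against /etc/group.
grep '^hadoop:' /etc/group

# If getent finds the group but grep does not, the group lives in LDAP/AD
# and 'groupmod hadoop' will fail -- choose "Skip Group Modifications".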
01-05-2016
08:21 PM
3 Kudos
Tutorial Link
Sandbox Version: HDP 2.3.2
Ambari Version: 2.1.2
Hadoop stack version: Hadoop 2.7.1.2.3.2.0-2950
-------------------------------------------
Issue 1: Nothing available when executing the command yum groupinstall “Development tools“
Resolution: This occurs when you copy and paste the command: the typographic double quotes are wrong. Use " instead, then run the command with the correct quotes:
yum groupinstall "Development tools"
The same occurs with pip install “ipython[notebook]“. Instead run:
pip install "ipython[notebook]"
---------------------------------------
Issue 2: No ~/.ipython/profile_pyspark found after executing the command ipython profile create pyspark
Resolution: IPython was updated to 4.0.0, which uses Jupyter. Run:
jupyter notebook --generate-config
Then edit the generated file:
nano /root/.jupyter/jupyter_notebook_config.py
-------------------------------------------
Issue 3: "--profile" error when executing ~/start_ipython_notebook.sh
Resolution: Use IPYTHON_OPTS="notebook" pyspark instead of IPYTHON_OPTS="notebook --profile pyspark" pyspark
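Putting the fixes together, the corrected command sequence from the three issues above looks like this (paths as in the tutorial; adjust to your environment):

# Issue 1: use plain ASCII double quotes
yum groupinstall "Development tools"
pip install "ipython[notebook]"

# Issue 2: IPython 4.x / Jupyter no longer uses profiles; generate a Jupyter config instead
jupyter notebook --generate-config
nano /root/.jupyter/jupyter_notebook_config.py

# Issue 3: start the notebook without the pyspark profile option
IPYTHON_OPTS="notebook" pyspark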
01-04-2016
04:55 PM
5 Kudos
Tutorial Link
Sandbox Version: HDP 2.3.2
Ambari Version: 2.1.2
Hadoop stack version: Hadoop 2.7.1.2.3.2.0-2950
-------------------------------------------
Issue 1: Error initializing SparkContext when executing the spark-shell command
When you issue the following command as root:
spark-shell --master yarn-client --driver-memory 512m --executor-memory 512m
you receive the error:
ERROR SparkContext: Error initializing SparkContext. org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/.sparkStaging/application_1451921066894_0001":hdfs:hdfs:drwxr-xr-x
Resolution:
sudo su - hdfs
hdfs dfs -mkdir /user/root
hdfs dfs -chown root:hdfs /user/root
exit
--------------------------------------------------------
Issue 2
01-04-2016
04:15 PM
Tutorial Link
Sandbox Version: HDP 2.3.2
Ambari Version: 2.1.2
Hadoop stack version: Hadoop 2.7.1.2.3.2.0-2950
-------------------------------------------
Issue 1: Permission Denied
Copying the data over to HDFS on the Sandbox with the following command results in a Permission denied error:
hadoop fs -put ~/Hortonworks /user/guest/Hortonworks
Resolution: See Hands-on Spark Tutorial: Permission Denied
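The referenced resolution follows the same pattern as the spark-shell fix in the post above: create the target user directory as the hdfs superuser and hand ownership to the user doing the put. A minimal sketch, assuming the guest user from the command above:

sudo su - hdfs
hdfs dfs -mkdir -p /user/guest
hdfs dfs -chown guest:hdfs /user/guest
exit
# then retry as the original user
hadoop fs -put ~/Hortonworks /user/guest/Hortonworks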
12-08-2015
12:11 AM
4 Kudos
Connections among nodes in a Hadoop cluster should not be restricted. Many ports used within the cluster by various components are dynamic and are not even known until install occurs. If you want to set firewall rules for external access to the cluster and need to know all the ports to restrict, see the following:
For HDP 2.3: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HDP_Reference_Guide/content/reference_chap2.html
For Ambari 2.1.2: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_ambari_reference_guide/content/ch_configuring_network_port_numbers.html
For HDP 2.2: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.9/bk_HDP_Reference_Guide/content/reference_chap2.html
For Ambari 2.1: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.1.0/bk_ambari_reference_guide/content/_default_network_port_numbers_-_ambari.html
If you employ Kerberos for authentication, which is a must for truly secure clusters, Kerberos already identifies users, services, and machines. I found this blog informative for iptables on Hadoop clusters: http://jason4zhu.blogspot.com/2014/11/configure-firewall-iptables-for-hadoop-cluster.html
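As an illustration of that approach (leave intra-cluster traffic open, restrict only external access), here is a minimal, hypothetical iptables sketch. The subnet 10.0.0.0/24 and the Ambari web port 8080 are assumptions; substitute your own cluster network and the ports from the reference guides above:

# allow established sessions and loopback
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
# allow everything between cluster nodes (assumed cluster subnet)
iptables -A INPUT -s 10.0.0.0/24 -j ACCEPT
# allow SSH and the Ambari web UI (assumed port) from outside
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
# drop everything else
iptables -P INPUT DROP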
11-24-2015
10:01 PM
12 Kudos
No Valid Credentials Provided Error
Upon enabling Kerberos in Ambari, some components started (NameNode), but other components, such as MapReduce and Hive, are failing. Here is an example of the error output when we try to start these services:
Fail: Execution of 'hadoop fs -mkdir `rpm -q hadoop | grep -q "hadoop-1" || echo "-p"` /app-logs /mapred /mapred/system /mr-history/tmp /mr-history/done && hadoop fs -chmod -R 777 /app-logs && hadoop fs -chmod 777 /mr-history/tmp && hadoop fs -chmod 1777 /mr-history/done && hadoop fs -chown mapred /mapred && hadoop fs -chown hdfs /mapred/system && hadoop fs -chown yarn:hadoop /app-logs && hadoop fs -chown mapred:hadoop /mr-history/tmp /mr-history/done' returned 1. mesg: ttyname: Invalid argument
15/04/28 16:12:33 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Resolution
Do a kinit using the hdfs service principal, e.g.:
/usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
After the kinit, do a klist and ensure that the expiration and renewal dates are not the same as the ticket date. If the renew date is in the past or the same as the ticket date, execute kinit -R. Try a hadoop fs -ls command. If successful, try to restart the services in Ambari. If your services do not restart, continue below.
Find where your hadoop-env.sh file is located; it is usually in /etc/hadoop/conf.empty, otherwise run:
find / -name hadoop-env.sh
Edit hadoop-env.sh (vi hadoop-env.sh) and add the debug parameter sun.security.krb5.debug=true to the HADOOP_OPTS variable, that is,
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
Try a kinit as hdfs, then a hadoop fs -ls command, and look at the debug statements produced. If they show Keytype = 18, the error is due to wrong JCE policy files. This can happen when you have AES256 encryption enabled and you recently upgraded Java: upgrading Java overwrites the unlimited-strength JCE policy files that include support for AES256 encryption. To fix this, simply re-install your JCE policy jars into "/usr/java/default/jre/lib/security/" or the JAVA_HOME set in your hadoop-env.sh file on each node. Get the right JCE files: for JDK 8 use http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html; for JDK 7 use http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html. Note: Any JDK version 1.7 update 80 or later and 1.8 update 60 or earlier is known to have problems processing Kerberos TGT tickets.
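To check whether unlimited-strength crypto is actually in place, and to reinstall the policy jars, something like the following works. This is a sketch; the JAVA_HOME path, the downloaded zip name, and the extracted folder name are assumptions, adjust to your environment:

# prints 2147483647 with the unlimited-strength policy, 128 with the default one
jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'

# re-install the unlimited-strength policy jars from the downloaded zip (JDK 8 example)
unzip jce_policy-8.zip
cp UnlimitedJCEPolicyJDK8/local_policy.jar UnlimitedJCEPolicyJDK8/US_export_policy.jar /usr/java/default/jre/lib/security/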
----------------------------------------------------------------
Unknown Password or Unable to Obtain Password for User Error Upon Restarting Hadoop Services
Assuming you can kinit successfully and can see the ticket cached in klist, this error occurs for one of several reasons:
The IP address in /etc/hosts and the IP address for the hostname are different.
The Kerberos principal setting in hdfs-site.xml is wrong. Verify the dfs.namenode.kerberos.principal and dfs.datanode.kerberos.principal properties.
Wrong file ownerships and/or permissions on the /etc/security/keytabs directory. The problem is that the keytabs were created and owned by the local hdfs, hbase, and ambari-qa owners; however, these UIDs are different from the UIDs of the corresponding Active Directory users. The files need to be owned by the Active Directory UIDs.
Resolution
Archive and clear out all logs. For Teradata these are in /var/opt/teradata/log/hadoop/hdfs and /var/opt/teradata/log/hbase; normally logs are located in /var/log/hadoop-hdfs. The reason is that the logs would have been created using the local UIDs, which would create a problem.
Perform an ls -l on the /etc/security/keytabs directory and make note of which keytabs are owned by hdfs, hbase, and ambari-qa.
Then perform ls -n on /etc/security/keytabs and make note of the UIDs for hdfs, hbase, and ambari-qa. Take a look at the /etc/passwd file as well and note the UIDs for hdfs, hbase, and ambari-qa there.
Next, touch a test file named testuid and perform a chown hdfs testuid. Note the UID. Do the same chown for hbase and ambari-qa. The UIDs will be different from the ones found in /etc/security/keytabs; these are the AD UIDs.
Go back to /etc/security/keytabs and perform chown <AD-UID> <keytab>, that is, use the new AD UID found for each of hdfs, hbase, and ambari-qa.
Then perform ls -n on /etc/security/keytabs and make sure the new AD UIDs for hdfs, hbase, and ambari-qa are reflected on the keytabs.
Ensure that your kinits work for hbase, hdfs, and ambari-qa, e.g.:
/usr/share/centrifydc/kerberos/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
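As a concrete illustration of the UID comparison above, a minimal sketch (the AD UID 12345 is made up; read the real value from your own ls output):

# local UIDs recorded in /etc/passwd vs. the owners currently on the keytabs
grep -E '^(hdfs|hbase|ambari-qa):' /etc/passwd
ls -ln /etc/security/keytabs

# what UID does the name service (AD/Centrify) actually resolve to?
touch /tmp/testuid && chown hdfs /tmp/testuid && ls -ln /tmp/testuid

# re-own a keytab with the AD UID discovered above (example UID only)
chown 12345 /etc/security/keytabs/hdfs.headless.keytab
ls -ln /etc/security/keytabs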
Further Resolution
If the services and components do not restart, you need to change the permissions on all files owned by hdfs, hbase, and ambari-qa on the ENTIRE cluster. Modify and run this script (changing the appropriate UIDs for hdfs, hbase, and ambari-qa). Be careful: it changes many files throughout the cluster.
----------------------------------------------------------------
Second Instances of WebHCat and Oozie Fail After Kerberos Is Enabled
Failures occur when two WebHCat servers or two Oozie servers are deployed with Kerberos. The issue occurs when, in Ambari, you use _HOST as the host part of the WebHCat and Oozie principals, since _HOST does NOT get substituted appropriately when each service starts. An example of this would be using HTTP/_HOST@EXAMPLE.COM or oozie/_HOST@EXAMPLE.COM as principals. Normally this is appropriate: if the WebHCat server runs on node 1, this should translate to HTTP/node1.example.com@EXAMPLE.COM, and on node 2 to HTTP/node2.example.com@EXAMPLE.COM. Unfortunately this is a bug, as the substitution does not occur. You need to go directly to the second instance of each server and manually edit the webhcat-site.xml or oozie-site.xml file with the second node's principals for SPNEGO and Oozie respectively, that is, HTTP/node2.example.com@EXAMPLE.COM and oozie/node2.example.com@EXAMPLE.COM. Unfortunately, if you restart or make any changes in Ambari after that, Ambari pushes the wrong configurations to the second instances again: since you cannot use _HOST you are forced to use node 1 principals, which do not work for node 2, so the fixes made on the second host get overwritten. Be mindful of this upon restarts by Ambari, and always save your own versions of webhcat-site.xml and oozie-site.xml.
Resolution
WebHCat
WebHCat can only have one value for templeton.kerberos.principal in the custom webhcat-site.xml. Normally you would have _HOST as the host part of the principal, but WebHCat does not resolve _HOST. In Ambari, set templeton.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM and restart WebHCat. Then log onto node 2, where the second WebHCat server is running, and perform the following:
su hcat
Edit webhcat-site.xml located in /etc/hive-webhcat/conf and change all principal names from node 1 to node 2, then:
export HADOOP_HOME=/usr
/usr/lib/hive-hcatalog/sbin/webhcat-server.sh stop
/usr/lib/hive-hcatalog/sbin/webhcat-server.sh start
Oozie
Oozie can only have one value for the principals in the custom oozie-site.xml, namely the properties oozie.authentication.kerberos.principal and oozie.service.HadoopAccessorService.kerberos.principal. In Ambari, set oozie.authentication.kerberos.principal to HTTP/node1.example.com@EXAMPLE.COM and oozie.service.HadoopAccessorService.kerberos.principal to oozie/node1.example.com@EXAMPLE.COM, and restart Oozie. Then log onto node 2, where the second Oozie server is running, and perform the following:
su oozie
Edit oozie-site.xml located in /etc/oozie/conf and change all principal names from node 1 to node 2, then:
export HADOOP_HOME=/usr
/usr/lib/oozie/bin/oozied.sh stop
/usr/lib/oozie/bin/oozied.sh start
----------------------------------------------------------------
After Enabling Hue for Kerberos and LDAP, the File Browser Errors Out
When you log into Hue with an AD account (after configuring for LDAP) you receive the following error:
2015-05-06 09:50:25,698 INFO [][hue:] GETFILESTATUS Proxy user [hue] DoAs user [admin]
2015-05-06 09:50:25,712 WARN [][hue:] GETFILESTATUS FAILED [GET:/v1/user/admin] response [Internal Server Error] SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Resolution
With NameNode HA, HttpFS needs to be configured; if there is no NameNode HA, WebHDFS needs to be configured. You need to configure hadoop-httpfs to use Kerberos. Make changes in httpfs-site.xml on the Hue box to change from simple authentication to Kerberos. Edit the /etc/hadoop-httpfs/conf.empty/httpfs-site.xml file on the Hue node:
<property>
  <name>httpfs.hadoop.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>httpfs.hadoop.authentication.kerberos.principal</name>
  <value>httpfs/huenode.EXAMPLE.com@EXAMPLE.COM</value>
</property>
<property>
  <name>httpfs.hadoop.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/httpfs.service.keytab</value>
</property>
<property>
  <name>httpfs.authentication.kerberos.name.rules</name>
  <value>
    RULE:[2:$1@$0](rm@.*EXAMPLE.COM)s/.*/yarn/
    RULE:[2:$1@$0](nm@.*EXAMPLE.COM)s/.*/yarn/
    RULE:[2:$1@$0](nn@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](dn@.*EXAMPLE.COM)s/.*/hdfs/
    RULE:[2:$1@$0](hbase@.*EXAMPLE.COM)s/.*/hbase/
    RULE:[2:$1@$0](oozie@.*EXAMPLE.COM)s/.*/oozie/
    RULE:[2:$1@$0](jhs@.*EXAMPLE.COM)s/.*/mapred/
    DEFAULT
  </value>
</property>
</configuration>
Then restart hadoop-httpfs. It appears that we need a keytab for httpfs as well, however.
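Once HttpFS is kerberized, a quick way to confirm that Hue's backend will work is a SPNEGO-authenticated request against the HttpFS REST API. A sketch only; the huenode host and the default port 14000 are assumptions, and it requires a valid ticket from kinit first:

kinit <your-AD-user>
curl --negotiate -u : "http://huenode.EXAMPLE.com:14000/webhdfs/v1/?op=LISTSTATUS"
# A JSON FileStatuses listing (rather than "SIMPLE authentication is not enabled")
# indicates Kerberos authentication is working end to end.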
----------------------------------------------------------------
Where Can I find the commands that Ambari runs for Kerberos
Question: What commands does Ambari run to add the keytabs for the AD option? It is nowhere to be found in the logs. Answer:
ktadd, in /var/lib/ambari-server/resources/common-services/KERBEROS/package/scripts/kerberos_common.py -> function create_keytab_file
----------------------------------------------------------------
Help! I have long-running jobs and my tokens are expiring, leading to job failures
Possible Resolution Steps
First stop: NTP. Do a pdsh to reset and restart the ntp service on all nodes.
Check the JDK. Any JDK version 1.7 update 80 or later and 1.8 update 60 or earlier is known to have problems processing Kerberos TGT tickets.
Change the max renewable life and ticket lifetime
> kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM.COM>
> klist
Check the expiration of the krbtgt/<REALM.COM>@<REALM.COM> principal. Is it seven days or one day?
Look at max_renewable_life in /var/kerberos/krb5kdc/kdc.conf. Is it 7d? 14d? Is it different from the krbtgt/<REALM.COM>@<REALM.COM> expiration length?
Change max_renewable_life in /var/kerberos/krb5kdc/kdc.conf to 14d
Change the principal krbtgt/<REALM.COM>@<REALM.COM> maxrenewlife to renew after the same time as max_renewable_life
If it is MIT Kerberos, you have to use kadmin (https://web.mit.edu/kerberos/krb5-1.12/doc/admin/admin_commands/kadmin_local.html and https://blog.godatadriven.com/kerberos_kdc_install.html). If it is AD, run the commands as the administrator:
kadmin -p admin
Then kadmin: modprinc -maxrenewlife "7 days" krbtgt/<REALM.COM>@<REALM.COM>
What about ticket_lifetime in /etc/krb5.conf? Is there a renew_lifetime? A max_life? You can change it to more than 24h and restart the krb5kdc service.
Double-check the cron job that renews the tickets. You can find examples to compare against via Google (e.g. http://wiki.grid.auth.gr/wiki/bin/view/Groups/ALL/HowToAutomaticallyRenewKerberosTicketsAndAFSTokens).
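For reference, a minimal sketch of the settings discussed above. The values are illustrative only; 14d renewable life and a 24h ticket life are assumptions, not recommendations from the post:

# /var/kerberos/krb5kdc/kdc.conf (per-realm section)
#   max_life = 24h 0m 0s
#   max_renewable_life = 14d 0h 0m 0s

# /etc/krb5.conf ([libdefaults])
#   ticket_lifetime = 24h
#   renew_lifetime = 14d

# then raise the renewable life on the TGT principal and restart the KDC
kadmin -p admin -q 'modprinc -maxrenewlife "14 days" krbtgt/<REALM.COM>@<REALM.COM>'
service krb5kdc restart

# verify: the "renew until" timestamp in klist should now be later than the ticket expiry
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM.COM>
klist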
10-28-2015
06:24 PM
10 Kudos
Requirement: Currently we have /hadoop/hdfs/data and /hadoop/hdfs/data1 as DataNode directories. I have a new mountpoint (/hadoop/hdfs/datanew) on faster disk and I want to keep only this mountpoint as the DataNode directory.
Steps:
Stop the cluster.
Go to the Ambari HDFS configuration and edit the DataNode directories configuration: remove /hadoop/hdfs/data and /hadoop/hdfs/data1, add /hadoop/hdfs/datanew, and save.
Log into each DataNode VM and copy the contents of /data and /data1 into /datanew.
Change the ownership of /datanew and everything under it to "hdfs".
Start the cluster.
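A sketch of what the per-node copy can look like, using the paths from the steps above. This assumes the cluster (and the DataNode) is stopped; verify the copy before removing the old directories, and adjust the group in the chown if your environment requires it:

# on each DataNode, with the cluster stopped
cp -a /hadoop/hdfs/data/.  /hadoop/hdfs/datanew/
cp -a /hadoop/hdfs/data1/. /hadoop/hdfs/datanew/
chown -R hdfs /hadoop/hdfs/datanew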
10-23-2015
12:58 AM
1 Kudo
You have generated a certificate file:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout test.key -out test.pem
After deploying the key, you try to ssh into the instance but get prompted for a password:
ssh -vvv -i test.pem <user>@<host>
-----------------------------------
This is an issue with an updated openssl version:
> openssl version
OpenSSL 1.0.1k 8 Jan 2015
This newer version of openssl does not write the key with the "RSA" header and footer, so you have to use a separate command to convert the key file to the traditional RSA format for ssh:
openssl rsa -in test.key -out test_new.key
Once that is done, use the new file for ssh:
ssh -vv -i test_new.key <user>@<host>
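To see which format you actually have, check the first line of the key file. A quick sketch, using the file names from above:

head -1 test.key
# "-----BEGIN PRIVATE KEY-----"      -> newer PKCS#8 output from openssl req
# "-----BEGIN RSA PRIVATE KEY-----"  -> traditional RSA format

openssl rsa -in test.key -out test_new.key
head -1 test_new.key
# should now read "-----BEGIN RSA PRIVATE KEY-----"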
10-23-2015
12:39 AM
5 Kudos
How to set up a cluster in AWS? What type of storage is supported for HDFS? EBS? EMR?
EBS is supported and recommended mainly for mission-critical data, that is, data that must be (mostly) available. You can use ephemeral storage, which will be faster, but if a node goes down you won't be able to restore that data, and since AWS (and other cloud providers) are known to have entire regions disappear, you can and will lose your whole cluster. EBS volumes will be available again when the region comes back online; ephemeral volumes won't. However, EBS is also very pricey and you may not want to pay for that option. Another option is using ephemeral storage but setting up backup routines to S3, so you can restore back to a point in time (and if you want, you can use EBS and still back up to S3). The main reason EBS is not recommended for HDFS is that it is very expensive, but it is supported. For HBase workloads you should use i2 instances. Only use d2 nodes for a storage-density workload type (with sequential reads), which gives you a lot of locally attached storage with quite good throughput.
Other Storage Tips: hs1.8xlarge for Hadoop with ephemeral storage; i2 for HBase; d2.8xlarge for compute-intensive HBase plus data-intensive storage. EBS is very expensive and scaling is not so linear; it depends on how many storage array fabrics you mesh to under the covers. The instance/ephemeral storage (on AWS) would only be for DataNode HDFS, so loss of an instance is less of a concern, and it is also going to get much better performance.
10-22-2015
06:07 PM
Good note. Unfortunately, SQL Developer does not recognize the generic Apache Hive JDBC driver. Also, if you need to add special properties for SSL, Kerberos, or LDAP authentication, SQL Developer will not work.
Use SQL Workbench/J, RazorSQL, or SQuirreL SQL instead.