Member since: 09-29-2015
Posts: 286
Kudos Received: 601
Solutions: 60

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 11479 | 03-21-2017 07:34 PM
 | 2894 | 11-16-2016 04:18 AM
 | 1619 | 10-18-2016 03:57 PM
 | 4276 | 09-12-2016 03:36 PM
 | 6240 | 08-25-2016 09:01 PM
02-13-2016
03:45 AM
@mcarillo yarn.nodemanager.log-dirs should be on the same mounts as your Hadoop data directories. See https://community.hortonworks.com/articles/1888/apache-tez-tuning-tips-solving-the-could-not-find.html
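For illustration, a minimal yarn-site.xml sketch of that layout; /grid/0 and /grid/1 are assumed mount points, so substitute the mounts your DataNode data directories actually use:

<!-- Assumed mount points: one entry per data disk, matching the DataNode data dirs -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/0/hadoop/yarn/log,/grid/1/hadoop/yarn/log</value>
</property>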
02-12-2016
02:19 PM
@lobna tonn how are your tests doing? Did you decide?
02-10-2016
09:02 PM
4 Kudos
@Adi Jabkowsky Is this happening only when HS2 is started, or when you connect via Beeline, or both? Try the following:

1. Your hive.server2.authentication.ldap.baseDN has a blank space. Remove the blank space and restart HS2 from Hosts in Ambari:

<!-- From -->
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value> </value>
</property>
<!-- To -->
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value></value>
</property>

2. Remove hive.server2.authentication.ldap.Domain or set it to blank. Then log into HS2 using Beeline with myuser@corp.cellcom.co.il as your login and see if it authenticates (see the sketch after this list).
3. Set hive.server2.enable.doAs to false so that the hive user executes the query.
4. If you are using a hive AD user, double-check that the hive AD UID matches the entry in /etc/passwd. Make an archive of the HS2 logs, change /etc/passwd to have the same UID as the AD hive user, and restart HS2.
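If it helps, a hedged sketch of the Beeline login test from step 2; the HS2 host, port, and password below are placeholders, not values from your cluster:

# Placeholder host/port/password; only the login principal comes from the thread above
beeline -u "jdbc:hive2://<hs2-host>:10000/default" -n myuser@corp.cellcom.co.il -p '<password>'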
02-09-2016
04:15 AM
1 Kudo
HDP 2.3.4 needs Ambari 2.2; you cannot use Ambari 2.1.
02-08-2016
07:30 PM
1 Kudo
I recommend doing Solr Standalone; I have always had issues with SolrCloud for Ranger auditing. Are you sure everything in Advanced ranger-admin-site is set appropriately?

ranger.audit.source.type = solr
ranger.audit.solr.urls = http://solr_host:6083/solr/ranger_audits
ranger.audit.solr.username = ranger_solr
ranger.audit.solr.password = NONE

If you are using the HDFS or Hive plugin, did you turn Audit to Solr on?
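As a rough sketch of what "Audit to Solr" means on the plugin side, the Advanced ranger-hdfs-audit (or ranger-hive-audit) section typically carries properties along these lines; the names are from HDP 2.3-era Ranger, so verify them against your version:

# Assumed property names; point the URL at your own Solr instance
xasecure.audit.destination.solr = true
xasecure.audit.destination.solr.urls = http://solr_host:6083/solr/ranger_audits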
02-08-2016
07:10 PM
1 Kudo
Yes, you need Kerberos for Ranger to manage Solr. See also https://community.hortonworks.com/articles/15159/securing-solr-collections-with-ranger-kerberos.html (updated). Or are you referring to Solr auditing for Ranger? In that case you do not need Kerberos. For Solr audit, see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Ranger_Install_Guide/content/solr_ranger_configure_standalone.html and http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Ranger_Install_Guide/content/audit_to_solr.html If you did the necessary install and Solr audits are still not showing: I had a case where I ran ps -ef | grep ranger and found it running under the wrong UID; I had to kill it first and then restart from Ambari to get the Solr audits to work.
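A minimal sketch of that check; the exact process names and PIDs depend on your install:

ps -ef | grep ranger   # confirm which UID the Ranger/Solr audit processes run under
kill <pid>             # stop the instance running under the wrong UID
# then restart the component from Ambari so it comes back up under the correct user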
02-08-2016
07:03 PM
2 Kudos
@Sushil Saxena Your base DN should be (assuming it is NOT AD):

hive.server2.authentication.ldap.baseDN: OU=People,O=xx.com

Ensure that you go to the host in Ambari (not the Dashboard) and restart HiveServer2 from the host list.
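For reference, a minimal hive-site.xml sketch using the example value above; substitute your directory's actual People OU and organization:

<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>OU=People,O=xx.com</value>
</property>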
02-08-2016
06:46 PM
Although I have heard the argument that, over time, the cost of replacing and managing DAS disks with 3x replication makes SAN cheaper from a TCO perspective.
02-08-2016
06:38 PM
4 Kudos
In addition to putting them on master nodes co-located with other resources, the ZooKeeper and JournalNode storage should be on JBODs... see diagram
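As a rough sketch of where those directories are pointed in config; the /grid/zk and /grid/jn paths are assumptions standing in for dedicated JBOD mounts:

# zoo.cfg (ZooKeeper), assumed JBOD mount
dataDir=/grid/zk/zookeeper

<!-- hdfs-site.xml (JournalNode), assumed JBOD mount -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/grid/jn/hdfs/journal</value>
</property>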
02-08-2016
06:25 PM
5 Kudos
Hadoop is a shared-nothing architecture; SAN storage usually goes against the grain for distributed storage in a distributed compute environment.
The only central storage we support so far is Isilon, because we did some joint engineering with them. Even then, DAS has its advantages (as well as disadvantages, mainly because of 3x replication). The main issue is that YARN spins up containers on the compute nodes for every data access need; putting the data on separate SAN disks means that every query or access has to cross the network and is no longer distributed across the spindles of the storage nodes. That not only slows access, it introduces more points of failure through switches and creates additional potential for bottlenecks.
Normally I would also compromise a bit for master nodes, but I just came from a client who ran the master nodes as VMs on SAN storage. Performance started out great, but once multiple users came on board and the master nodes had to handle more blocks, performance tanked. We wasted a week and a half moving the master components to physical nodes on a cluster that already had data. Painful.
See a good discussion here: http://searchstorage.techtarget.com/video/Understanding-storage-in-the-Hadoop-cluster