Member since: 09-18-2015
Posts: 3274
Kudos Received: 1159
Solutions: 426

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2576 | 11-01-2016 05:43 PM |
| | 8558 | 11-01-2016 05:36 PM |
| | 4881 | 07-01-2016 03:20 PM |
| | 8213 | 05-25-2016 11:36 AM |
| | 4354 | 05-24-2016 05:27 PM |
11-26-2015
12:32 PM
@Jonas Straub It was shared by someone from the field: Link. Hadoop provides a system for processing large amounts of data, and this can be a great way to actually build your index. You can store the data to be indexed on HDFS and then run a map/reduce job that processes this data and feeds it to your Solr instances, which then build up the index. With an index a terabyte in size, you will see great performance gains if you both process the data in parallel on your Hadoop cluster and index it in parallel with your Solr cluster.
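As a hedged sketch of that pattern, each map task could turn its input split into Solr documents and push them to Solr in batches over the JSON update endpoint. Everything below (host, collection name, batch size, field names) is a placeholder assumption, and the snippet relies only on the standard requests library:

```python
import json
import requests

# Hypothetical Solr node and collection; adjust to your SolrCloud layout.
SOLR_UPDATE_URL = "http://solr-node1:8983/solr/my_collection/update"
BATCH_SIZE = 500

def index_batch(docs):
    """POST one batch of documents to Solr's JSON update handler."""
    resp = requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
        params={"commit": "false"},  # commit once at the end of the job instead
    )
    resp.raise_for_status()

def map_task(records):
    """What a single map task could do: convert records and index them in batches."""
    batch = []
    for record in records:
        batch.append({"id": record["id"], "text": record["body"]})
        if len(batch) >= BATCH_SIZE:
            index_batch(batch)
            batch = []
    if batch:
        index_batch(batch)
```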
11-26-2015
12:15 PM
5 Kudos
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/host_scripts/alert_disk_space.py

vi /var/lib/ambari-server/resources/host_scripts/alert_disk_space.py

# defaults in case no script parameters are passed
MIN_FREE_SPACE_DEFAULT = 5000000000L  # 5 GB
PERCENT_USED_WARNING_DEFAULT = 50
PERCENT_USED_CRITICAL_DEFAULT = 80

You can change the above parameters to avoid alerting as soon as the 80% threshold is reached, for example by raising the critical default to 85 or 90.
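As a minimal sketch of how those defaults are applied (an illustration of the threshold logic only, not the actual Ambari script; the raised critical value and the sizes in the example call are assumptions):

```python
# Illustration only: how the warning/critical thresholds could map to alert states.
# The real alert_disk_space.py falls back to its defaults when no script
# parameters are passed from the alert definition.
MIN_FREE_SPACE = 5000000000        # 5 GB
PERCENT_USED_WARNING = 50
PERCENT_USED_CRITICAL = 90         # raised from 80 to 90 in this example

def disk_alert_state(total_bytes, used_bytes):
    """Return an alert state from percent used and absolute free space."""
    free_bytes = total_bytes - used_bytes
    percent_used = (used_bytes / float(total_bytes)) * 100
    if percent_used >= PERCENT_USED_CRITICAL or free_bytes < MIN_FREE_SPACE:
        return "CRITICAL"
    if percent_used >= PERCENT_USED_WARNING:
        return "WARNING"
    return "OK"

# 88% used with the raised threshold is only a WARNING instead of CRITICAL.
print(disk_alert_state(total_bytes=100 * 10**9, used_bytes=88 * 10**9))
```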
11-26-2015
11:49 AM
2 Kudos
Caused by: java.lang.OutOfMemoryError: Java heap space

This particular case is related to "Reducer tasks of a Hive job fail with an Out Of Memory error during the shuffle fetcher stage." Fix (see the example settings below):
- Increase hive.tez.container.size if it is set too low.
- Decrease tez.runtime.shuffle.memory.limit.percent from its default of 0.7 to 0.4.
- Decrease tez.runtime.shuffle.fetch.buffer.percent from its default of 0.25 to 0.15 if needed (values in the range 0.25 to 0.10 were tested).
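A hedged example of applying these changes for a single Hive session (the container size of 4096 MB is only an assumed value and must be sized to your cluster; cluster-wide changes go into the Hive/Tez configuration instead):

```
-- Illustrative per-session overrides; 4096 MB is an assumption, not a recommendation.
set hive.tez.container.size=4096;
set tez.runtime.shuffle.memory.limit.percent=0.4;
set tez.runtime.shuffle.fetch.buffer.percent=0.15;
```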
11-26-2015
10:59 AM
1 Kudo
Use case

There are 2 groups, Analytics and DW, and we want to split the cluster resources between them.
- User neeraj belongs to the Analytics group; user dwuser belongs to the DW group.
- User neeraj is not allowed to use the default and DW queues. By default, all jobs submitted by user neeraj must go to its assigned queue.
- User dwuser is not allowed to use the default and Analytics queues. By default, all jobs submitted by user dwuser must go to its assigned queue.

Environment

HDP 2.3 (Hortonworks Data Platform) and Ambari 2.1. This tutorial is completely independent of the Hadoop distribution; YARN is a must, i.e. Hadoop 2.x. I will be using the Capacity Scheduler view to configure the queues.

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings=u:neeraj:Analytics,u:dwuser:DW
yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=yarn
yarn.scheduler.capacity.root.acl_submit_applications=yarn
yarn.scheduler.capacity.root.Analytics.acl_administer_queue=yarn
yarn.scheduler.capacity.root.Analytics.acl_submit_applications=neeraj
yarn.scheduler.capacity.root.Analytics.capacity=60
yarn.scheduler.capacity.root.Analytics.maximum-capacity=60
yarn.scheduler.capacity.root.Analytics.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.Analytics.ordering-policy=fifo
yarn.scheduler.capacity.root.Analytics.state=RUNNING
yarn.scheduler.capacity.root.Analytics.user-limit-factor=1
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_queue=yarn
yarn.scheduler.capacity.root.default.acl_submit_applications=yarn
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.DW.acl_administer_queue=yarn
yarn.scheduler.capacity.root.DW.acl_submit_applications=dwuser
yarn.scheduler.capacity.root.DW.capacity=30
yarn.scheduler.capacity.root.DW.maximum-capacity=30
yarn.scheduler.capacity.root.DW.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.DW.ordering-policy=fifo
yarn.scheduler.capacity.root.DW.state=RUNNING
yarn.scheduler.capacity.root.DW.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.queues=Analytics,DW,default

[root@nsfed01 ~]# su - neeraj
[neeraj@nsfed01 ~]$ mapred queue -showacls
15/08/18 14:45:03 INFO impl.TimelineClientImpl: Timeline service address: http://nsfed03.cloud.hortonworks.com:8188/ws/v1/timeline/
15/08/18 14:45:03 INFO client.RMProxy: Connecting to ResourceManager at nsfed03.cloud.hortonworks.com/172.24.64.22:8050
Queue acls for user : neeraj
Queue        Operations
=====================
root
Analytics    SUBMIT_APPLICATIONS
DW
default
[neeraj@nsfed01 ~]$

[root@nsfed01 ~]# su - neeraj
[neeraj@nsfed01 ~]$ yarn jar /usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 20 1000000009
Number of Maps = 20
Samples

[root@nsfed03 yarn]# su - dwuser
[dwuser@nsfed03 ~]$ yarn jar /usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 20 1000000009
Number of Maps = 20

CS view
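One hedged way to verify the resulting queue layout is the ResourceManager's scheduler REST endpoint; this is a sketch assuming the CapacityScheduler JSON structure and this cluster's ResourceManager host on the default port 8088:

```python
import requests

# Assumed ResourceManager address for this cluster; adjust host/port as needed.
RM_SCHEDULER_URL = "http://nsfed03.cloud.hortonworks.com:8088/ws/v1/cluster/scheduler"

info = requests.get(RM_SCHEDULER_URL).json()["scheduler"]["schedulerInfo"]
for queue in info["queues"]["queue"]:
    # Expect Analytics=60, DW=30, default=10 with the configuration above.
    print(queue["queueName"], queue["capacity"], queue["maxCapacity"])
```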
11-26-2015
10:55 AM
3 Kudos
yum install expect*
#!/usr/bin/expect
spawn ambari-server sync-ldap --existing
expect "Enter Ambari Admin login:"
send "admin\r"
expect "Enter Ambari Admin password:"
send "admin\r"
expect eof
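To run this unattended you could save it as a script and schedule it; a sketch, assuming a hypothetical file name, and note that the Ambari admin password is stored in plain text, so lock down the permissions:

```
chmod 700 /root/ambari-ldap-sync.exp
# then add a line like this to root's crontab (crontab -e):
# 0 2 * * * /root/ambari-ldap-sync.exp >> /var/log/ambari-ldap-sync.log 2>&1
```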
11-26-2015
10:33 AM
2 Kudos
@Hajime This makes sense.

hive.exec.reducers.bytes.per.reducer
- Default Value: 1,000,000,000 (1 GB) prior to Hive 0.14.0; 256 MB (256,000,000) in Hive 0.14.0 and later
- Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Size per reducer. The default prior to Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used (see the sketch below).

Point to note: hive.exec.reducers.max should be set to a number that is less than the available reduce slots on the cluster. Hive calculates the number of reducers based on hive.exec.reducers.bytes.per.reducer (default 1 GB); consider setting it high based on the workloads and demand for reducers on the cluster.

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
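A rough sketch of the arithmetic (simplified; the real planner considers more than this, and the hive.exec.reducers.max value used here is only an assumed cap):

```python
import math

def estimated_reducers(input_bytes, bytes_per_reducer, max_reducers):
    """Roughly how the reducer count is estimated: input size divided by
    bytes-per-reducer, capped by hive.exec.reducers.max."""
    return min(max_reducers, max(1, int(math.ceil(input_bytes / float(bytes_per_reducer)))))

# 10 GB input at the pre-0.14.0 default of 1 GB per reducer -> 10 reducers
print(estimated_reducers(10 * 10**9, 10**9, max_reducers=999))
# 1 GB input at the 0.14.0+ default of 256 MB per reducer -> 4 reducers
print(estimated_reducers(10**9, 256 * 10**6, max_reducers=999))
```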
11-25-2015
08:34 PM
1 Kudo
@Kuldeep Kulkarni
A couple of links: http://www.lilyproject.org/lily/index.html, and this blog explains it nicely. Also, you can leverage JMX to expose the metrics.
11-25-2015
08:31 PM
1 Kudo
@Ryan Templeton Opening a support case would be a good idea.