Member since: 09-18-2015
Posts: 3274
Kudos Received: 1159
Solutions: 426

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2576 | 11-01-2016 05:43 PM |
| | 8558 | 11-01-2016 05:36 PM |
| | 4881 | 07-01-2016 03:20 PM |
| | 8213 | 05-25-2016 11:36 AM |
| | 4354 | 05-24-2016 05:27 PM |
11-26-2015
12:32 PM
@Jonas Straub It was shared by someone from the field: Link. Hadoop provides a system for processing large amounts of data, and this can be a great way to actually build your index. You can store the data to be indexed on HDFS and then run a map/reduce job that processes this data and feeds it to your Solr instances, which then build up the index. With an index a terabyte in size, you will see great performance gains if you both process the data in parallel on your Hadoop cluster and index it in parallel with your Solr cluster.
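As a hedged sketch of that pattern, each map task could turn its input split into Solr documents and push them to Solr in batches over the JSON update endpoint. Everything below (host, collection name, batch size, field names) is a placeholder assumption, and the snippet relies only on the standard requests library:

```python
import json
import requests

# Hypothetical Solr node and collection; adjust to your SolrCloud layout.
SOLR_UPDATE_URL = "http://solr-node1:8983/solr/my_collection/update"
BATCH_SIZE = 500

def index_batch(docs):
    """POST one batch of documents to Solr's JSON update handler."""
    resp = requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
        params={"commit": "false"},  # commit once at the end of the job instead
    )
    resp.raise_for_status()

def map_task(records):
    """What a single map task could do: convert records and index them in batches."""
    batch = []
    for record in records:
        batch.append({"id": record["id"], "text": record["body"]})
        if len(batch) >= BATCH_SIZE:
            index_batch(batch)
            batch = []
    if batch:
        index_batch(batch)
```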
11-26-2015
12:15 PM
5 Kudos
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/host_scripts/alert_disk_space.py

vi /var/lib/ambari-server/resources/host_scripts/alert_disk_space.py

# defaults in case no script parameters are passed
MIN_FREE_SPACE_DEFAULT = 5000000000L  # 5 GB
PERCENT_USED_WARNING_DEFAULT = 50
PERCENT_USED_CRITICAL_DEFAULT = 80

You can change the above parameters to avoid alerting as soon as the 80% threshold is reached, for example by raising the critical default to 85 or 90.
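As a minimal sketch of how those defaults are applied (an illustration of the threshold logic only, not the actual Ambari script; the raised critical value and the sizes in the example call are assumptions):

```python
# Illustration only: how the warning/critical thresholds could map to alert states.
# The real alert_disk_space.py falls back to its defaults when no script
# parameters are passed from the alert definition.
MIN_FREE_SPACE = 5000000000        # 5 GB
PERCENT_USED_WARNING = 50
PERCENT_USED_CRITICAL = 90         # raised from 80 to 90 in this example

def disk_alert_state(total_bytes, used_bytes):
    """Return an alert state from percent used and absolute free space."""
    free_bytes = total_bytes - used_bytes
    percent_used = (used_bytes / float(total_bytes)) * 100
    if percent_used >= PERCENT_USED_CRITICAL or free_bytes < MIN_FREE_SPACE:
        return "CRITICAL"
    if percent_used >= PERCENT_USED_WARNING:
        return "WARNING"
    return "OK"

# 88% used with the raised threshold is only a WARNING instead of CRITICAL.
print(disk_alert_state(total_bytes=100 * 10**9, used_bytes=88 * 10**9))
```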
11-26-2015
11:49 AM
2 Kudos
Caused by: java.lang.OutOfMemoryError: Java heap space

This particular case is related to "Reducer tasks of a Hive job fail with an Out Of Memory error during the shuffle fetcher stage." Fix (see the example settings below):
- Increase hive.tez.container.size if it is set too low.
- Decrease tez.runtime.shuffle.memory.limit.percent from its default of 0.7 to 0.4.
- Decrease tez.runtime.shuffle.fetch.buffer.percent from its default of 0.25 to 0.15 if needed (values in the range 0.25 to 0.10 were tested).
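A hedged example of applying these changes for a single Hive session (the container size of 4096 MB is only an assumed value and must be sized to your cluster; cluster-wide changes go into the Hive/Tez configuration instead):

```
-- Illustrative per-session overrides; 4096 MB is an assumption, not a recommendation.
set hive.tez.container.size=4096;
set tez.runtime.shuffle.memory.limit.percent=0.4;
set tez.runtime.shuffle.fetch.buffer.percent=0.15;
```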
11-26-2015
10:59 AM
1 Kudo
Use case

There are 2 groups, Analytics and DW, and we want to split the cluster resources between them.
- User neeraj belongs to the Analytics group; user dwuser belongs to the DW group.
- User neeraj is not allowed to use the default and DW queues. By default, all jobs submitted by user neeraj must go to its assigned queue.
- User dwuser is not allowed to use the default and Analytics queues. By default, all jobs submitted by user dwuser must go to its assigned queue.

Environment

HDP 2.3 (Hortonworks Data Platform) and Ambari 2.1. This tutorial is completely independent of the Hadoop distribution; YARN is a must, i.e. Hadoop 2.x. I will be using the Capacity Scheduler view to configure the queues.

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings=u:neeraj:Analytics,u:dwuser:DW
yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=yarn
yarn.scheduler.capacity.root.acl_submit_applications=yarn
yarn.scheduler.capacity.root.Analytics.acl_administer_queue=yarn
yarn.scheduler.capacity.root.Analytics.acl_submit_applications=neeraj
yarn.scheduler.capacity.root.Analytics.capacity=60
yarn.scheduler.capacity.root.Analytics.maximum-capacity=60
yarn.scheduler.capacity.root.Analytics.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.Analytics.ordering-policy=fifo
yarn.scheduler.capacity.root.Analytics.state=RUNNING
yarn.scheduler.capacity.root.Analytics.user-limit-factor=1
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_queue=yarn
yarn.scheduler.capacity.root.default.acl_submit_applications=yarn
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.DW.acl_administer_queue=yarn
yarn.scheduler.capacity.root.DW.acl_submit_applications=dwuser
yarn.scheduler.capacity.root.DW.capacity=30
yarn.scheduler.capacity.root.DW.maximum-capacity=30
yarn.scheduler.capacity.root.DW.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.DW.ordering-policy=fifo
yarn.scheduler.capacity.root.DW.state=RUNNING
yarn.scheduler.capacity.root.DW.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.queues=Analytics,DW,default

[root@nsfed01 ~]# su - neeraj
[neeraj@nsfed01 ~]$ mapred queue -showacls
15/08/18 14:45:03 INFO impl.TimelineClientImpl: Timeline service address: http://nsfed03.cloud.hortonworks.com:8188/ws/v1/timeline/
15/08/18 14:45:03 INFO client.RMProxy: Connecting to ResourceManager at nsfed03.cloud.hortonworks.com/172.24.64.22:8050
Queue acls for user : neeraj
Queue        Operations
=====================
root
Analytics    SUBMIT_APPLICATIONS
DW
default
[neeraj@nsfed01 ~]$

[root@nsfed01 ~]# su - neeraj
[neeraj@nsfed01 ~]$ yarn jar /usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 20 1000000009
Number of Maps = 20
Samples

[root@nsfed03 yarn]# su - dwuser
[dwuser@nsfed03 ~]$ yarn jar /usr/hdp/2.3.0.0-2557/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 20 1000000009
Number of Maps = 20

CS view
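One hedged way to verify the resulting queue layout is the ResourceManager's scheduler REST endpoint; this is a sketch assuming the CapacityScheduler JSON structure and this cluster's ResourceManager host on the default port 8088:

```python
import requests

# Assumed ResourceManager address for this cluster; adjust host/port as needed.
RM_SCHEDULER_URL = "http://nsfed03.cloud.hortonworks.com:8088/ws/v1/cluster/scheduler"

info = requests.get(RM_SCHEDULER_URL).json()["scheduler"]["schedulerInfo"]
for queue in info["queues"]["queue"]:
    # Expect Analytics=60, DW=30, default=10 with the configuration above.
    print(queue["queueName"], queue["capacity"], queue["maxCapacity"])
```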
11-26-2015
10:55 AM
3 Kudos
yum install expect*
#!/usr/bin/expect
spawn ambari-server sync-ldap --existing
expect "Enter Ambari Admin login:"
send "admin\r"
expect "Enter Ambari Admin password:"
send "admin\r"
expect eof
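To run this unattended you could save it as a script and schedule it; a sketch, assuming a hypothetical file name, and note that the Ambari admin password is stored in plain text, so lock down the permissions:

```
chmod 700 /root/ambari-ldap-sync.exp
# then add a line like this to root's crontab (crontab -e):
# 0 2 * * * /root/ambari-ldap-sync.exp >> /var/log/ambari-ldap-sync.log 2>&1
```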
11-26-2015
10:33 AM
2 Kudos
@Hajime This makes sense.

hive.exec.reducers.bytes.per.reducer
- Default Value: 1,000,000,000 (1 GB) prior to Hive 0.14.0; 256 MB (256,000,000) in Hive 0.14.0 and later
- Added In: Hive 0.2.0; default changed in 0.14.0 with HIVE-7158 (and HIVE-7917)

Size per reducer. The default prior to Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used (see the sketch below).

Point to note: hive.exec.reducers.max should be set to a number that is less than the available reduce slots on the cluster. Hive calculates the number of reducers based on hive.exec.reducers.bytes.per.reducer (default 1 GB); consider setting it high based on the workloads and demand for reducers on the cluster.

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
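A rough sketch of the arithmetic (simplified; the real planner considers more than this, and the hive.exec.reducers.max value used here is only an assumed cap):

```python
import math

def estimated_reducers(input_bytes, bytes_per_reducer, max_reducers):
    """Roughly how the reducer count is estimated: input size divided by
    bytes-per-reducer, capped by hive.exec.reducers.max."""
    return min(max_reducers, max(1, int(math.ceil(input_bytes / float(bytes_per_reducer)))))

# 10 GB input at the pre-0.14.0 default of 1 GB per reducer -> 10 reducers
print(estimated_reducers(10 * 10**9, 10**9, max_reducers=999))
# 1 GB input at the 0.14.0+ default of 256 MB per reducer -> 4 reducers
print(estimated_reducers(10**9, 256 * 10**6, max_reducers=999))
```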
11-25-2015
08:34 PM
1 Kudo
@Kuldeep Kulkarni
A couple of links: http://www.lilyproject.org/lily/index.html, and this blog explains it nicely. Also, you can leverage JMX to expose the metrics.
11-25-2015
08:31 PM
1 Kudo
@Ryan Templeton Opening a support case would be a good idea.