@Michael Bronson Yes, that's true. Similarly, it's not a good idea to use /var for yarn.nodemanager.local-dirs, which holds container-local data. Typically you point it at all the data mount points (for example /grid/sdb/hadoop/yarn/local). The same goes for the YARN logs in yarn.nodemanager.log-dirs (for example /grid/sdb/hadoop/yarn/log). This keeps that I/O off your OS disk (where /var typically lives). I hope that the above answers your questions. Please accept the answer you found most useful.
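As a minimal sketch of the advice above, the following Python builds the two property values from a list of data mount points. Only /grid/sdb comes from the answer; the other mounts and the helper names are hypothetical, and the comma-separated form is how yarn-site.xml takes multiple directories.

```python
# Hypothetical list of data mount points; only /grid/sdb appears in the answer above.
DATA_MOUNTS = ["/grid/sdb", "/grid/sdc", "/grid/sdd"]


def yarn_dirs(mounts, leaf):
    """Join one directory per mount into a comma-separated property value."""
    return ",".join(f"{mount}/hadoop/yarn/{leaf}" for mount in mounts)


def nodemanager_dir_properties(mounts):
    """Return the two NodeManager directory properties for the given mounts."""
    return {
        "yarn.nodemanager.local-dirs": yarn_dirs(mounts, "local"),
        "yarn.nodemanager.log-dirs": yarn_dirs(mounts, "log"),
    }


if __name__ == "__main__":
    for name, value in nodemanager_dir_properties(DATA_MOUNTS).items():
        print(f"{name}={value}")
```

Spreading both properties over every data disk is what moves the container I/O away from the disk that carries /var.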
@Michael Bronson Yes, you can remove old files. The article below will help you with the proper steps. Please accept the answer you found most useful.
Problem Description: Ambari Infra Solr is running fine, but the "ps" command shows the keystore and truststore passwords in cleartext, as below. According to security policy, this is considered a security breach. The issue occurs because the values of the properties infra_solr_trust_store_password and infra_solr_key_store_password are passed as cleartext in the Java options.

$ ps -ef | grep -i 'ambari-infra'
1008 25938 1 21 07:25 ? 00:00:11 /usr/jdk64/jdk1.8.0_112/bin/java -server -Xms1024m -Xmx2048m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/ambari-infra-solr/solr_gc.log -DzkClientTimeout=60000,, -Djetty.port=8886 -DSTOP.PORT=7886 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/usr/lib/ambari-infra-solr/server -Dsolr.solr.home=/opt/ambari_infra_solr/data -Dsolr.install.dir=/usr/lib/ambari-infra-solr -Dlog4j.configuration=file:/etc/ambari-infra-solr/conf/ -Dsolr.jetty.keystore=/etc/security/serverKeys/infra.solr.keyStore.jks -Dsolr.jetty.keystore.password=bigdata -Dsolr.jetty.truststore=/etc/security/serverKeys/infra.solr.trustStore.jks -Dsolr.jetty.truststore.password=bigdata -Dsolr.jetty.ssl.needClientAuth=false -Dsolr.jetty.ssl.wantClientAuth=false -Dsolr.jetty.https.port=8886 -Dsolr.authentication.httpclient.configurer=org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer -Dsolr.kerberos.principal=HTTP/ -Dsolr.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab -XX:OnOutOfMemoryError=/usr/lib/ambari-infra-solr/bin/ 8886 /var/log/ambari-infra-solr -jar start.jar --module=https

Article: This article shows how to pass an obfuscated or hashed password instead of a cleartext password in the Java options. Using the Jetty jar bundled with Ambari Infra Solr, we can convert the password to OBF or MD5 format and set that value in infra-solr-env to hide the password from the ambari-infra-solr process listing.
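The exposure described above can be checked mechanically. The sketch below, with hypothetical function and constant names, scans a process command line for -D...password= options whose value does not start with one of the prefixes Jetty uses for protected passwords (OBF:, MD5:, CRYPT:). It is a rough check under those assumptions, not an audit tool.

```python
import re

# Prefixes Jetty uses for obfuscated or hashed password values.
PROTECTED_PREFIXES = ("OBF:", "MD5:", "CRYPT:")

# Matches JVM options such as -Dsolr.jetty.keystore.password=VALUE
PASSWORD_OPTION = re.compile(r"-D([^\s=]*password)=(\S*)", re.IGNORECASE)


def cleartext_password_options(command_line):
    """Return the names of -D...password options whose value looks like cleartext."""
    exposed = []
    for name, value in PASSWORD_OPTION.findall(command_line):
        if not value.startswith(PROTECTED_PREFIXES):
            exposed.append(name)
    return exposed


if __name__ == "__main__":
    sample = (
        "java -Dsolr.jetty.keystore.password=bigdata "
        "-Dsolr.jetty.truststore.password=OBF:1rpc1wtw1sp11sov1sop1wui1rpa "
        "-jar start.jar --module=https"
    )
    print(cleartext_password_options(sample))
```

In practice the command line would come from the ps output; here a shortened sample string stands in for it.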
Step-1: Generate the obfuscated password using the Jetty jar, where <password> is the password you used for the keystore/truststore.

java -cp /usr/lib/ambari-infra-solr/server/lib/jetty-util-9.2.13.v20150730.jar org.eclipse.jetty.util.security.Password <password>

For example:

java -cp /usr/lib/ambari-infra-solr/server/lib/jetty-util-9.2.13.v20150730.jar org.eclipse.jetty.util.security.Password bigdata
2018-12-27 07:51:13.605:INFO::main: Logging initialized @171ms
bigdata
OBF:1rpc1wtw1sp11sov1sop1wui1rpa
MD5:27819cfe72583a34d13a40bb74154c91

Step-2: In Ambari, update the properties below under the Ambari Infra Configs tab, in the Advanced infra-solr-env section. (You can use either the OBF or the MD5 value there.)

Before:
SOLR_SSL_KEY_STORE_PASSWORD={{infra_solr_keystore_hashed_password}}
SOLR_SSL_TRUST_STORE_PASSWORD={{infra_solr_truststore_hashed_password}}

Now:
SOLR_SSL_KEY_STORE_PASSWORD=OBF:1rpc1wtw1sp11sov1sop1wui1rpa
SOLR_SSL_TRUST_STORE_PASSWORD=OBF:1rpc1wtw1sp11sov1sop1wui1rpa

Step-3: Restart the required services through Ambari, then check the ambari-infra-solr process with ps to verify that the cleartext password no longer appears.

$ ps -ef | grep -i 'ambari-infra'
1008 17641 17 08:03 ? 00:00:10 /usr/jdk64/jdk1.8.0_112/bin/java -server -Xms1024m -Xmx2048m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/ambari-infra-solr/solr_gc.log -DzkClientTimeout=60000,, -Djetty.port=8886 -DSTOP.PORT=7886 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/usr/lib/ambari-infra-solr/server -Dsolr.solr.home=/opt/ambari_infra_solr/data -Dsolr.install.dir=/usr/lib/ambari-infra-solr -Dlog4j.configuration=file:/etc/ambari-infra-solr/conf/ -Dsolr.jetty.keystore=/etc/security/serverKeys/infra.solr.keyStore.jks -Dsolr.jetty.keystore.password=OBF:1rpc1wtw1sp11sov1sop1wui1rpa -Dsolr.jetty.truststore=/etc/security/serverKeys/infra.solr.trustStore.jks -Dsolr.jetty.truststore.password=OBF:1rpc1wtw1sp11sov1sop1wui1rpa -Dsolr.jetty.ssl.needClientAuth=false -Dsolr.jetty.ssl.wantClientAuth=false -Dsolr.jetty.https.port=8886 -Dsolr.authentication.httpclient.configurer=org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer -Dsolr.kerberos.principal=HTTP/ -Dsolr.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab -XX:OnOutOfMemoryError=/usr/lib/ambari-infra-solr/bin/ 8886 /var/log/ambari-infra-solr -jar start.jar --module=https

The obfuscated password is decoded automatically at startup by the bundled Jetty jar. For more details on Jetty password obfuscation, refer to the Jetty documentation.
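To make clear what the OBF value in Step-1 is, here is a rough Python sketch of the obfuscation for ASCII passwords. The arithmetic is an assumption worked out from the Step-1 output (it reproduces OBF:1rpc1wtw1sp11sov1sop1wui1rpa for bigdata), not a copy of Jetty's code, and the function names are hypothetical. It also shows that OBF is reversible: it hides the password from a casual look at ps, but it is not encryption.

```python
import string

BASE36_DIGITS = string.digits + string.ascii_lowercase


def to_base36(number):
    """Encode a non-negative integer in base 36, zero-padded to 4 characters."""
    encoded = ""
    while number:
        number, remainder = divmod(number, 36)
        encoded = BASE36_DIGITS[remainder] + encoded
    return encoded.rjust(4, "0")


def obfuscate(password):
    """Sketch of the OBF: scheme; handles ASCII passwords only."""
    data = password.encode("ascii")
    chunks = []
    for index, b1 in enumerate(data):
        b2 = data[len(data) - 1 - index]  # byte mirrored from the other end
        i1 = 127 + b1 + b2
        i2 = 127 + b1 - b2
        chunks.append(to_base36(i1 * 256 + i2))
    return "OBF:" + "".join(chunks)


def deobfuscate(value):
    """Reverse obfuscate(): each 4-character chunk yields one original byte."""
    body = value[len("OBF:"):] if value.startswith("OBF:") else value
    decoded = bytearray()
    for pos in range(0, len(body), 4):
        i1, i2 = divmod(int(body[pos:pos + 4], 36), 256)
        decoded.append((i1 + i2 - 254) // 2)
    return decoded.decode("ascii")


if __name__ == "__main__":
    obf = obfuscate("bigdata")
    print(obf)               # OBF:1rpc1wtw1sp11sov1sop1wui1rpa
    print(deobfuscate(obf))  # bigdata
```

Because the value can be decoded this easily, file permissions on the configuration and keystore files still matter after the change.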
@Dukool SHarma MapReduce sorts the intermediate data (between the mapper and reducer phases) by key by default. If you want the data sorted by value as well, you need secondary sorting. For more information, you can refer to the links below. Please accept the answer you found most useful.
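The idea behind secondary sorting can be sketched outside Hadoop. The Python below is only a simulation with hypothetical names: it sorts on a composite (key, value) pair and then groups on the key alone, which is the role the composite key and grouping comparator play in a real MapReduce job.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(pairs):
    """Group (key, value) pairs by key, with the values of each key in ascending order."""
    # Composite key: sorting on (key, value) orders the values within each key.
    ordered = sorted(pairs, key=lambda kv: (kv[0], kv[1]))
    # Grouping looks at the natural key only, as a grouping comparator would.
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    mapper_output = [("b", 3), ("a", 2), ("b", 1), ("a", 5)]
    for key, values in secondary_sort(mapper_output):
        print(key, values)
```

With the default key-only sort, the order of values arriving at a reducer for a given key is not guaranteed; the composite key is what fixes that order.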
For your question, the answer is no. As of now, we can't manage memory at the individual node level using a percentage. Currently YARN supports a percentage limit only for CPU, using the configuration below.

Property: yarn.nodemanager.resource.percentage-physical-cpu-limit
Default: 100
Description: Percentage of CPU that can be allocated for containers. This setting allows users to limit the amount of CPU that YARN containers use. Currently functional only on Linux using cgroups. The default is to use 100% of CPU.
For example, if your physical memory size is 10 GB and you want the maximum allocation for every container request at the RM to be around 9 GB (90% of your physical memory), you can set it with the parameter below:

yarn.scheduler.maximum-allocation-mb 9216

Similarly, the amount of physical memory that can be allocated for containers on a NodeManager is set with the parameter below:

yarn.nodemanager.resource.memory-mb 9216

All these settings are managed in a single configuration file, yarn-site.xml, which you can modify via Ambari under Yarn -> Configs -> Settings -> Advanced -> Advanced yarn-site. I hope that the above answers your questions. Please accept the answer you found most useful.
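Since YARN takes these values in MB rather than as a percentage, the conversion has to be done by hand. A minimal sketch of the arithmetic from the example above, with hypothetical helper names:

```python
def container_memory_mb(physical_gb, percent):
    """Memory in MB to give to YARN containers, as a share of physical RAM."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return physical_gb * 1024 * percent // 100


def yarn_memory_properties(physical_gb, percent=90):
    """Return both memory properties from the example, set to the same value."""
    memory_mb = container_memory_mb(physical_gb, percent)
    return {
        "yarn.nodemanager.resource.memory-mb": memory_mb,
        "yarn.scheduler.maximum-allocation-mb": memory_mb,
    }


if __name__ == "__main__":
    # 10 GB node, 90% for containers: 10 * 1024 * 90 / 100 = 9216 MB
    for name, value in yarn_memory_properties(10).items():
        print(f"{name}={value}")
```

The 90% default here simply mirrors the example; the right share depends on what else runs on the node.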
@Anjali Shevadkar The article below will help you configure Spark fine-grained security. Please accept the answer you found most useful.
If you give 100% of the RAM to YARN alone, other services will crash or run into GC issues. It's not recommended to assign all physical memory to one component.
To change the memory values through the Ambari configuration manager, log in to the Ambari UI and go to Yarn -> Configs -> Settings -> Memory, then adjust "Memory allocated for all YARN containers on a node" and the Minimum / Maximum Container Size. For advanced tuning of YARN, modify Yarn -> Configs -> Settings -> Advanced -> Advanced yarn-site.
