Member since: 07-27-2015
Posts: 35
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9090 | 04-06-2018 01:05 AM
04-06-2018
01:05 AM
Hi, this does not seem to have worked with a later version of CDH (5.13.1). There we had to set this through the YARN Client Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. So, what is the correct way to set this? Has this really changed in newer releases? Thanks, Sumit
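For reference, Cloudera Manager safety-valve entries for yarn-site.xml are pasted as raw XML property blocks. A minimal sketch, using a placeholder property name and value since the actual setting is not named here:

```xml
<!-- Placeholder property; substitute the actual setting this thread is about. -->
<property>
  <name>yarn.example.setting</name>
  <value>example-value</value>
</property>
```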
07-10-2017
09:37 AM
The documentation at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_ig_req_supported_versions.html#xd_583c10bfdbd326ba--43d5fd93-1410993f8c2--7e14 states: "Although some subareas of the product may work with unsupported custom multihoming configurations, there are known issues with multihoming." Would anyone be able to point me to the known issues with multihoming being referred to here? Would those be in the context of Hadoop/CDH only? Thanks, Sumit
04-25-2017
06:43 AM
One way I have found so far to see the ENV variables is to go to an instance-specific page (such as, say, a NodeManager's): http://inldmcdh1.example.com:7180/cmf/services/7/instances/71/processes

The page shows the following environment variables:

INFA_TRUSTSTORE=/opt/manojkeyfiles
JAVA_HOME=/opt/java/jdk1.8.0_45
YARN_LOG_DIR=/data/hadooplogs/var/log/hadoop-yarn
YARN_ROOT_LOGGER=INFO,RFA
IS_KERBERIZED=false
YARN_LOGFILE=hadoop-cmf-yarn-NODEMANAGER-inldmcdh2.example.com.log.out
NM_LOCAL_DIRS=/data/yarn/nm /dfs/1/yarn/nm /dfs/2/yarn/nm /dfs/3/yarn/nm /dfs/4/yarn/nm /dfs/5/yarn/nm
CDH_VERSION=5
KRB5_CONFIG=/opt/keytab/krb5.conf
YARN_NODEMANAGER_OPTS=-Xms1073741824 -Xmx1073741824 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dhadoop.event.appender=,EventCatcher -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh
HADOOP_CLIENT_CONF_DIR=/etc/hadoop/conf.cloudera.yarn

Is there a direct API to get the ENV vars from here? Thanks, Sumit
04-25-2017
03:02 AM
Hi, is there a way to programmatically fetch the value of the JAVA_HOME environment variable set on the cluster, using the Cloudera Manager APIs? Thanks, Sumit
02-06-2017
07:33 PM
@samurai - Yes, there were two main issues: one, these were VMs; and two, ZooKeeper was co-located with another service that shared the same disk.
10-13-2016
03:48 AM
If you are only looking to get the list of columns, you can use a degenerate query like:

SELECT * FROM TBL WHERE 1 = 0

and then use java.sql.ResultSetMetaData to iterate over the columns.
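A minimal JDBC sketch of this idea; the Phoenix connection URL and the table name are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class ListColumns {
    public static void main(String[] args) throws SQLException {
        // Placeholder JDBC URL and table; substitute your own.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             // WHERE 1 = 0 matches no rows, so only metadata comes back.
             ResultSet rs = stmt.executeQuery("SELECT * FROM TBL WHERE 1 = 0")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i) + " : " + md.getColumnTypeName(i));
            }
        }
    }
}
```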
10-13-2016
03:35 AM
One thing worth noting while debugging here: the table was created via Apache Phoenix, and the table is empty. When we query the table via the Phoenix shell, we get the ThrottlingException; the same query works fine through the HBase shell. I use HBase 1.1.2 and Phoenix 4.7. Has anyone faced this issue before?
09-21-2016
10:31 AM
Hi, our application is unable to scan or read from HBase tables when throttling is set. The ThrottlingException seems to have some correlation with pre-splits, because it shows up as and when the pre-splits for tables are increased. This is despite the fact that the table under consideration is mostly empty. We have already tried both rate limiters, average and fixed. I can't understand why the read rate limit is exceeded when there is hardly any data in HBase. Has anyone faced this issue before? One thing we are planning to try is to reattempt the operations when we get this error; I am not sure whether that will help, but I would like to know how others have gotten this working.

Setup details:
HBase version: 1.1.2
Number of region servers: 4
Number of regions: 116
Heap memory for region server: 2 GB
Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE

Region server stack trace (notice below that the error is about the read size limit being exceeded, while the size of the scan is only 28 (bytes?)):

2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
at org.apache.hadoop.hbase.quotas.ThrottlingException.throwReadSizeExceeded(ThrottlingException.java:101)
at org.apache.hadoop.hbase.quotas.TimeBasedLimiter.checkQuota(TimeBasedLimiter.java:139)
at org.apache.hadoop.hbase.quotas.DefaultOperationQuota.checkQuota(DefaultOperationQuota.java:59)
at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:180)
at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:125)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2265)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

Thanks, Sumit
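A minimal sketch of the reattempt idea mentioned above. It assumes the client surfaces the server's ThrottlingException directly, which depends on your client retry configuration; the backoff values are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.quotas.ThrottlingException;

public class ThrottledReads {
    // Retry a Get a few times with linear backoff when throttled.
    static Result getWithRetry(Table table, Get get, int maxAttempts)
            throws IOException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return table.get(get);
            } catch (ThrottlingException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after maxAttempts
                }
                Thread.sleep(100L * attempt); // back off before retrying
            }
        }
    }
}
```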
04-07-2016
03:05 AM
No, those are two different properties: one for vmem (virtual memory) and the other for pmem (physical memory).
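Assuming the properties in question are YARN's memory-check switches (my inference; the thread does not name them here), the pair would look like this in yarn-site.xml:

```xml
<!-- Assumed property names: YARN's virtual- and physical-memory checks. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>
```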
12-02-2015
07:45 PM
Thanks Harsh. So, to generalize, can the mechanism-level subcodes always be taken as some failure in communicating with the KDC? I also see that despite this error, ZK does continue to function, so should this error really be treated seriously? Thanks again.
11-23-2015
01:01 AM
I use a kerberized cluster, and once in a while I notice the following error in my ZooKeeper client logs:

15/11/15 15:46:53 ERROR client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)] ) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
15/11/15 15:46:53 ERROR zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)] ) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

My doubt is this: the log shows the actual error to be a connection reset, but I am not sure which connection was reset. Was it the connection to the Kerberos KDC? The log further seems to indicate that the connection issue happened when connecting to a ZK quorum member; in that case, was the RST flag received from the ZK quorum member? Thanks, Sumit
10-13-2015
01:27 AM
I researched a bit more; however, I am unable to find any means of using prepared statements with the Phoenix input format. Does anyone have contrary comments?
10-12-2015
09:27 PM
I am trying to enable HA for the Resource Manager as well as the NameNode. However, the masters fail over to standby very often. There is no issue with HA as such, but every failover ends up exhausting one application attempt. I notice the following issue: a series of slow fsyncs, followed (sometimes only) by a CancelledKeyException.

2015-10-12 17:22:41,000 - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 6943ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2015-10-12 17:22:41,001 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1505bcdb3e3054e
2015-10-12 17:22:41,002 - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1505bcdb3e30003 type:ping cxid:0xfffffffffffffffe zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:null Error:KeeperErrorCode = Session moved
2015-10-12 17:22:41,004 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.65.144.35:36030 which had sessionid 0x1505bcdb3e30003
2015-10-12 17:22:41,006 - ERROR [CommitProcessor:3:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:644)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

The fsync time is sometimes as high as 10 seconds. This could surely time out the clients, I suppose, leading to the deletion of the ephemeral nodes the masters created; around this time, the masters switch over. I have seen that disk space is not a concern. However, at times the await time on the ZK dataDir drive does show a surge. I have also confirmed that GC pauses are minimal. Any pointers would be really appreciated.
10-09-2015
07:33 AM
I want to use a PreparedStatement when setting a query for the Phoenix record reader / calculating input splits. I am not able to find any means other than this:
PhoenixConfigurationUtil.setInputQuery(conf, query);
which takes in a raw SQL query string. How do I change this to a PreparedStatement rather than a hard-coded statement?
Thanks, Sumit
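Not an answer, but a sketch of one conceivable workaround: since setInputQuery only accepts a string, the parameter values can be rendered into the SQL before handing it over. The table, column, and escaping here are hypothetical and simplified:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;

public class InlineQueryParams {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical parameter value; naively escape single quotes before inlining.
        String city = "Pune".replace("'", "''");
        String query = "SELECT id, name FROM CITIES WHERE city = '" + city + "'";
        PhoenixConfigurationUtil.setInputQuery(conf, query);
    }
}
```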
09-22-2015
03:57 AM
Hi, I have used various hbck options with varied success when the HBase Master fails to start up after a shutdown. Using zkCli to clear znodes has also helped. However, it has been more trial and error than method. Is there any recommendation document on which options to use for such startup errors that are seemingly related to data corruption (such as, say, TableExistsException, TableDoesNotExistException, table enabled/disabled exceptions, etc.)? Also, is it right to assume that these can only happen due to an unclean shutdown of HBase? Finally, would using zkCli to clear (rmr) the entire /hbase cause data loss, since we would lose all znodes related to HBase (including meta)? Thanks, Sumit
09-14-2015
09:02 PM
Now, my plan is to use Phoenix 4.5 with CDH 4.2. Can someone confirm that Kerberos is not a concern?
09-01-2015
02:29 AM
I am planning on using Phoenix with CDH 5.4. Can you confirm the following, please:
1. Can I use Phoenix 4.2 with CDH 5.4?
2. What are the steps to get it working with Kerberos in CDH?
I am planning on using the JDBC APIs, but I do notice some forums that describe issues in some Phoenix classes with Kerberos enabled. Can you please confirm whether Kerberos is fully supported? Thanks, Sumit
08-25-2015
08:08 AM
I notice that memory on another CDH host is overcommitted. Of the total 62 GB of physical memory, 40 GB is allotted to YARN containers, and that consumes the major chunk. I notice that my apps remain in the ACCEPTED state despite requesting only 1 GB for each container.
1. Is this 40 GB reserved by YARN at startup?
2. Also, I assume that this does not contribute to the apps remaining in the ACCEPTED state, because it is from this pool of 40 GB that my apps should get their memory allocated, right?
Thanks, Sumit
08-20-2015
08:22 PM
So, the only thing that is not clear to me now is why I was allowed a maximum of only 2 apps per user when that was not set anywhere explicitly. Is it possible to infer that from the data given above? Or what other data point(s) influenced limiting it to 2 apps per user? Thanks, Sumit
08-20-2015
07:27 AM
Thank you Wilfred. Where do I check the "Max Application Master Share"? Also, does the Fair Scheduler take independent disks into account when calculating the number of possible containers?
08-13-2015
12:10 AM
OK, I figured one thing out: Num Containers refers to the total number of running containers. I have 2 independent physical disks on both nodes.
08-12-2015
10:40 PM
I have a 2-node cluster, with each node having 8 GB RAM and 4 cores. On both nodes there are apps running that have consumed 2 cores each, which leaves me with 2 cores (x 2) on both nodes. Memory used is 4 GB of the total 16 GB available to YARN containers. Some important properties:

yarn.nodemanager.resource.memory-mb = 20GB (overcommitted, as I see)
yarn.scheduler.minimum-allocation-mb = 1GB
yarn.scheduler.maximum-allocation-mb = 5.47GB
yarn.nodemanager.resource.cpu-vcores = 12
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 12

I am using the Fair Scheduler. With the above settings, when I spark-submit, the app remains in the ACCEPTED state. Here is what I am requesting:

spark.driver.memory=2G
spark.master=yarn-client
spark.executor.memory=1G
num-executors = 2
executor-memory = 1G
executor-cores = 1

As I see it, I am requesting a total of 3 cores (1 for the driver, by default, and 1 x 2 for the executors). A single node does not have 3 free cores, but it does have 2, so ideally I should see them distributed across the 2 nodes. I am not sure why the Spark job should remain in the ACCEPTED state; my default queue shows only 25% usage. I also notice the following settings for my root.default queue:

Used Capacity: 25.0%
Used Resources: <memory:4096, vCores:4>
Num Schedulable Applications: 2
Num Non-Schedulable Applications: 1
Num Containers: 4
Max Schedulable Applications: 2
Max Schedulable Applications Per User: 2

Why do I get only 4 containers in total? Or does this indicate the currently used containers (which in my case is 4)? Also, why is the maximum number of schedulable apps only 2? I have not set any user-level or queue-level limits under the Dynamic Resource Pool settings.
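For reference, the spark-submit invocation implied by the properties above might look like the following; the application class and jar are placeholders:

```
spark-submit \
  --master yarn-client \
  --driver-memory 2G \
  --num-executors 2 \
  --executor-memory 1G \
  --executor-cores 1 \
  --class com.example.MyApp \
  myapp.jar
```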
08-10-2015
09:35 AM
I also think we could probably compress the binaries before they are copied to HDFS and have YARN uncompress them somehow?
08-07-2015
08:16 PM
Are there any recommendations for speeding up the deployment of app binaries to YARN? I've been using the RM REST APIs to submit apps, with the binaries located on HDFS. This tends to take a lot of time when the binaries to be deployed as a YARN app are big (say, 500 MB or more) and when the number of containers I need is high. I could probably speed this up by:
1. Turning off the default 3 copies needed on HDFS (see the sketch after this list)
2. Using the HDFS cluster-wide cache, which can help avoid block reads
3. Using YARN resource localization
Do you have any recommendations that are definitely known to speed this up? Thanks, Sumit
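A minimal sketch of option 1, assuming the binaries live under a hypothetical path like /apps/myapp; HDFS replication can be set per file rather than cluster-wide:

```
# Upload with a replication factor of 1 instead of the default 3
hdfs dfs -D dfs.replication=1 -put myapp.tar.gz /apps/myapp/

# Or lower the replication of an already uploaded file
hdfs dfs -setrep -w 1 /apps/myapp/myapp.tar.gz
```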
07-30-2015
02:26 AM
OK, I probably did not know the full details. But doesn't YARN pre-empt containers? If so, what does pre-emption mean for the app? The idea behind knowing this programmatically is to help debug faster and also to make decisions about how to prevent it next time. Many times, on bigger clusters, the exact reason behind a termination can be difficult to determine and can take some time.
07-29-2015
09:41 PM
Does the Fair Scheduler take only memory into consideration when making a decision, or does it also use vcores? If the decision can depend on multiple factors, then this may be another CR wherein the user can get to know the exact reason (possibly through an API call) why an app is in the ACCEPTED state (such as memory, cores, disk space, queue limits, etc.).
07-29-2015
08:48 PM
You beat me to the answer 🙂 Yes, I figured this has to be set in the NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml. Thanks!
07-29-2015
08:43 PM
Thanks Wilfred. I'd agree about not setting it to false; that's my thinking too. The main reason to use that setting is to be able to do some functional testing without getting into tuning just yet. So, is there a way I can set this property through the UI?
07-29-2015
08:41 PM
Currently, there are multiple reasons for an app to get killed, some of which are:
1. Natural termination because the app is over.
2. A container got killed for exhausting various limits, such as the physical memory limit, the virtual memory limit, etc.
3. The AM got killed more than max-attempts times.
4. Queue-level limits were reached.
5. The overall maximum allowed container memory for a given NodeManager was reached.
6. Pre-emption.
7. Invalid security tokens.
8. An admin killed it through the UI.
9. Recovery of apps is not supported when the RM or NM restarts.
10. Etc.
These show up in the logs, but is there a way we can determine them programmatically, say through a Resource Manager REST API call (see the sketch after this list)? Perhaps its diagnostics field could be used to populate the exact reason whenever YARN is aware of it?
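A minimal sketch of such a call, reading an app's report (including its diagnostics field) from the RM REST API; the RM host/port and application id are placeholders, and how much detail the diagnostics field carries varies case by case:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppDiagnostics {
    public static void main(String[] args) throws Exception {
        // Placeholder RM host/port and application id.
        URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/application_1438000000000_0001");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // The JSON app report includes "state", "finalStatus", and "diagnostics".
                System.out.println(line);
            }
        }
    }
}
```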