Member since
07-18-2016
24
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
4115 | 09-17-2017 12:25 PM |
03-04-2019
05:43 PM
Thanks. I successfully rescued the unrecognized blocks. Addressing the underlying issue of missing but finalized blocks will take time. Hopefully upgrading to a later CDH will work.
02-28-2019
02:57 PM
The histogram of {count,disk} was preliminary. Here is a final tally:
count, drive
144 00
6 01
290 02
155 03
154 04
134 05
167 06
172 08
2 09
144 10
2 11
143 12
7 13
130 15
151 16
280 17
7 19
5 20
2 21
172 22
139 23
4 24
160 25
171 26
162 27
10 28
184 29
171 30
3 31
5 32
7 33
4 34
248 35
02-28-2019
02:51 PM
> Do you get a full stack trace in the Namenode log at the time of the error in the datanode?
No. I showed the entire log message that was visible. I checked the NN stdout and did not see a stack trace.
> Have all these files with missing blocks got replication factor of 1 or have they a replication factor 3?
The replication factor is 3, but the problem files did not reach that level due to competing activity. The DN in question had to be bounced due to a failed disk (it has 36 8TB disks), apparently at an inopportune moment. I think the excessive RBW files (up to 1 year old) are not the cause, since I moved most of them away and restarted the DN, and there was no change.
Current problem summary: finalized blocks/replicas in a DN are reported as missing, and a "RemoteException in offerService" WARN appears in the DN log.
More info: I found that of the 4400 blocks/replicas missing cluster-wide, there are 3500 spread across 36 disks on one DN under /finalized/, unevenly:
count, disk
2 09
2 11
2 21
3 31
4 24
4 34
5 20
5 32
6 01
7 13
7 19
7 33
10 28
15 05
139 23
154 04
159 27
160 25
171 26
280 17
Hence, I am unable to place blame on one disk. This DN has 14M blk files, so only a tiny percentage are affected. It sure seems like an uncaught exception caused the block report from this DN to be incomplete.
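For anyone wanting to reproduce a {count, disk} tally like the one above, here is a minimal Python sketch. It is only an illustration: the mount layout (`/data/<disk>/...`), the helper name `tally_by_disk`, and the sample paths are assumptions made up for the example, not the actual layout of this DN.

```python
import re
from collections import Counter

# Assumption: each disk is mounted as /data/<NN>/...; adjust the pattern to the real mount points.
DISK_RE = re.compile(r"^/data/(\d+)/")

def tally_by_disk(paths):
    """Count finalized block files per disk from an iterable of file paths."""
    counts = Counter()
    for path in paths:
        # Only count block data files under finalized/; skip rbw replicas and .meta checksum files.
        if "/finalized/" not in path or path.endswith(".meta"):
            continue
        match = DISK_RE.match(path)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample paths, for illustration only.
sample = [
    "/data/17/dfs/dn/current/BP-1/current/finalized/subdir0/blk_1001",
    "/data/17/dfs/dn/current/BP-1/current/finalized/subdir0/blk_1001_5.meta",
    "/data/17/dfs/dn/current/BP-1/current/finalized/subdir1/blk_1002",
    "/data/04/dfs/dn/current/BP-1/current/finalized/subdir0/blk_1003",
    "/data/04/dfs/dn/current/BP-1/current/rbw/blk_1004",
]
counts = tally_by_disk(sample)

# Print in the same "count disk" form, smallest count first.
for disk, count in sorted(counts.items(), key=lambda kv: kv[1]):
    print(count, disk)
```

In practice the input would be the paths of the blk files that match the missing block IDs, for example collected with a find over the DN data directories.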
02-26-2019
04:44 PM
Another symptom: the DataNode /blockScannerReport on the problem DN always returns this: Periodic block scanner is not yet initialized. Please check back again after some time.
02-26-2019
03:04 PM
Using CDH 5.3.1 (without CM), I have a DataNode that seems not to start its block report. The particular DN has 100x more RBW files than other DNs (some RBW files are a year old). The driving symptom is blocks reported missing, but the particular blocks are indeed under the /finalized/ directory of the DN. A few thousand files have missing blocks that are in this state, and no alternative blocks/replicas are on the cluster, so we would like to recover these files. The missing blocks are NOT under the /rbw/ dir., hence the concern over the "RemoteException in offerService" error. Classpath and VERSION files look good compared to known-good DNs. See the point-in-time log entries below.
* Question: How can a /finalized/ block (replica) be considered missing after the DN has been up for many hours?
* Question: What if I manually copy the finalized blk_* files in question to another DN? Would that DN pick them up upon restart?
* Question: Should I manually clean up old (say, older than a few days) RBW files?
DN log:
2019-02-26 21:43:20,152 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy11.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:175)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:503)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:716)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:851)
at java.lang.Thread.run(Thread.java:745)
NN log:
2019-02-26 21:43:20,147 WARN org.apache.hadoop.ipc.Server: IPC Server handler 5 on 6000, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 207.241.230.241:40178 Call#5 Retry#0
java.lang.NullPointerException
The NN error above does NOT show up for any other DN.
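Regarding the question about old RBW files: before moving or cleaning anything, it may help to simply list which files under the rbw directories are older than a given age. The sketch below is a read-only illustration in Python. The function name `old_rbw_files`, the example data directory, and the 7-day threshold are assumptions for the example; it also assumes replica files sit directly inside a directory named `rbw`. It deletes nothing.

```python
import os
import time

def old_rbw_files(data_dirs, max_age_days, now=None):
    """Return sorted (path, age_in_days) pairs for files in any 'rbw' directory older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    for root_dir in data_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            # Only inspect directories literally named "rbw"; finalized/ and tmp/ are ignored.
            if os.path.basename(dirpath) != "rbw":
                continue
            for name in filenames:
                path = os.path.join(dirpath, name)
                mtime = os.path.getmtime(path)
                if mtime < cutoff:
                    found.append((path, (now - mtime) / 86400))
    return sorted(found)

# Hypothetical data directory; os.walk yields nothing if it does not exist.
for path, age in old_rbw_files(["/data/00/dfs/dn"], max_age_days=7):
    print(f"{age:8.1f} days  {path}")
```

Running it against every data directory of the DN gives a report that can be reviewed before deciding whether any RBW files should be moved aside.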
Labels: HDFS
09-17-2017
12:25 PM
I found this discussion and noticed that the error "Impala is not supported for RHEL7" can appear in CM 5.11 if you try to install Impala via parcel. Since 2015, Impala has been included in the main CDH parcel and is simply added as a service to the cluster. If you try to add it via parcel on RHEL7 (or CentOS 7), you are in effect trying to add an old version (2.0 or earlier) of Impala, and in that case the statement "Impala is not supported for RHEL7" is actually true. I expect this misleading error message will occur for any attempt, on any CM version, to add Impala via parcels on RHEL7. I hope this clarification helps someone.
09-16-2016
01:23 PM
That helped. It works now. We had no reason to expect that the instance and instances clauses were necessary for a CM that already exists and was generated by Cloudera Director.
09-14-2016
07:27 PM
We tried it with and without " Deployment" and " Environment" appended, but always get the same error. We have successfully added more than one cluster to the particular CM. The tail end of the application.log is below, if that helps. Alternatively, how can we specify "Cloudera Manager URL"?
[2016-09-15 02:11:34] INFO [main] - c.c.l.m.PrivateKeySshCredentialsValidator: Validating SSH credentials for ec2-user
[2016-09-15 02:11:34] ERROR [main] - c.c.l.commands.ValidateCommand: Failed to parse deployment configuration
java.lang.IllegalArgumentException: You have to either specify a virtual instance configuration or a Cloudera Manager URL for a new deployment
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93) ~[guava-15.0.jar!/:na]
at com.cloudera.launchpad.model.deployment.DeploymentTemplate.<init>(DeploymentTemplate.java:154) ~[launchpad-model-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.model.deployment.DeploymentTemplateBuilder.build(DeploymentTemplateBuilder.java:474) ~[launchpad-model-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.templates.ConfigToDeploymentTemplate.apply(ConfigToDeploymentTemplate.java:183) ~[launchpad-templates-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.commands.ValidateCommand.run(ValidateCommand.java:136) ~[launchpad-cli-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.commands.BootstrapRemoteCommand.run(BootstrapRemoteCommand.java:74) [launchpad-cli-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.commands.RemoteCommand.run(RemoteCommand.java:198) [launchpad-cli-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.Application.run(Application.java:140) [launchpad-cli-2.1.0.jar!/:2.1.0]
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:806) [spring-boot-1.3.2.RELEASE.jar!/:1.3.2.RELEASE]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:790) [spring-boot-1.3.2.RELEASE.jar!/:1.3.2.RELEASE]
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:777) [spring-boot-1.3.2.RELEASE.jar!/:1.3.2.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) [spring-boot-1.3.2.RELEASE.jar!/:1.3.2.RELEASE]
at com.cloudera.launchpad.Application.start(Application.java:97) [launchpad-cli-2.1.0.jar!/:2.1.0]
at com.cloudera.launchpad.Application.main(Application.java:47) [launchpad-cli-2.1.0.jar!/:2.1.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_72]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_72]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_72]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_72]
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:54) [launchpad-cli-2.1.0.jar!/:2.1.0]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
[2016-09-15 02:11:34] INFO [Thread-9] - o.s.c.a.AnnotationConfionConfigApplicationContext@4f04b01a: startup date [Thu Sep 15 02:11:15 UTC 2016]; root of context hierarchy
09-14-2016
01:51 PM
What is the right way to re-use (add to) an existing Cloudera Manager using a config file? We are trying aws.ha.reference.conf with this setting and CM paragraph:
deploymentName: My_Test" Deployment"
environmentName: myCM" Environment"
// ...
cloudera-manager {
username: admin
password: admin
}
Run like this:
cloudera-director bootstrap-remote aws.ha.reference.conf --lp.remote.username=admin --lp.remote.password=xxxxxxxx --lp.remote.hostAndPort=localhost:7189
The result is:
Cloudera Director 2.1.0 initializing ...
Connecting to http://localhost:7189
Current user roles: [ROLE_ADMIN, ROLE_READONLY]
Found errors in deployment configuration:
* You have to either specify a virtual instance configuration or a Cloudera Manager URL for a new deployment
-----------
How can I specify an existing instance? I am referring to a CM that was created by Cloudera Director.
Labels: Cloudera Manager
07-19-2016
01:52 PM
I would like to use Centos7x, at least for gateway hosts. With Director 1.5, I never had any luck with Centos or multiple templates in one cluster, so I ended up using a sub-optimal instance type. I would like to use r3 series for gateways. Director 2.x supports Centos 7x and Centos 7x for cdh5.7 and later; does it matter to Director which kind of hypervisor is used? r3 series is HVM. I see that the FAQ recommends finding AMIs with:
aws ec2 describe-images \
--output table \
--query 'Images[*].[VirtualizationType,Name,ImageId]' \
--owners 309956199498 \
--filters \
Name=root-device-type,Values=ebs \
Name=image-type,Values=machine \
Name=is-public,Values=true \
Name=hypervisor,Values=xen \
Name=architecture,Values=x86_64
but this specifies xen and RH. Does that mean only xen is supported? Presumably there is an equivalent AMI search command for centos7x. Is it okay to generate a cluster from multiple templates? Thanks!
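As a starting point for an equivalent search, the same command can be parameterized by image owner. Note that the FAQ command already prints VirtualizationType in its first column, so it does not by itself restrict results to one virtualization type; in `aws ec2 describe-images`, `hypervisor` and `virtualization-type` are separate filters. The Python sketch below only builds the argument list. The function name `describe_images_cmd` is hypothetical, the owner ID is a placeholder to be replaced with the account that publishes the wanted images, and swapping the `hypervisor` filter for `virtualization-type` is my own choice for the example, not something taken from the FAQ.

```python
import shlex

def describe_images_cmd(owner_id, virtualization_type="hvm"):
    """Build the aws CLI argument list for public, EBS-backed x86_64 AMIs of one owner."""
    return [
        "aws", "ec2", "describe-images",
        "--output", "table",
        "--query", "Images[*].[VirtualizationType,Name,ImageId]",
        "--owners", owner_id,
        "--filters",
        "Name=root-device-type,Values=ebs",
        "Name=image-type,Values=machine",
        "Name=is-public,Values=true",
        "Name=architecture,Values=x86_64",
        # Filter on virtualization type (hvm or paravirtual) instead of hypervisor.
        f"Name=virtualization-type,Values={virtualization_type}",
    ]

# OWNER_ACCOUNT_ID is a placeholder, not a real account ID.
cmd = describe_images_cmd("OWNER_ACCOUNT_ID")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd)` on a host with the AWS CLI configured.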
Labels: Gateway