Member since 07-31-2013 · 1924 Posts · 462 Kudos Received · 311 Solutions
10-05-2016
12:43 AM
1 Kudo
For (1), the answer right now is no. Once dead node detection occurs, the NameNode will swiftly begin re-replicating the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but that work is still in progress and will only arrive in an as-yet-undetermined future CDH release. For (2), you can hunt down the files with a replication factor of 1, raise them to 2, and wait for the under-replicated block count to reach 0 before you take the DN down. The replication factor can be changed with the command 'hadoop fs -setrep'.
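A minimal sketch of that procedure (assuming paths contain no spaces; the replication factor is the second column of 'hadoop fs -ls' output):

```bash
# Find files (not directories) with replication factor 1; the factor is
# the second column of 'hadoop fs -ls' output.
hadoop fs -ls -R / | awk '$1 !~ /^d/ && $2 == 1 {print $NF}' > repl1.txt

# Raise each to 2; -w blocks until the new replication is satisfied.
while read -r f; do hadoop fs -setrep -w 2 "$f"; done < repl1.txt

# Double-check nothing is still under-replicated before stopping the DN.
hdfs fsck / | grep -i "Under-replicated"
```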
09-20-2016
06:55 AM
1 Kudo
Yes, you need to switch Oozie to submit over YARN instead of MRv1; the switching guide covers this. From the workflow side, the visible change is typically that the jobTracker property points at the YARN ResourceManager, as sketched below.
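For illustration (hostnames are placeholders; 8032 is the default ResourceManager RPC port, whereas MRv1's JobTracker typically listened on 8021):

```ini
# job.properties for a workflow submitted over YARN. Oozie reuses the
# jobTracker property name: on YARN it holds the ResourceManager address.
jobTracker=resourcemanager.example.com:8032
nameNode=hdfs://namenode.example.com:8020
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
```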
09-20-2016
06:47 AM
You cannot run Spark on MR1 clusters. You will need a YARN cluster set up first, and Oozie switched over to it, before you can attempt the Spark action. To migrate to YARN, please follow https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_mr_and_yarn.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7f23__section_dtc_lwx_yq Once migrated, a Spark action would look roughly like the sketch below.
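A rough sketch of a Spark action workflow (the class, jar path, and names are placeholders, not from your setup):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-example">
  <start to="spark-node"/>
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>MySparkJob</name>
      <class>com.example.MySparkJob</class>
      <jar>${nameNode}/user/${user.name}/apps/spark/lib/my-spark-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```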
09-11-2016
04:06 AM
1 Kudo
The Result's Cell API fetches you the timestamp of the selected row/column when reading (http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/Cell.html#getTimestamp()), and the Put API allows you to specify one when writing (http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/client/Put.html#addColumn(byte[],%20byte[],%20long,%20byte[])).

Row keys are immutable, so what you are looking to do cannot be done in-place. I'd recommend running an MR job that populates a new table, sourcing and transforming data from the older one. Pre-split the new table adequately for the changed row key format, for better performance during this job. After the transformation you can rename the table back to the original name if you'd like.

The MR input would be a TableInputFormat over the source table; your table input scan should likely also filter for the rows you are specifically targeting. The MR output would be a TableOutputFormat for the destination table. The map function would be the row key transformer code: it transfers the Result's Cell list contents into a Put with just the row key altered to the new format, retaining all other columnar data as-is via the above APIs. A rough sketch of such a job is below.

Alternatively, your destination table can be the same as the source, but then also issue a Delete for the older row key copy at the end of the job/transformation.
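A minimal sketch of that map-only job (transformRowKey, the table names, and the scan tuning are placeholders you'd replace with your own logic):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class RowKeyMigration {

  static class RekeyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      // Placeholder: derive the new-format row key from the old one.
      byte[] newKey = transformRowKey(result.getRow());
      Put put = new Put(newKey);
      // Carry every cell over unchanged, preserving its original timestamp.
      for (Cell cell : result.rawCells()) {
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), CellUtil.cloneValue(cell));
      }
      context.write(new ImmutableBytesWritable(newKey), put);
    }

    private byte[] transformRowKey(byte[] oldKey) {
      // Your key-format conversion goes here.
      return oldKey;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "rowkey-migration");
    job.setJarByClass(RowKeyMigration.class);
    Scan scan = new Scan();
    scan.setCaching(500);          // larger scanner caching for a batch job
    scan.setCacheBlocks(false);    // don't pollute the block cache
    // Optionally add a filter here to limit the scan to the targeted rows.
    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        RekeyMapper.class, ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("destination_table", null, job);
    job.setNumReduceTasks(0);      // map-only: Puts go straight to the sink
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```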
09-08-2016
03:53 AM
1 Kudo
> Is there a timeline or intentions to update the repo version of kafka to 0.9?

Kafka 0.9 has been available for RHEL7-based distributions via http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/2.0.2/RPMS/noarch/, for example. What URL are you currently pointing your Yum kafka repository configuration to?

> Will it introduce any problem migrating cdh from packages to parcels at this point?

No, and you can follow http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_migrating_packages_to_parcels.html to do this.

> Is it just that parcel or will it become a chain of dependencies I have to download and replicate locally in parcel-repo?

Usually just the one parcel is required.
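For comparison, a repo definition along these lines is what I'd expect (the exact baseurl is an assumption derived from the archive path above; adjust for the Kafka version you want):

```ini
# /etc/yum.repos.d/cloudera-kafka.repo -- illustrative only
[cloudera-kafka]
name=Cloudera Kafka
baseurl=http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/2.0.2/
gpgcheck=0   # or point gpgkey= at Cloudera's signing key and set gpgcheck=1
```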
09-01-2016
06:11 PM
1 Kudo
Could you tail your NameNode log and check what security error it prints when you attempt this request? Also, does your command use the same JVM (with the unlimited-strength JCE jars installed, if applicable) as the server does? A quick way to verify the latter is sketched below.
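This check is my own suggestion, not from any doc; compile and run it with the same java binary each process uses:

```java
import javax.crypto.Cipher;

public class JceCheck {
  public static void main(String[] args) throws Exception {
    // With the unlimited-strength policy files installed this prints
    // 2147483647 (Integer.MAX_VALUE); with the default policy it is 128.
    System.out.println("Max AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
  }
}
```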
09-01-2016
01:22 AM
As you can note on https://aws.amazon.com/ec2/instance-types/ and http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-lifetime, the m3.xlarge uses 2x "instance store" type disks, whose contents are entirely lost when you stop the instance. When you bring the instance back, it will not have any of its previously persisted data, and that is not acceptable to a lot of CM and CDH components. Your blocks on HDFS would no longer be on the disks, so they would be reported as missing too. You should instead use instances that provide "EBS" storage, so the data persists.

For cloud environment deployments we recommend using Cloudera Director to install, deploy, and run your Cloudera CM and CDH cluster rather than managing it manually, to avoid little problems such as these: https://www.cloudera.com/documentation/director/latest/topics/director_intro.html

You can also check out which instance types Cloudera Director recommends for CM and CDH here: https://www.cloudera.com/documentation/director/latest/topics/director_deployment_requirements.html#concept_fhh_ygd_nt_a
08-29-2016
08:56 PM
I'd recommend looking for WARN-or-higher log messages with the reference "Checkpoint" in them, to find out why it frequently aborts mid-way; something like the grep below. There were some timeout-related issues in the very early CDH4 period, but I've not seen this issue repeat with CDH5, even for very large fsimages.
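The log path here is an assumption, typical of package-based installs; adjust to wherever your SecondaryNameNode writes its logs:

```bash
# Pull WARN/ERROR/FATAL lines that mention checkpointing from the
# SecondaryNameNode log.
grep -E "WARN|ERROR|FATAL" /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-*.log \
  | grep -i checkpoint
```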
08-29-2016
07:03 AM
1 Kudo
Yes, it is safe to delete them while the NameNode is running, but leave the most recent file alone, as that one may actually be in progress. The older ones are leftover files from failed checkpoint operations. It is concerning that you are observing this, though, as it also means you may not have a fully completed checkpoint yet. What is your CDH version for this HDFS?
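If you want to script the cleanup, a sketch (the directory is a placeholder for your dfs.namenode.name.dir; assumes no spaces in file names):

```bash
# In the NameNode metadata dir, list fsimage.ckpt_* files newest-first
# and remove everything except the most recent one.
cd /dfs/nn/current   # placeholder: your dfs.namenode.name.dir/current
ls -t fsimage.ckpt_* | tail -n +2 | xargs -r rm
```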
08-29-2016
06:52 AM
The move by itself would be as trivial as doing an mv/cp across to the new disk, while also ensuring the permissions stay intact.

In terms of using a dedicated disk, the more important requirement is for the dataLogDir (rather than the dataDir). ZK calls fsync on the transaction logs written into the dataLogDir, which can end up blocking for a long time when other processes share the disk. You can and should keep the dataDir (where snapshots get stored) separate from the dataLogDir; that way, large snapshot writes don't affect transaction logging performance either. The dataDir location can be on a shared disk, as its writes are not synchronous.

Does this help?
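A sketch of both steps (the mount points are placeholders; cp -a preserves ownership, permissions, and timestamps):

```bash
# Stop ZK, then copy the existing contents to the new disk while
# preserving ownership and permissions.
cp -a /var/lib/zookeeper/. /data2/zookeeper-txlog/

# Then in zoo.cfg, split the two locations:
#   dataDir=/data1/zookeeper           # snapshots; a shared disk is fine
#   dataLogDir=/data2/zookeeper-txlog  # fsync'd txn logs; dedicated disk
```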