Member since 07-31-2013 · 1924 Posts · 462 Kudos Received · 311 Solutions
10-05-2016
12:43 AM
1 Kudo
For (1), the answer right now is no. Once dead node detection occurs, the NameNode will swiftly begin re-replicating the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but that work is still in progress and will only arrive in an as-yet-undetermined future CDH release. For (2), you can hunt down the files with a replication factor of 1, raise them to 2, and wait for the under-replicated block count to reach 0 before you take the DN down. The replication factor can be changed with the command 'hadoop fs -setrep'.
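A minimal sketch of that procedure (assuming paths contain no spaces; the replication factor is the second column of 'hadoop fs -ls' output):

```bash
# Find files (not directories) with replication factor 1; the factor is
# the second column of 'hadoop fs -ls' output.
hadoop fs -ls -R / | awk '$1 !~ /^d/ && $2 == 1 {print $NF}' > repl1.txt

# Raise each to 2; -w blocks until the new replication is satisfied.
while read -r f; do hadoop fs -setrep -w 2 "$f"; done < repl1.txt

# Double-check nothing is still under-replicated before stopping the DN.
hdfs fsck / | grep -i "Under-replicated"
```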
09-20-2016
06:55 AM
1 Kudo
Yes, you need to switch Oozie to submit over YARN instead of MRv1; the switching guide covers this. From the workflow side, the visible change is typically that the jobTracker property points at the YARN ResourceManager, as sketched below.
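For illustration (hostnames are placeholders; 8032 is the default ResourceManager RPC port, whereas MRv1's JobTracker typically listened on 8021):

```ini
# job.properties for a workflow submitted over YARN. Oozie reuses the
# jobTracker property name: on YARN it holds the ResourceManager address.
jobTracker=resourcemanager.example.com:8032
nameNode=hdfs://namenode.example.com:8020
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
```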
09-20-2016
06:47 AM
You cannot run Spark on MR1 clusters. You will need a YARN cluster set up first, and Oozie switched over to it, before you can attempt the Spark action. To migrate to YARN, please follow https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_mr_and_yarn.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7f23__section_dtc_lwx_yq Once migrated, a Spark action would look roughly like the sketch below.
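A rough sketch of a Spark action workflow (the class, jar path, and names are placeholders, not from your setup):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-example">
  <start to="spark-node"/>
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>MySparkJob</name>
      <class>com.example.MySparkJob</class>
      <jar>${nameNode}/user/${user.name}/apps/spark/lib/my-spark-job.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```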
09-11-2016
04:06 AM
1 Kudo
The Result's Cell API fetches you the timestamp of the selected row/column when reading (http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/Cell.html#getTimestamp()), and the Put API allows you to specify one when writing (http://archive.cloudera.com/cdh5/cdh/5/hbase/apidocs/org/apache/hadoop/hbase/client/Put.html#addColumn(byte[],%20byte[],%20long,%20byte[])).

Row keys are immutable, so what you are looking to do cannot be done in-place. I'd recommend running an MR job that populates a new table, sourcing and transforming data from the older one. Pre-split the new table adequately for the changed row key format, for better performance during this job. After the transformation you can rename the table back to the original name if you'd like.

The MR input would be a TableInputFormat over the source table; your table input scan should likely also filter for the rows you are specifically targeting. The MR output would be a TableOutputFormat for the destination table. The map function would be the row key transformer code: it transfers the Result's Cell list contents into a Put with just the row key altered to the new format, retaining all other columnar data as-is via the above APIs. A rough sketch of such a job is below.

Alternatively, your destination table can be the same as the source, but then also issue a Delete for the older row key copy at the end of the job/transformation.
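A minimal sketch of that map-only job (transformRowKey, the table names, and the scan tuning are placeholders you'd replace with your own logic):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class RowKeyMigration {

  static class RekeyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      // Placeholder: derive the new-format row key from the old one.
      byte[] newKey = transformRowKey(result.getRow());
      Put put = new Put(newKey);
      // Carry every cell over unchanged, preserving its original timestamp.
      for (Cell cell : result.rawCells()) {
        put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
            cell.getTimestamp(), CellUtil.cloneValue(cell));
      }
      context.write(new ImmutableBytesWritable(newKey), put);
    }

    private byte[] transformRowKey(byte[] oldKey) {
      // Your key-format conversion goes here.
      return oldKey;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "rowkey-migration");
    job.setJarByClass(RowKeyMigration.class);
    Scan scan = new Scan();
    scan.setCaching(500);          // larger scanner caching for a batch job
    scan.setCacheBlocks(false);    // don't pollute the block cache
    // Optionally add a filter here to limit the scan to the targeted rows.
    TableMapReduceUtil.initTableMapperJob("source_table", scan,
        RekeyMapper.class, ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("destination_table", null, job);
    job.setNumReduceTasks(0);      // map-only: Puts go straight to the sink
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```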
09-08-2016
03:53 AM
1 Kudo
> Is there a timeline or intentions to update the repo version of kafka to 0.9?

Kafka 0.9 has been available for RHEL7-based distributions via http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/2.0.2/RPMS/noarch/, for example. What URL are you currently pointing your Yum kafka repository configuration to?

> Will it introduce any problem migrating cdh from packages to parcels at this point?

No, and you can follow http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_migrating_packages_to_parcels.html to do this.

> Is it just that parcel or will it become a chain of dependencies I have to download and replicate locally in parcel-repo?

Usually just the one parcel is required.
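For comparison, a repo definition along these lines is what I'd expect (the exact baseurl is an assumption derived from the archive path above; adjust for the Kafka version you want):

```ini
# /etc/yum.repos.d/cloudera-kafka.repo -- illustrative only
[cloudera-kafka]
name=Cloudera Kafka
baseurl=http://archive.cloudera.com/kafka/redhat/7/x86_64/kafka/2.0.2/
gpgcheck=0   # or point gpgkey= at Cloudera's signing key and set gpgcheck=1
```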
09-01-2016
06:11 PM
1 Kudo
Could you tail your NameNode log and check what security error it prints when you attempt this request? Also, does your command use the same JVM (with the unlimited-strength JCE jars installed, if applicable) as the server does? A quick way to verify the latter is sketched below.
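This check is my own suggestion, not from any doc; compile and run it with the same java binary each process uses:

```java
import javax.crypto.Cipher;

public class JceCheck {
  public static void main(String[] args) throws Exception {
    // With the unlimited-strength policy files installed this prints
    // 2147483647 (Integer.MAX_VALUE); with the default policy it is 128.
    System.out.println("Max AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
  }
}
```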
09-01-2016
01:22 AM
As you can note on https://aws.amazon.com/ec2/instance-types/ and http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-lifetime, the m3.xlarge uses 2x "instance store" type disks, whose contents are entirely lost when you stop the instance. When you bring the instance back, it will not have any of its previously persisted data, and that is not acceptable to a lot of CM and CDH components. Your blocks on HDFS would no longer be on the disks, so they would be reported as missing too. You should instead use instances that provide "EBS" storage, so the data persists.

For cloud environment deployments we recommend using Cloudera Director to install, deploy, and run your Cloudera CM and CDH cluster rather than managing it manually, to avoid little problems such as these: https://www.cloudera.com/documentation/director/latest/topics/director_intro.html

You can also check out which instance types Cloudera Director recommends for CM and CDH here: https://www.cloudera.com/documentation/director/latest/topics/director_deployment_requirements.html#concept_fhh_ygd_nt_a
08-29-2016
08:56 PM
I'd recommend looking for WARN-or-higher log messages with the reference "Checkpoint" in them, to find out why it frequently aborts mid-way; something like the grep below. There were some timeout-related issues in the very early CDH4 period, but I've not seen this issue repeat with CDH5, even for very large fsimages.
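The log path here is an assumption, typical of package-based installs; adjust to wherever your SecondaryNameNode writes its logs:

```bash
# Pull WARN/ERROR/FATAL lines that mention checkpointing from the
# SecondaryNameNode log.
grep -E "WARN|ERROR|FATAL" /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-*.log \
  | grep -i checkpoint
```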
08-29-2016
07:03 AM
1 Kudo
Yes, it is safe to delete them while the NameNode is running, but leave the most recent file alone, as that one may actually be in progress. The older ones are leftover files from failed checkpoint operations. It is concerning that you are observing this, though, as it also means you may not have a fully completed checkpoint yet. What is your CDH version for this HDFS?
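If you want to script the cleanup, a sketch (the directory is a placeholder for your dfs.namenode.name.dir; assumes no spaces in file names):

```bash
# In the NameNode metadata dir, list fsimage.ckpt_* files newest-first
# and remove everything except the most recent one.
cd /dfs/nn/current   # placeholder: your dfs.namenode.name.dir/current
ls -t fsimage.ckpt_* | tail -n +2 | xargs -r rm
```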
08-29-2016
06:52 AM
The move by itself would be as trivial as doing an mv/cp across to the new disk, while also ensuring the permissions stay intact.

In terms of using a dedicated disk, the more important requirement is for the dataLogDir (rather than the dataDir). ZK calls fsync on the transaction logs written into the dataLogDir, which can end up blocking for a long time when other processes share the disk. You can and should keep the dataDir (where snapshots get stored) separate from the dataLogDir; that way, large snapshot writes don't affect transaction logging performance either. The dataDir location can be on a shared disk, as its writes are not synchronous.

Does this help?
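A sketch of both steps (the mount points are placeholders; cp -a preserves ownership, permissions, and timestamps):

```bash
# Stop ZK, then copy the existing contents to the new disk while
# preserving ownership and permissions.
cp -a /var/lib/zookeeper/. /data2/zookeeper-txlog/

# Then in zoo.cfg, split the two locations:
#   dataDir=/data1/zookeeper           # snapshots; a shared disk is fine
#   dataLogDir=/data2/zookeeper-txlog  # fsync'd txn logs; dedicated disk
```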