About Harsh J

Harsh J · ‎01-09-2020

Can you paste the contents of all the files in the following directory from the Ranger host, please?

/var/run/cloudera-scm-agent/process/1546333400-ranger-RANGER_ADMIN-SetupRangerCommand/logs/*

The missing property (db_password) is written by a control script that should log some information to these files, and their contents will help us determine the cause. I'm assuming that the 'ranger.jpa.jdbc.password' field on your CM - Ranger - Configuration page is set to a valid value.

Also, do you perhaps have an @ (at) character in your password? If yes, could you try a different password without that character? You may be hitting a bug (internal ID OPSAPS-53645, fixed in later releases) where the original CDP 7.0 release did not support that character in the password.
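If copying the files one by one is tedious, something along these lines can gather them into a single block of text for pasting. This is only a sketch: the function name dump_logs is made up for illustration, and it assumes the log files are plain text and readable by the user running it.

```python
from pathlib import Path


def dump_logs(log_dir: str) -> str:
    """Concatenate every regular file in log_dir, each under a header line."""
    parts = []
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            # errors="replace" keeps the dump going if a log contains odd bytes.
            text = path.read_text(errors="replace")
            parts.append(f"===== {path.name} =====\n{text}")
    return "\n".join(parts)
```

Calling print(dump_logs(...)) with the logs directory mentioned above (without the trailing /*) should produce output that can be pasted directly into a reply.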

Harsh J · ‎07-09-2019

Yes, that is correct, and the motivations and steps to use it are covered here too: https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_s3guard.html Note: Your point about 'load data from S3 into HDFS' is better stated as simply 'read data from S3', where HDFS gets used as transient storage (where/when required). There does not need to be a 'download X GiB of data from S3 to HDFS first, only then begin jobs' step, as distributed jobs can read from S3 via s3a:// URLs in the same way they read from HDFS via hdfs:// URLs.

Harsh J · ‎07-04-2019

Try deleting /etc/default/cloudera-*, /etc/cloudera-* and /var/lib/cloudera-* entirely, and erase all cloudera-* packages via yum (on all involved hosts). After this, attempt the installer again. This will allow the default embedded configs to be written and used for DB initialization, rather than reusing whatever was left over from the earlier attempt.

Harsh J · ‎06-23-2019

This looks like a case of edit logs getting reordered. As @bgooley noted, it is similar to HDFS-12369, where the OP_CLOSE appears after the OP_DELETE, causing the file to be absent when replaying the edits.

The simplest fix, depending on whether this is the only instance of the reordering issue in your edit logs, would be to run the NameNode manually in an edits-recovery mode and "skip" this edit when it catches the error. The rest of the edits should apply normally and let you start up your NameNode.

The recovery mode of NameNode is detailed at https://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

If you're using CM, you'll need to use the NameNode's most recently generated configuration directory under /var/run/cloudera-scm-agent/process/ on the NameNode host as the HADOOP_CONF_DIR, while logged in as the 'hdfs' user, before invoking the manual NameNode startup command. Once you've followed the prompts and the NameNode appears to start up, quit or kill it, and then restart it from Cloudera Manager normally.

If you have a Support subscription, I'd recommend filing a case for this, as the process could get more involved depending on how widespread this issue is.
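To make the "most recent configuration directory" step a little more concrete, here is a rough sketch that picks a directory and prints the command to run. It rests on two assumptions you should verify on your own host: that NameNode process directories are named like <number>-hdfs-NAMENODE, and that the highest number is the most recently generated one. The helper names are made up for illustration; 'hdfs namenode -recover' is the recovery invocation described in the blog post.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed directory naming under the CM agent process root; verify on your host.
NAMENODE_DIR = re.compile(r"^(\d+)-hdfs-NAMENODE$")


def latest_namenode_conf_dir(process_root: str) -> Optional[Path]:
    """Return the NameNode process directory with the highest numeric prefix."""
    best = None
    for entry in Path(process_root).iterdir():
        match = NAMENODE_DIR.match(entry.name)
        if match and entry.is_dir():
            number = int(match.group(1))
            if best is None or number > best[0]:
                best = (number, entry)
    return best[1] if best else None


def recovery_command(conf_dir: Path) -> str:
    """Build the shell command line to run as the 'hdfs' user."""
    return f"HADOOP_CONF_DIR={conf_dir} hdfs namenode -recover"
```

For example, recovery_command(latest_namenode_conf_dir("/var/run/cloudera-scm-agent/process")) gives a command line that can be reviewed first and then run by hand as the 'hdfs' user.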

Harsh J · ‎06-18-2019

It could be passed by either mode, hence the request for the CLI used. The property to modify in the client configuration (via CM properties, or via -D arguments placed early on the CLI) is called 'mapreduce.map.memory.mb', and the administrative limit is defined in the Resource Manager daemon configuration via 'yarn.scheduler.maximum-allocation-mb'.
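To illustrate what "early" means for the -D route, here is a small sketch that assembles a Sqoop argument list with the generic -D option placed right after the tool name and before the tool-specific arguments. The function name, the JDBC URL and the table name are hypothetical placeholders, not taken from your job.

```python
from typing import List


def sqoop_import_args(map_memory_mb: int, tool_args: List[str]) -> List[str]:
    """Place the generic -D option before any tool-specific arguments."""
    generic = f"-Dmapreduce.map.memory.mb={map_memory_mb}"
    return ["sqoop", "import", generic, *tool_args]


# Hypothetical connection details, for illustration only.
args = sqoop_import_args(
    2048,
    ["--connect", "jdbc:mysql://db.example.com/sales", "--table", "orders"],
)
print(" ".join(args))
```

The value passed still has to stay at or below 'yarn.scheduler.maximum-allocation-mb' for the request to be accepted.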

Harsh J · ‎06-18-2019

Please share your full Sqoop CLI. The error you are receiving suggests that the configuration passed to this specific Sqoop job carried a parameter asking for more map memory than the limit the administrator has configured for what a map task may request. As a result, the container request is rejected. Lowering the requested memory size of the map tasks will let it pass this check.
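The check itself is simple to picture. The following is a simplified model of it, not the actual YARN code: the function name and the error text are made up, and the two numbers correspond to the job's requested map memory and the administrator's configured maximum.

```python
def check_map_memory_request(requested_mb: int, maximum_allocation_mb: int) -> None:
    """Raise ValueError when a map task asks for more memory than the limit allows."""
    if requested_mb > maximum_allocation_mb:
        raise ValueError(
            f"requested {requested_mb} MB exceeds the configured maximum "
            f"of {maximum_allocation_mb} MB"
        )


# A request at or below the limit passes silently.
check_map_memory_request(2048, 8192)
```

A request of 16384 MB against a limit of 8192 MB would raise in this model, which mirrors the rejection you are seeing.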

Harsh J · ‎05-23-2019

For HBase MOBs, this can serve as a good starting point, as most of the changes are administrative and the writer API remains the same as for regular cells:
https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hbase_mob.html

For SequenceFiles, a good short snippet can be found here:
https://github.com/sakserv/sequencefile-examples/blob/master/test/main/java/com/github/sakserv/sequencefile/SequenceFileTest.java#L65-L70

And for Parquet:
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/example/ExampleParquetWriter.java

More general reading for the file formats:
https://blog.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/
https://parquet.apache.org/documentation/latest/

Harsh J · ‎05-20-2019

You can apply the queries directly on that external table. Hive will use HDFS for any transient storage it requires as part of the query stages. Of course, if it is a set of queries overall, you can also store all the intermediate temporary tables on HDFS in the way you describe, but the point I am trying to make is that you do not need to copy the original data as-is. Just allow Hive to read from S3 and write into S3 at the points that matter.

Harsh J · ‎05-19-2019

You can do this via two methods: container files, or HBase MOBs. Which is the right path depends on your eventual, dominant read pattern for this data. If your analysis will require loading only a small range of images out of the total dataset, or individual images, then HBase is a better fit with its key-based access model, columnar storage and caches. If instead you will require processing these images in bulk, then large container files are the better fit, such as Sequence Files (with BytesWritable or equivalent) or Parquet files (with BINARY/BYTE_ARRAY types). These store multiple images in a single file and allow for fast, sequential reads of all images in bulk.
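To show the container-file idea in miniature, here is a toy sketch that packs many small images into one stream as length-prefixed (name, bytes) records and reads them back sequentially. To be clear, this is not the SequenceFile or Parquet format, just a made-up layout to illustrate why one large file of records suits bulk, sequential reads; real jobs should use the Hadoop or Parquet writers.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple


def write_records(out: BinaryIO, images: Iterable[Tuple[str, bytes]]) -> None:
    """Append each image as: key length, data length, key bytes, data bytes."""
    for name, data in images:
        key = name.encode("utf-8")
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)


def read_records(src: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) pairs in the order they were written."""
    while True:
        header = src.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, data_len = struct.unpack(">II", header)
        key = src.read(key_len).decode("utf-8")
        data = src.read(data_len)
        yield key, data


# Pack two fake "images" into one in-memory container and read them back.
buffer = io.BytesIO()
write_records(buffer, [("a.png", b"\x89PNG-one"), ("b.png", b"\x89PNG-two")])
buffer.seek(0)
names = [name for name, _ in read_records(buffer)]
```

Reading a single image out of such a file means scanning past everything before it, which is the trade-off that makes HBase the better choice for key-based lookups.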

Harsh J · ‎05-19-2019

Would you be able to attach the contents of /tmp/scm_prepare_node.vQZe0yDf/scm_prepare_node.log (or any/all '/tmp/**/scm_prepare_node.log' files) from the host the install failed on (node5 in this case)?
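If you are unsure which temporary directory was used, a recursive search such as the sketch below should list every candidate. The function name is made up; it relies only on Python's standard glob module, whose '**' pattern does not descend into hidden (dot-prefixed) directories.

```python
import glob
import os
from typing import List


def find_prepare_node_logs(root: str = "/tmp") -> List[str]:
    """Return every scm_prepare_node.log found at any depth under root."""
    pattern = os.path.join(root, "**", "scm_prepare_node.log")
    return sorted(glob.glob(pattern, recursive=True))
```

Running print(find_prepare_node_logs()) on node5 should print the paths of the files to attach.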

Member Since	‎07-31-2013 07:21 AM
Posts	1,924
Kudos received	461

Cloudera Community

Re: Adding ranger in CDP

Re: S3Guard Suggested to help fix Consistency

Re: Cloudera manager embedded database fails to co...

Re: Failed to start namenode. java.io.FileNotFound...

Re: sqoop import issue

Re: sqoop import issue

Re: Efficient ways to store many images files

Re: S3 loading into HDFS

Re: Efficient ways to store many images files

Re: Unable to complete Add Host Wizard on 6.2 Mana...