Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2244 | 07-09-2019 12:53 AM |
| | 12768 | 06-23-2019 08:37 PM |
| | 9834 | 06-18-2019 11:28 PM |
| | 10824 | 05-23-2019 08:46 PM |
| | 5095 | 05-20-2019 01:14 AM |
01-09-2020 01:35 AM
Hi Harsh, I was able to add Ranger in CDP after going through the Cloudera documentation. I added Ranger using a PostgreSQL database; earlier I was trying it with a MySQL database. So the issue is resolved for me. Thanks
11-19-2019 06:23 PM
It happened to me when I was installing Cloudera 6.3.1. What solved it for me was:
1. Set SELinux to permissive:
   sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
2. Configure the hostname and /etc/hosts (just an example; set the hosts of all machines):
   hostnamectl set-hostname master1.hadoop-test.com
   echo "10.99.0.175 master1.hadoop-test.com master1" >> /etc/hosts
   sed -i 's/\r//' /etc/hosts
   echo "HOSTNAME=master1.hadoop-test.com" >> /etc/sysconfig/network
3. Reboot, then:
4. wget https://archive.cloudera.com/cm6/6.3.1/cloudera-manager-installer.bin
5. chmod u+x cloudera-manager-installer.bin
6. ./cloudera-manager-installer.bin
09-25-2019 03:03 AM
Hi Harsha, Thanks for the explanation. In extension to the topic, I need a small clarification: we recently implemented Sentry on Impala, and based on the KB article below [1], we can't execute "Invalidate all metadata and rebuild index" or "Perform incremental metadata update", since we don't have access to all the DBs, which is fair. Now my questions are:
1. I am not able to see a new DB in Hue Impala, but I can see it from beeline or impala-shell. How do I fix this?
2. I can execute INVALIDATE METADATA on a table from impala-shell, but I have 50+ DBs and tens of tables in each DB. Is there any option to run INVALIDATE METADATA at the DB level instead of on each individual table?
[1] https://my.cloudera.com/knowledge/INVALIDATE-METADATA--Sentry-Enabled--ERROR?id=71141
Thanks, Krishna
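For context, the per-table workaround I have today is just a loop over impala-shell; a rough sketch (impalad-host:21000 and sales_db are placeholders for my environment):

```bash
#!/usr/bin/env bash
# Invalidate metadata table-by-table for one database.
# impalad-host:21000 and sales_db are placeholders.
IMPALAD="impalad-host:21000"
DB="sales_db"

# -B prints plain delimited output; --quiet suppresses the banner.
for TABLE in $(impala-shell -i "$IMPALAD" --quiet -B -q "SHOW TABLES IN $DB"); do
  echo "Invalidating $DB.$TABLE"
  impala-shell -i "$IMPALAD" --quiet -q "INVALIDATE METADATA $DB.$TABLE"
done
```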
07-09-2019 12:53 AM
2 Kudos
Yes, that is correct, and the motivations and steps-to-use are reflected here too: https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_s3guard.html
Note: on your point of 'load data from S3 into HDFS', it is better stated as simply 'read data from S3', with HDFS used as transient storage where/when required. There does not need to be a 'download X GiB of data from S3 to HDFS first, only then begin jobs' step, as distributed jobs can read off of S3 via s3a:// URLs in the same way they do from HDFS via hdfs:// URLs.
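For illustration, a few commands that read S3 directly over s3a:// — a minimal sketch, assuming my-bucket and the paths are placeholders and that S3 credentials are already configured in core-site.xml or via an instance role:

```bash
# List and read objects directly over s3a://; no HDFS staging step is needed.
hadoop fs -ls s3a://my-bucket/input/
hadoop fs -cat s3a://my-bucket/input/part-00000 | head

# Jobs accept s3a:// paths just like hdfs:// ones; copy into HDFS only when
# you actually want a local copy, e.g. with distcp:
hadoop distcp s3a://my-bucket/input/ hdfs:///user/etl/input/
```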
06-23-2019 08:37 PM
1 Kudo
This looks like a case of edit logs getting reordered. As @bgooley noted, it is similar to HDFS-12369, where the OP_CLOSE appears after OP_DELETE, causing the file to be absent when replaying the edits.
The simplest fix, depending on whether this is the only file affected by the reordering in your edit logs, would be to run the NameNode manually in an edits-recovery mode and "skip" this edit when it catches the error. The rest of the edits should apply normally and let you start up your NameNode. The recovery mode of the NameNode is detailed at https://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/
If you're using CM, you'll need to use the NameNode's most recently generated configuration directory under /var/run/cloudera-scm-agent/process/ on the NameNode host as the HADOOP_CONF_DIR, while logged in as the 'hdfs' user, before invoking the manual NameNode startup command. Once you've followed the prompts and the NameNode appears to start up, quit out/kill it and restart from Cloudera Manager normally.
If you have a Support subscription, I'd recommend filing a case for this, as the process could get more involved depending on how widespread this issue is.
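As a rough outline of those steps — a sketch, not verified against your cluster, and the numbered process directory name is a placeholder that differs per host:

```bash
# On the NameNode host, pick the newest CM-generated config directory
# (the name below is a placeholder) and run the NameNode in recovery mode
# as the hdfs user; follow the prompts and choose to skip the bad edit.
sudo -u hdfs env \
  HADOOP_CONF_DIR=/var/run/cloudera-scm-agent/process/1234-hdfs-NAMENODE \
  hdfs namenode -recover

# After recovery completes, start the NameNode from Cloudera Manager as usual.
```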
06-03-2019 12:27 AM
1 Kudo
Hello Harsh, Thank you for the help on this. I was able to identify some information that helped here. I will come back in case I need further help, and will accept your reply as the solution. 🙂 Thanks, snm1523
05-23-2019 08:46 PM
1 Kudo
For HBase MOBs, this can serve as a good starting point, as most of the changes are administrative and the writer API remains the same as for regular cells: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hbase_mob.html
For SequenceFiles, a good short snippet can be found here: https://github.com/sakserv/sequencefile-examples/blob/master/test/main/java/com/github/sakserv/sequencefile/SequenceFileTest.java#L65-L70 and for Parquet: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/example/ExampleParquetWriter.java
More general reading on the file formats: https://blog.cloudera.com/blog/2011/01/hadoop-io-sequence-map-set-array-bloommap-files/ and https://parquet.apache.org/documentation/latest/
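If it helps to sanity-check files written with those APIs from the command line — a minimal sketch, where the paths are placeholders and the parquet-tools jar is only present if you've downloaded it:

```bash
# Dump the records of a SequenceFile (printed via the key/value toString()).
hadoop fs -text /user/etl/output/data.seq | head

# Inspect a Parquet file's schema, assuming a parquet-tools jar is on hand.
hadoop jar parquet-tools.jar schema /user/etl/output/data.parquet
```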
05-21-2019 03:30 AM
Mr. Harsh, would you please have a look at my reply? Thanks
05-20-2019 04:18 AM
Thanks for this. I think we can summarize this as follows:
* If only external Hive tables are used to process S3 data, the technical issues regarding consistency and scalable metadata handling would be resolved.
* If external and internal Hive tables are used in combination to process S3 data, the technical issues regarding consistency, scalable metadata handling, and data locality would be resolved.
* If Spark alone is used on top of S3, the technical issues regarding consistency (with in-memory processing) and scalable metadata handling would be resolved, as Spark will use memory as transient storage and only read the initial data from S3 and write back the result.
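To make the external-table case concrete, this is the kind of DDL I mean — a sketch, where the HiveServer2 URL, bucket name, and columns are all placeholders:

```bash
# Create an external Hive table whose data stays in S3 and is read via s3a://.
# The JDBC URL, bucket, and schema below are placeholders.
beeline -u "jdbc:hive2://hiveserver2-host:10000" -e "
  CREATE EXTERNAL TABLE web_logs (ts STRING, url STRING, status INT)
  STORED AS PARQUET
  LOCATION 's3a://my-bucket/warehouse/web_logs/';"
```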