Member since: 10-01-2018
Posts: 274
Kudos Received: 6
Solutions: 5
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 188 | 11-27-2024 12:50 PM |
 | 3674 | 09-28-2020 08:05 AM |
 | 3260 | 04-16-2020 09:20 AM |
 | 1616 | 04-16-2020 08:48 AM |
 | 4228 | 04-16-2020 08:10 AM |
11-28-2024
08:07 AM
1 Kudo
Hi Ryan, thank you for your quick reply. I'll try the proposed solution, since it seems to be the only one that resolves the situation. Thanks again.
06-01-2022
12:06 PM
@yagoaparecidoti If you notice that the high memory utilization moves from tablet server to tablet server, then another candidate is a problem with the schema of one or more tables. The specific symptom I am thinking of is that the tablet size may be too high due to too few partitions; this can drive high memory utilization because of the amount of information that must be loaded into memory. The fast way to determine whether this is the case is to look at the Charts for the Tablet Server with high memory utilization and check the size of data across all tablets on that server. Then divide this number by the number of replicas on the server, which gives an average replica size. It is perfectly possible for this problem to originate with a single table, so if you have one or more tables you know have few partitions, I recommend checking those tables specifically. Anything beyond this will require log analysis and is better suited to a support case.
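As a purely illustrative back-of-the-envelope calculation (the numbers below are made up; substitute the values from your own Charts page):
TOTAL_TABLET_BYTES=1200000000000   # example: ~1.2 TB of data across all tablets on the server
REPLICA_COUNT=300                  # example: number of replicas hosted by that tablet server
echo "$(( TOTAL_TABLET_BYTES / REPLICA_COUNT / 1024**3 )) GiB per replica"   # ~3 GiB in this example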
08-23-2021
09:24 PM
Context
As of CDP 7.1.2, Sentry is deprecated for Kudu, and Ranger becomes the solution for fine-grained authorization in Kudu. The question this article is meant to address is: what does it mean for the Impala/Kudu stack to enable Ranger for Kudu, especially if Impala is already using Ranger? This is not obvious up front because the information is spread over three different sets of documents.
Fundamentally, the answer is: nothing. The reason is straightforward: enabling the Kudu module in Ranger should automatically configure the --trusted_user_acl flag in Kudu to include Impala (if installed) and Hive (if installed). This flag exempts the listed users from being checked against Kudu's authorization model, which in CDP installations means Ranger policies. So the correct way to enforce access to Impala and Hive tables that are stored in Kudu is through the Hadoop-SQL policies set for Impala and Hive. By default, enabling Ranger for Kudu should have no impact on your Impala or Hive operations; any further changes you want to make to authorization should be done in the Hadoop-SQL policies.
What is the motivation for using Ranger with Kudu, if that is the case? These tables all remain accessible to users at the Kudu level, and changes made from the Kudu level can cause inconsistencies or data loss. Enabling Ranger at the Kudu level prevents this.
There are two other use cases where policies are set at the Kudu level in cm_kudu: Spark and NiFi. These services interact with Kudu on a per-user basis rather than through a service user, so the normal logic of writing Ranger policies applies. Please see the documentation of the respective service for more information.
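For reference, the --trusted_user_acl exemption discussed above ends up looking something like the following in the Kudu configuration. This is purely illustrative: in CDP the value is managed by Cloudera Manager rather than set by hand, and the exact list depends on which services are installed.
--trusted_user_acl=impala,hive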
06-25-2021
01:14 AM
McAfee also flags launch_container.sh as a vulnerability; this can be ignored, as launching containers is part of the YARN architecture. The following points explain where the script fits into YARN's job flow:
1. The client/user submits a job request.
2. The ResourceManager checks the input splits and throws an error if they cannot be computed.
3. On successful computation, the ResourceManager calls the job submit procedure.
4. It then finds a NodeManager where it can launch the ApplicationMaster.
5. The ApplicationMaster process checks the input splits and creates the mapper and reducer tasks (task IDs are assigned at this point).
6. The ApplicationMaster does the computation and, based on that, requests resources in the form of containers with the memory and CPU it needs.
7. On receiving the request, the NodeManager uses "launch_container.sh", along with a few other job-related inputs, to launch a container for task execution.
This is expected behavior and should be treated as a false positive.
06-20-2021
09:21 PM
The Qualys tool reports vulnerabilities in ZooKeeper, even when the ZooKeeper security configuration is applied (HDP doc, CDP doc).
There are two kinds of reports Qualys makes that are not addressable by Cloudera:
1. The security guidance keeps several znodes in the affected services world-readable, for example:
/zookeeper/quota sasl:zookeeper:cdrwa,world:anyone:r
Any world-readable znode will appear on the scanner and will require an exception to be filed for it. This is the position of Qualys, as reported by our customers who use Qualys.
2. The security guidance does not cover several services. The following components require no action according to our documentation:
Calcite
Knox
MapReduce
Spark
Tez
Zeppelin
The Qualys tool will still report znodes owned by these services.
Note: It is possible to harden the ACLs beyond the Best Practices recommendation in the documentation, and to harden the ACLs of services not covered by the Best Practices. However, Cloudera cannot say what the correct ACLs are in that case; testing on the customer's side is required. It is very easy to set ACLs such that services that need access to a znode no longer have it, so this needs to be handled on a znode-by-znode basis.
If an attempt at hardening the ACLs is going to be made, these suggestions may help (a rough ZooKeeper CLI sketch follows the list):
Try implementing SASL (this is the same method used in most of the Best Practices recommendations)
Try restricting privileges to the service user
If something breaks, try to identify the user that performed the failed action, and add the necessary privileges only for that user
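As a rough, untested sketch of the first two suggestions (the znode path and the service principal below are placeholders, and the session must already be authenticated as an appropriate SASL/Kerberos identity before setting a SASL ACL):
# zookeeper-client -server <zk-host>:2181
getAcl /path/to/znode
setAcl /path/to/znode sasl:<service-user>:cdrwa
The getAcl call shows the current ACL before anything is changed; the setAcl call then restricts full rights to the named service principal. If another user's action subsequently fails, that user can be added back with only the permissions it actually needs, as described in the last suggestion.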
06-01-2021
08:10 PM
Note: Cloudera does not support Antivirus software of any kind.
This article contains recommendations for excluding CDH components and directories from AV scans and monitoring. Note that these recommendations do not apply to every service, and some services will have additional, service-specific items to exclude. Those details will be addressed in individual articles dedicated to the service in question.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore, take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
In general, consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera /etc/<component> /var/lib/<component>
Runtime and Logging
/var/run/<component> /var/log/<component>
Scratch and Temp
/var/tmp/<component> /tmp/<component>
Note: The <component> does not only refer to the service name, as a given service may have multiple daemons with their own directories.
Example: cloudera-scm-server and cloudera-scm-agent.
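Purely as an illustration of what such exclusions can look like (Cloudera does not endorse or support any antivirus product; the ExcludePath directive assumes a clamd-style configuration, and other products have their own syntax), the general locations above might be expressed as:
# Illustration only: installation and libraries
ExcludePath ^/opt/cloudera/
# Illustration only: write-heavy log and run directories
ExcludePath ^/var/log/hadoop-hdfs/
ExcludePath ^/var/run/cloudera-scm-agent/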
Across CDH services, there are also many user-configurable locations. Most of these can be found in Cloudera Manager properties with names like "service.scratch.dir" and "service.data.dir"; go to Cloudera Manager > Service > Configuration and search for any property containing "dir", all of which may be considered for exclusion. Instructions for specific services follow:
Cloudera Manager:
Note: Cloudera Manager has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set on installation; the database may be colocated with cloudera-scm-server, or on a remote host. Consult with your database administrators for details on the path where the database information is stored.
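If the Cloudera Manager server was installed with the standard packages, the database connection details can usually be read on the cloudera-scm-server host; the path and property prefix below are the typical defaults and may differ on your installation:
# grep "com.cloudera.cmf.db" /etc/cloudera-scm-server/db.properties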
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera/cm /opt/cloudera/cm-agent
/etc/cloudera-scm-agent
/etc/cloudera-scm-server
/var/lib/cloudera-host-monitor /var/lib/cloudera-scm-agent /var/lib/cloudera-scm-eventserver /var/lib/cloudera-scm-server /var/lib/cloudera-scm-server-db /var/lib/cloudera-service-monitor
Runtime and Logging
/var/run/cloudera-scm-agent
/var/run/cloudera-scm-server
/var/log/cloudera-scm-agent /var/log/cloudera-scm-alertpublisher /var/log/cloudera-scm-eventserver /var/log/cloudera-scm-firehose /var/log/cloudera-scm-server
HDFS:
Note: The directories in HDFS are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the hdfs-site.xml file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-hdfs
Runtime and Logging
/var/log/hadoop-hdfs
Scratch and Temp
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to exclude the other components.
YARN:
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > YARN > Configuration (a quick grep check follows the list):
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
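As with the hdfs-site.xml check above, a quick way to see whether any of these appear in the client configuration on a NodeManager host is the command below; note that in a Cloudera Manager deployment the effective values may instead live in the role's per-process configuration:
# grep -A1 -E "local-dirs|log-dirs|recovery.dir|leveldb" /etc/hadoop/conf/yarn-site.xml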
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-yarn
Runtime and Logging
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
MapReduce:
Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > MapReduce > Configuration, and this one in particular should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-mapreduce
Runtime and Logging
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
ZooKeeper:
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
Runtime and Logging
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
06-01-2021
08:02 PM
Note: Cloudera does not support Antivirus software of any kind.
This article contains recommendations for excluding CDP components and directories from AV scans and monitoring. Note that these recommendations do not apply to every service, and some services will have additional, service-specific items to exclude. Those details will be addressed in individual articles dedicated to the service in question.
The three primary locations you will want to exclude from antivirus are:
Data directories: These can be very large, and therefore, take a long time to scan; they can also be very write-heavy, and therefore suffer performance impacts or failures if the AV holds up writes.
Log directories: These are write-heavy.
Scratch directories: These are internal locations used by some services for writing temporary data, and can also cause performance impacts or failures if the AV holds up writes.
In general, consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera /etc/<component> /var/lib/<component>
Runtime and Logging
/var/run/<component> /var/log/<component>
Scratch and Temp
/var/tmp/<component> /tmp/<component>
Note: The <component> does not only refer to the service name, as a given service may have multiple daemons with their own directories.
Example: cloudera-scm-agent and cloudera-scm-server.
Across CDP services, there are also many user-configurable locations. Most of these can be found in Cloudera Manager properties with names like "service.scratch.dir" and "service.data.dir"; go to Cloudera Manager > Service > Configuration and search for any property containing "dir", all of which may be considered for exclusion. Instructions for specific services follow:
Cloudera Manager:
Note: Cloudera Manager has a special requirement in the form of a user-configurable database. I recommend you exclude this database. However, the details of this database are set on installation; the database may be co-located with cloudera-scm-server, or on a remote host. Consult with your database administrators for details on the path where the database information is stored.
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera/cm /opt/cloudera/cm-agent
/etc/cloudera-scm-agent
/etc/cloudera-scm-server
/var/lib/cloudera-host-monitor /var/lib/cloudera-scm-agent /var/lib/cloudera-scm-eventserver /var/lib/cloudera-scm-server /var/lib/cloudera-scm-server-db /var/lib/cloudera-service-monitor
Runtime and Logging
/var/run/cloudera-scm-agent
/var/run/cloudera-scm-server /var/run/cloudera-scm-server-db
/var/log/cloudera-scm-agent /var/log/cloudera-scm-alertpublisher /var/log/cloudera-scm-eventserver /var/log/cloudera-scm-firehose /var/log/cloudera-scm-server
HDFS:
Note: The directories in HDFS are user-configurable. I recommend you exclude these, especially the data directory for the DataNode and the meta directories for the NameNode and JournalNode. These details can be found in the hdfs-site.xml file:
# grep -A1 "dir" /etc/hadoop/conf/hdfs-site.xml
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-hdfs
Runtime and Logging
/var/log/hadoop-hdfs
Scratch and Temp
/tmp/hadoop-hdfs
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
YARN:
Note: The directories YARN uses are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > YARN > Configuration:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.nodemanager.recovery.dir
yarn.timeline-service.leveldb-state-store.path
yarn.timeline-service.leveldb-timeline-store.path
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-yarn
Runtime and Logging
/var/log/hadoop-yarn
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
MapReduce:
Note: Some directories in MapReduce are user-configurable. I recommend you exclude them. These properties can be found in Cloudera Manager > MapReduce > Configuration, and this one in particular should be excluded:
mapreduce.jobhistory.recovery.store.leveldb.path
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
/var/lib/hadoop-mapreduce
Runtime and Logging
/var/log/hadoop-mapreduce
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
ZooKeeper:
Note: ZooKeeper has a user-configurable data directory. I recommend you exclude it. This directory can be found by running the following command:
# grep dataDir /etc/zookeeper/conf/zoo.cfg
Consider excluding the following directories and all of their subdirectories:
Installation, Configuration, and Libraries
/opt/cloudera
Runtime and Logging
/var/log/zookeeper
Note: HDFS, YARN, MapReduce, and ZooKeeper are mutually interdependent and you are likely to experience unsatisfactory results if you fail to also exclude the other components.
10-02-2020
06:15 AM
Hello @rblough First of all, thank you so much for your response!
1. The X and Y environments have different HBase versions.
2. The HBase X version is 1.3.1 and the HBase Y version is 1.0.0.
3. We are not clearing the table but following the steps below:
echo "disable '$1'" | hbase shell
echo "restore_snapshot '$1_SNAPSHOT_$DATE'" | hbase shell
echo "enable '$1'" | hbase shell
We do this daily, but sometimes we see the counts mismatch. So we truncate and restore the same snapshots, and that works fine in the Y environment. Please help us with your inputs.
Thanks & Regards,
Vinod
09-30-2020
03:51 AM
@rblough Thank you for the continued support.
2) The command is being run as the hdfs user.
1) The detailed output showed that there are 603,723 blocks in total. Looking at the HDFS UI, the Datanodes report having 586,426 blocks each.
3) hdfs fsck / -openforwrite says that there are 506,549 blocks in total.
The discrepancy in block count seems to be there still. Below are the summaries of the different fsck outputs.
hdfs fsck / -files -blocks -locations -includeSnapshots
Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 64389
Total symlinks: 0
Replicated Blocks:
Total size: 330079817503 B (Total open files size: 235302 B)
Total files: 625308 (Files currently being written: 129)
Total blocks (validated): 603723 (avg. block size 546740 B) (Total open file blocks (not validated): 122)
Minimally replicated blocks: 603723 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Wed Sep 30 12:23:06 CEST 2020 in 23305 milliseconds
hdfs fsck / -openforwrite
Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 63922
Total symlinks: 0
Replicated Blocks:
Total size: 329765860325 B
Total files: 528144
Total blocks (validated): 506549 (avg. block size 651004 B)
Minimally replicated blocks: 506427 (99.975914 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9992774
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Wed Sep 30 12:28:06 CEST 2020 in 11227 milliseconds
09-29-2020
01:12 PM
I used the systemd approach, and it worked great. Again, I appreciate everyone's help!