Member since
12-11-2015
245
Posts
31
Kudos Received
33
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1121 | 11-19-2025 03:12 PM |
| | 770 | 07-22-2025 07:58 AM |
| | 1685 | 01-02-2025 06:28 AM |
| | 2425 | 08-14-2024 06:24 AM |
| | 4027 | 10-02-2023 06:26 AM |
03-16-2023
03:11 PM
@Me Sorry for the confusion; I see what you mean now. Per https://impala.apache.org/docs/build/html/topics/impala_perf_stats.html#perf_stats_incremental : "In Impala 2.1.0 and higher, you can use the COMPUTE INCREMENTAL STATS and DROP INCREMENTAL STATS commands. The INCREMENTAL clauses work with incremental statistics, a specialized feature for partitioned tables. When you compute incremental statistics for a partitioned table, by default Impala only processes those partitions that do not yet have incremental statistics. By processing only newly added partitions, you can keep statistics up to date without incurring the overhead of reprocessing the entire table each time." So the drop-statistics step is intended for plain COMPUTE INCREMENTAL STATS, not for COMPUTE INCREMENTAL STATS with a PARTITION clause. May I know which version of CDP you are using, so that I can test on my end and confirm for you?
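To illustrate the two variants side by side (the table name sales and partition column year are hypothetical; statement syntax is per the Impala docs linked above):

```sql
-- Table-level: Impala scans only partitions that do not yet have incremental stats
COMPUTE INCREMENTAL STATS sales;

-- Partition-level: restrict the computation to one specific partition
COMPUTE INCREMENTAL STATS sales PARTITION (year = 2023);
```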
03-14-2023
09:14 AM
Hi, This statement in the doc, "In cases where new files are added to an existing partition, issue a REFRESH statement for the table, followed by a DROP INCREMENTAL STATS and COMPUTE INCREMENTAL STATS sequence for the changed partition.", applies specifically to a partition for which stats are already available but to which you have added more data. If you are unsure whether stats exist for a partition, you can run show table stats <table_name>; and check the "Incremental stats" column: Query: show table stats test_part
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------------------+
| b | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------------------+
| 1 | 0 | 1 | 0B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://xxxx:8020/user/hive/warehouse/test_part/b=1 |
| Total | -1 | 1 | 0B | 0B | | | | |
+-------+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------------------------------------+
Fetched 2 row(s) in 5.60s If false, you can run COMPUTE INCREMENTAL STATS with the PARTITION clause. If true and you have added more data to this partition, then you have to drop the stats first and then run COMPUTE INCREMENTAL STATS with the PARTITION clause.
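The refresh-then-recompute sequence the doc describes, sketched for the test_part table and partition b=1 shown in the output above:

```sql
-- New files were added to an existing partition: refresh the table metadata first
REFRESH test_part;

-- Drop the now-stale incremental stats for just that partition
DROP INCREMENTAL STATS test_part PARTITION (b = 1);

-- Recompute incremental stats for the changed partition only
COMPUTE INCREMENTAL STATS test_part PARTITION (b = 1);
```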
02-22-2023
12:12 PM
Hi. Yes, this is expected when multiple users share a common path for the TGT cache. Can you make the location unique for each user? I haven't tested it, but I see an option in this link: https://gpdb.docs.pivotal.io/6-3/admin_guide/kerberos-win-client.html "Set up the Kerberos credential cache file. On the Windows system, set the environment variable KRB5CCNAME to specify the file system location of the cache file. The file must be named krb5cache. This location identifies a file, not a directory, and should be unique to each login on the server. When you set KRB5CCNAME, you can specify the value in either a local user environment or within a session. For example, the following command sets KRB5CCNAME in the session: set KRB5CCNAME=%USERPROFILE%\krb5cache"
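As an untested sketch of that suggestion: %USERPROFILE% resolves to a per-user directory, so pointing the cache there makes the path unique for each login.

```bat
rem Per-session: cache file lands under the logged-in user's profile
set KRB5CCNAME=%USERPROFILE%\krb5cache

rem Or persist it for the current user across future sessions
setx KRB5CCNAME "%USERPROFILE%\krb5cache"
```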
01-09-2023
12:20 PM
2 Kudos
You can set a quota on /tmp. Once the quota is reached, further writes to the directory will fail. https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-set-quotas-cm.html has the steps to enable quotas.
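As a command-line sketch (the limits here are example values; run these as the HDFS superuser):

```shell
# Cap /tmp at 1,000,000 names (files + directories combined)
hdfs dfsadmin -setQuota 1000000 /tmp

# Or cap raw disk usage at 100 GB (this counts all replicas)
hdfs dfsadmin -setSpaceQuota 100g /tmp

# Verify: the QUOTA / SPACE_QUOTA columns appear in the count output
hdfs dfs -count -q /tmp
```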
04-01-2020
11:19 PM
Hi @Amn_468 Please configure it in CM > HDFS > Configuration >
Java Heap Size of NameNode in Bytes
Enter a value per your requirement
Save and restart
03-23-2020
03:59 AM
"although same property (dfs.datanode.balance.max.concurrent.moves) already exists in Cloudera Manager." --> Okay, I assume you are referring to the one highlighted in the screenshot below. Yes, it is unnecessary to add dfs.datanode.balance.max.concurrent.moves in the Balancer Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml if you have already used the "Maximum Concurrent Moves" field. Also note that "Maximum Concurrent Moves" is scoped only to the balancer, not to the DataNodes. So for the DataNodes you have to set it explicitly using the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml". The reason for adding this property on both the balancer and the DataNodes is explained in my previous comment. Hope that clarifies; let me know if there are further questions. I will raise an internal Jira to correct the document and avoid the duplicate entry in the balancer safety valve.
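For reference, the DataNode-side safety-valve entry would look like the following (50 is only an example value, not a recommendation):

```xml
<!-- DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml -->
<property>
  <name>dfs.datanode.balance.max.concurrent.moves</name>
  <value>50</value> <!-- example value; tune per your requirement -->
</property>
```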
03-22-2020
11:32 PM
Yes, you can install CM offline after downloading the packages. It is documented in this link: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/cm_ig_create_local_package_repo.html#internal_package_repo Once the repo is ready, you can install the binaries using the steps in this link: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/install_cloudera_packages.html#id_z2h_pnm_25
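On a RHEL/CentOS host the repo creation boils down to roughly the following. This is only a sketch assuming yum and a local web server serving /var/www/html; the directory path is an example, and the linked doc remains the authoritative procedure.

```shell
# Place the downloaded CM .rpm files in a directory served by the web server
mkdir -p /var/www/html/cloudera-repos/cm6
# (copy the downloaded .rpm files into that directory)

# Generate the yum repository metadata
createrepo /var/www/html/cloudera-repos/cm6
```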
03-22-2020
08:48 PM
You will need to tune your heap in accordance with the number of files. The tuning guideline is in this document: https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html If you would like to get a count of the files, you may run hdfs dfs -count /
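For reference, the count command prints four columns, so the file count is the second number:

```shell
# Count directories, files, and bytes under the HDFS root
hdfs dfs -count /
# Output columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
```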
03-22-2020
08:16 PM
Just a correction: the document suggests tuning the property dfs.datanode.balance.max.concurrent.moves, not dfs.datanode.ec.reconstruction.xmits.weight. Regarding the question of why to add dfs.datanode.balance.max.concurrent.moves again when it is already present on the DataNode and balancer: the doc says "Add the following code to the configuration field, for example, setting the value to 50." That is, 50 is just an example number; the document does not mandate setting this value to 50. You can tune it to any value your requirement calls for. Then why add it on both the balancer and the DataNode? Setting it on the HDFS Balancer (client) gives you the flexibility to change the value on the client side at runtime, i.e., you can set the property to a value less than or equal to what you have configured on the DataNode side. The reason we also set it on the server side is to impose an upper limit on how high the property can be configured: if you configure a value greater than what is set on the DataNode (server), the DataNode rejects it.
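As a sketch of the client-side override described above (the value must stay at or below the DataNode-side setting; 30 is an example):

```shell
# Override the concurrent-moves limit for this balancer run only
hdfs balancer -Ddfs.datanode.balance.max.concurrent.moves=30
```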
03-22-2020
06:32 AM
The error suggests the DFSClient is unable to read the blocks due to a connection failure: either the ports are blocked or the host is unreachable from that node. From the node on which you ran the code snippet (or on which the executor ran), try reading the file using hdfs commands in debug mode, which can give further clues on which node/service the client was trying to reach prior to the connect timeout: export HADOOP_ROOT_LOGGER=DEBUG,console
hdfs dfs -cat hdfs://ec2-18-234-71-106.compute-1.amazonaws.com:8020/dataset/Tech.csv
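If the debug output points at a specific host and port, a quick reachability check from the same node can confirm whether it is a firewall issue. The first port below is the NameNode RPC port from the hdfs:// URI; the DataNode host is a placeholder, and its data-transfer port varies by Hadoop version.

```shell
# NameNode RPC port taken from the hdfs:// URI above
nc -vz ec2-18-234-71-106.compute-1.amazonaws.com 8020

# DataNode data-transfer port (50010 by default on Hadoop 2.x, 9866 on 3.x)
nc -vz <datanode-host> 50010
```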