Member since: 06-13-2023
Posts: 9
Kudos Received: 2
Solutions: 0
01-28-2024
05:13 AM
2 Kudos
Hi @yagoaparecidoti, thanks for the help. I've resolved the problem.
01-24-2024
05:45 AM
Hi everyone, I'm trying to figure out what causes this Impala warning, which is generated when external tables are created:

Impala does not have READ_WRITE access to path 'hdfs://xxxxxxxxxxx'

At first, we thought it was a permissions issue with Ranger, but after numerous attempts we still get the same warning. We also checked the permissions on the HDFS paths, but there are folders with permissions of 775, 750, and 755, so there doesn't seem to be a correlation between the warning and the POSIX permissions.

Could it be an issue with the user and/or group? In many paths, the owner and group of the folder are hdfs:hdfs. Should the owner be changed to impala? Unfortunately, I haven't found any helpful documentation on this topic that would allow me to eliminate the warning. Many users think that the tables are actually not being created and that there is no way to write to them.
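In my experience this warning is advisory: the table is still created, and the message only means the impala service user could not verify read/write access to the location at the HDFS level. A minimal sketch to confirm this (the database, table, and LOCATION names here are hypothetical, not from any real environment):

```sql
-- Hypothetical names throughout; adjust database, table, and LOCATION.
CREATE EXTERNAL TABLE sandbox.t_example (id INT, val STRING)
STORED AS PARQUET
LOCATION '/data/external/t_example';

-- Confirm the table was created despite the warning, and find the exact
-- path the warning refers to:
SHOW TABLES IN sandbox LIKE 't_example';
DESCRIBE FORMATTED sandbox.t_example;  -- the Location row shows the HDFS path
```

From a shell, `hdfs dfs -getfacl <path>` shows whether the impala user is covered by the POSIX bits or an HDFS ACL on that path; changing the directory owner to impala should not be necessary if access is granted through an ACL or a Ranger HDFS policy.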
Labels:
- Apache Impala
12-13-2023
02:04 AM
Hello @James G, the only problem is that the tables are not displayed. Unfortunately, I have no way to execute any commands on Impala.
12-12-2023
12:22 PM
Hello everyone, recently on Ranger we noticed that when users are enabled in a security zone, they are unable to see the tables in a database through Impala. When connecting via JDBC through the command line or directly from HUE, they can view the tables and their respective data, but if all of this is done through Impala, they have no visibility at all.

We have 4 clusters, all with the same type of configuration and permissions, but we encounter the issue in 3 of them. All configurations in Ranger have been checked and appear to be correct in all environments. At this point, I have no idea where to begin analyzing the problem. Also, there are no access issues reported from Ranger for tables in the security zone, and there are no error logs or access-denied logs.

I realize that without providing an error log it's difficult to conduct an analysis, but perhaps some of you have suggestions on where to start investigating the problem. Have any of you ever encountered a similar case? Thanks!!!
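One possible starting point, sketched below under the assumption that stale metadata or stale authorization data in the Impala daemons is causing the asymmetry between the JDBC/HUE path and Impala (the database name secure_db is hypothetical):

```sql
-- Run in impala-shell as one of the affected users; names are hypothetical.
REFRESH AUTHORIZATION;        -- re-fetch Ranger policies into the catalog
INVALIDATE METADATA;          -- drop cached metadata; reloaded on next access
SHOW DATABASES;
SHOW TABLES IN secure_db;
```

REFRESH AUTHORIZATION may require admin privileges; if visibility returns after running it, the problem is likely policy-refresh timing rather than the policies themselves.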
Labels:
- Apache Impala
- Apache Ranger
11-13-2023
02:13 PM
@DianaTorres I'm sorry, but unfortunately the problem still persists, even after trying the suggestions in the previous posts.
11-09-2023
12:51 AM
Hello Miklos, unfortunately what you suggested had no effect. We continue to have the same problem: a single Parquet file is created.
11-07-2023
12:23 PM
Hello everyone, my team, using Tez (in particular Hive), has noticed that during an insert with a very simple select, a single Parquet file of 1.5 GB per partition is generated in the output table. To try to remedy the problem, a number of settings were applied at the session level, but they had no effect. Below are the SETs used at the session level:

SET hive.execution.engine=tez;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition.threshold=0;
--SET tez.grouping.max-size=268435456;
--SET hive.exec.reducers.bytes.per.reducer=536870912;
--SET tez.grouping.split-count=18;
SET hive.vectorized.execution.reduce.enabled=true;
SET hive.vectorized.execution.reduce.groupby.enabled=true;
--SET hive.tez.auto.reducer.parallelism=false;
--SET mapred.reduce.tasks=12;
--SET hive.tez.partition.size=104857600;
--SET hive.tez.partition.num=10;
SET hive.parquet.output.block.size=104857600;

I would like to ask if there is a way to always produce Parquet files, but broken up into smaller files as shown in the image below. We cannot understand what the cause might be. Files structured in this way do not guarantee sufficient parallelism for other jobs (such as Sqoop).
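One workaround that may apply here, as a sketch rather than a definitive fix: when each dynamic partition is funneled through a single reducer, that reducer writes one large file. Adding a DISTRIBUTE BY with a random bucket spreads each partition's rows over several reducers, and each reducer writes its own Parquet file. The tables t_in and t_out and the partition column dt below are hypothetical:

```sql
SET hive.execution.engine=tez;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- 8 buckets -> up to 8 Parquet files per partition; tune the constant
-- so each file lands near the desired size (e.g. 1.5 GB / 8 ≈ 190 MB).
INSERT OVERWRITE TABLE t_out PARTITION (dt)
SELECT col1, col2, dt
FROM t_in
DISTRIBUTE BY dt, FLOOR(RAND() * 8);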
Labels:
06-14-2023
02:01 PM
Hi Yuexin, you have been very helpful. Unfortunately, if I wanted to use "Dynamic Queue Scheduling" in CDP 7.1.7 at the moment, I would have no guarantee of resolving any problems through Cloudera support; in fact, it is not recommended for use in production. Thank you very much
06-13-2023
02:18 PM
Hi, I'm using the Cloudera CDP 7.1.7 Private Cloud Base version and would like confirmation of whether it is possible to set time-based rules in the YARN capacity scheduler, as I did previously in the old CDH. It would be interesting to be able to use that functionality in CDP as well, but the Cloudera documentation I've found on the web makes no mention of it. Could you give me some hints as to whether this feature is still present in CDP? Thanks
Labels:
- Apache YARN