Member since 01-19-2017
3681 Posts
633 Kudos Received
372 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1631 | 06-04-2025 11:36 PM |
| | 2087 | 03-23-2025 05:23 AM |
| | 989 | 03-17-2025 10:18 AM |
| | 3767 | 03-05-2025 01:34 PM |
| | 2591 | 03-03-2025 01:09 PM |
01-12-2021
08:46 PM
If we want to limit the interaction of HDP/Hadoop developers, data analysts, or data scientists, does that mean we don't need to install the clients on all worker nodes? We have also found that, as a special case, the Sqoop and Oozie clients need to be installed on all nodes, including master and worker nodes. Is that related to how Sqoop and Oozie work?
01-05-2021
11:50 PM
@GangWar Thank you so much for your help. I assigned myself the "Power User" role and it worked like a charm. However, I'm a bit surprised: my user is an admin user, yet I still had to assign the Power User role.
01-04-2021
09:47 AM
@Mondi The simple answer is YES, and the best source is the vendor itself: see Rack awareness CDP, as computations are performed with the assistance of rack awareness scripts. Hope that helps.

Was your question answered? If so, make sure to mark the answer as the accepted solution. If you find a reply useful, give it Kudos by hitting the thumbs up button.
01-03-2021
12:17 PM
@PauloNeves Yes, the command show databases will list all databases in a Hive instance whether you are authorized to access them or not. I am sure this is a cluster devoid of Ranger or Sentry, which are the 2 authorization tools in Cloudera!

Once the Ranger plugin is enabled, authorization is delegated to Ranger, which provides fine-grained data access control in Hive, including row-level filtering and column-level masking. This is the recommended setting to make your database administration easier, as it provides centralized security administration, access control, and detailed auditing of user access within Hadoop, Hive, HBase, and other components in the ecosystem.

Unfortunately, I had already enabled the Ranger plugin for Hive on my cluster, but all the same, it confirms what I wrote above. Once the Ranger plugin is enabled for a component, i.e. Hive, HBase, or Kafka, authorization is managed exclusively through Ranger.

Database listing before Ranger

Below is what happens if my user sheltong has not explicitly been given authorization through Ranger, see [screenshots]. I see no databases, though I have over 8 databases. See the output for the hive user, who has explicit access to all the tables: due to the default policy, he could see the databases.

Database listing after Ranger

After creating a policy explicitly giving the user sheltong access to the 3 databases:

Policy granting explicit access to 3 databases

Now when I re-run show databases, bingo!

Back to your question: show tables from forbidden_db returns an empty list. This can be true, especially if the database is empty, i.e. it has no tables, as in the screenshot below: though I have access to the database, it's empty. Now I create a table and re-run the select, and now I am able to see the table.

I hope this demonstrates the power of Ranger and maybe explains what you are encountering. I am also thinking that if your cluster has the Ranger Hive plugin enabled, you could have select on the databases, but you will need explicit minimum select or the following permission on the underlying database tables to be able to see them. Happy Hadooping
01-01-2021
05:26 PM
Hi @Shelton, thank you for the response. I got it. It helped. Best regards,
12-07-2020
08:35 PM
Hey Mr @Shelton The problem has been fixed. I used this tutorial, https://support.imply.io/hc/en-us/articles/360025589574-Connecting-Tableau-to-Druid-with-JDBC, and added some properties in Druid.
11-11-2020
02:06 AM
Hello @Amn_468 Since you reported the DN pause time, I referred to the DN heap only. The block count on most of the DNs seems to be above 6 million, so I would suggest increasing the DN heap to 8GB (from the current value of 6GB) and performing a rolling restart to bring the new heap size into effect.

There is no straightforward way to say you have hit the small file problem, but if your average block size is a few MB or less than 1 MB, it is an indication that you are accumulating small files in HDFS. The simplest way to detect small files in a cluster is to run fsck, which shows the average block size. If that value is too low (e.g. ~1 MB), you might be hitting the small files problem, which would be worth looking at; otherwise, there is no need to review the number of blocks.

[..]
$ hdfs fsck /
..
...
Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
[..]

You may refer to the links below for help on dealing with small files.
- https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
- https://community.cloudera.com/t5/Community-Articles/Identify-where-most-of-the-small-file-are-located-in-a-large/ta-p/247253
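The fsck check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the summary line has exactly the format quoted in the post, the helper names `parse_fsck_summary` and `looks_like_small_files` are hypothetical, and the 1 MB threshold is just the rough figure mentioned above, not an official limit.

```python
import re

# Matches the fsck summary line in the format quoted in the post, e.g.
# "Total blocks (validated): 2899 (avg. block size 11475601 B)"
_SUMMARY_RE = re.compile(
    r"Total blocks \(validated\):\s*(\d+)\s*\(avg\. block size (\d+) B\)"
)

def parse_fsck_summary(fsck_output):
    """Return (total_blocks, avg_block_bytes), or None if no summary line is found."""
    match = _SUMMARY_RE.search(fsck_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def looks_like_small_files(avg_block_bytes, threshold_bytes=1024 * 1024):
    """Flag an average block size at or below the threshold (assumed ~1 MB)."""
    return avg_block_bytes <= threshold_bytes

# Sample line copied from the post; real input would be captured `hdfs fsck /` output.
sample = "Total blocks (validated): 2899 (avg. block size 11475601 B)"
total_blocks, avg_bytes = parse_fsck_summary(sample)
print(total_blocks, avg_bytes, looks_like_small_files(avg_bytes))
```

For the sample line, the average block size is about 11 MB, so the check does not flag a small files problem.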
11-07-2020
07:15 AM
Hello @Shelton, I have a new problem and was wondering if you could help me out. https://community.cloudera.com/t5/Support-Questions/Process-Stuck-in-Hadoop-Cluster/td-p/305553 I'm trying to run a process and the yarn.nodemanager log gets stuck at the following lines:
2020-11-07 04:19:34,342 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042
2020-11-07 04:19:34,347 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /138.68.238.32:8031
2020-11-07 04:19:34,368 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2020-11-07 04:19:34,373 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2020-11-07 04:19:34,520 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id 1152592273
2020-11-07 04:19:34,523 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1064351767
2020-11-07 04:19:34,524 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as slave01:44367 with total resource of <memory:28672, vCores:6>
2020-11-07 04:19:34,524 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests