Member since 02-09-2015
95 posts, 8 kudos received, 9 solutions
My Accepted Solutions

Views | Posted |
---|---|
5497 | 08-23-2021 04:07 PM |
1472 | 06-30-2021 07:34 AM |
1769 | 06-30-2021 07:26 AM |
14132 | 05-17-2019 10:27 PM |
3108 | 04-08-2019 01:00 PM |
08-31-2021
01:58 PM
Hi, Apache Spark will initiate the connection to your DB only on that port, via JDBC. So you can open a firewall rule whose sources are your nodes' IPs and whose destination is your DB server's IP, on the port you specified. Best Regards
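Because the Spark executors (and the driver, when it reads the table schema) open the JDBC connections themselves, it can help to confirm the rule from every node. Here is a minimal sketch using only Python's standard library; the function name is made up for illustration, and the host and port you pass are your own DB server and JDBC port.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Run it on each node that will open JDBC connections (driver and workers).
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS failures all end up here
        return False
```

For example, `can_reach("db.example.com", 1521)` run on each worker should return True once the firewall rule is in place (the host name and port here are placeholders).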
08-31-2021
01:52 PM
Hi, do you have Apache Ranger installed? If yes, check that the right policies are added under the YARN service, and that the Ranger UserSync service is configured and syncing your AD users and groups. Best Regards
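One way to review the YARN policies is to fetch them from Ranger's public REST API (for example `GET /service/public/v2/api/service/{serviceName}/policy`) and look for the user or group in question. The Python sketch below covers only the filtering step on the returned JSON; the function name, queue, user and group names are hypothetical, and it deliberately ignores deny items, exceptions and policy conditions, so treat it as a first look rather than a full access evaluation.

```python
from typing import List, Set

def allowed_queues(policies: List[dict], user: str, groups: Set[str],
                   access: str = "submit-app") -> Set[str]:
    """Return YARN queues on which `user` (or one of `groups`) has `access`.

    Simplified: only enabled policies and their allow items (policyItems)
    are considered; denyPolicyItems, exceptions and conditions are ignored.
    """
    # Ranger's special "public" group applies to every user
    effective_groups = set(groups) | {"public"}
    queues: Set[str] = set()
    for policy in policies:
        if not policy.get("isEnabled", True):
            continue
        policy_queues = policy.get("resources", {}).get("queue", {}).get("values", [])
        for item in policy.get("policyItems", []):
            matches_principal = (user in item.get("users", [])
                                 or bool(effective_groups & set(item.get("groups", []))))
            grants_access = any(a.get("type") == access and a.get("isAllowed", True)
                                for a in item.get("accesses", []))
            if matches_principal and grants_access:
                queues.update(policy_queues)
    return queues
```

`submit-app` and `admin-queue` are the access types of Ranger's YARN service definition; pass `access="admin-queue"` to check queue administration instead of job submission.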
08-31-2021
01:31 PM
Hi, can you post the error, please? Also, could you please clarify the following: Does your cluster have Kerberos enabled? Did you enable the HDFS extension for Druid? What data type are you trying to read from HDFS? Best Regards
08-31-2021
01:15 PM
1 Kudo
Hi, with Hadoop 3 there is an intra-DataNode disk balancer as well as the balancer across DataNodes, which together can help you distribute and balance the data across your cluster's nodes. The recommended setup is for all DataNodes to have the same number and size of disks, but DataNodes with different configurations are possible; you will then need to rebalance your DataNodes quite often, which consumes compute and network resources. Another thing to consider when your disks differ in size is the DataNode volume choosing policy, which is set to round robin by default; you should consider switching to the available space policy instead. I also suggest reading this article from Cloudera: https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ Best Regards
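To see why the volume choosing policy matters with mixed disk sizes, here is a toy Python simulation in which block counts stand in for bytes. Round robin spreads new blocks evenly, so the small disk fills up while the large one is still mostly empty; a space-aware choice avoids that. The "most free space" rule below is a deliberate simplification: Hadoop's `AvailableSpaceVolumeChoosingPolicy` only favors emptier volumes once they differ by more than a configured threshold, and then does so probabilistically. The real policy is enabled by setting `dfs.datanode.fsdataset.volume.choosing.policy` to `org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy` in hdfs-site.xml.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Volume:
    capacity: int  # in blocks
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used

Chooser = Callable[[List[Volume]], Volume]

def make_round_robin() -> Chooser:
    next_index = 0

    def choose(volumes: List[Volume]) -> Volume:
        nonlocal next_index
        # Visit each volume at most once per call, skipping full ones
        for _ in range(len(volumes)):
            vol = volumes[next_index % len(volumes)]
            next_index += 1
            if vol.free > 0:
                return vol
        raise RuntimeError("all volumes are full")
    return choose

def choose_most_free(volumes: List[Volume]) -> Volume:
    # Simplified space-aware choice: always the volume with the most free space
    vol = max(volumes, key=lambda v: v.free)
    if vol.free <= 0:
        raise RuntimeError("all volumes are full")
    return vol

def simulate(choose: Chooser, capacities: List[int], blocks: int) -> List[int]:
    volumes = [Volume(c) for c in capacities]
    for _ in range(blocks):
        choose(volumes).used += 1
    return [v.used for v in volumes]

# A 100-block disk next to a 400-block disk, 200 new blocks written
print(simulate(make_round_robin(), [100, 400], 200))  # [100, 100]: small disk is full
print(simulate(choose_most_free, [100, 400], 200))    # [0, 200]
```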
08-23-2021
04:07 PM
Hi, can you open Beeline, run the command below, and then recreate the table: set parquet.column.index.access=false; This should make Hive map the data in your files by column name instead of by the column position in your CREATE TABLE statement. Hope this works for you. Best Regards
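The difference between the two mapping modes can be illustrated with a small Python sketch: a Parquet file written with its columns in one order, read through a table definition that lists them in another. This is a toy model of the behavior described above, not Hive's actual reader; the function and column names are made up.

```python
from typing import Dict, List

def map_by_index(row: List[object], table_columns: List[str]) -> Dict[str, object]:
    # Positional mapping: the i-th table column reads the i-th value in the file;
    # extra table columns beyond the file's width come back as None (NULL)
    return {col: row[i] if i < len(row) else None
            for i, col in enumerate(table_columns)}

def map_by_name(file_columns: List[str], row: List[object],
                table_columns: List[str]) -> Dict[str, object]:
    # Name-based mapping, compared case-insensitively (Hive column names are
    # case-insensitive); table columns missing from the file become None
    positions = {name.lower(): i for i, name in enumerate(file_columns)}
    return {col: row[positions[col.lower()]] if col.lower() in positions else None
            for col in table_columns}

file_columns = ["id", "name", "age"]   # order in the Parquet file
row = [1, "alice", 30]
table_columns = ["id", "age", "name"]  # order in CREATE TABLE

print(map_by_index(row, table_columns))
# {'id': 1, 'age': 'alice', 'name': 30}   <- values land in the wrong columns
print(map_by_name(file_columns, row, table_columns))
# {'id': 1, 'age': 30, 'name': 'alice'}
```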
06-30-2021
11:43 PM
You can replace the Sentry part of your script with calls to the Apache Ranger REST API to create, update, and delete Ranger policies; see the example here: Ranger RestAPIs for Creating, Updating, Deleting, and Searching Policies in Big SQL - Hadoop Dev (ibm.com)
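As a rough replacement for a Sentry grant, a Hive policy can be created with a POST to Ranger's public v2 endpoint `/service/public/v2/api/policy`. The Python sketch below builds, but does not send, such a request using only the standard library. The function name, Ranger URL, service name, database, group and credentials are hypothetical placeholders, and the access types must match those defined for the Hive service in your Ranger instance.

```python
import base64
import json
import urllib.request

def build_create_policy_request(ranger_url: str, service: str, name: str,
                                database: str, table: str, group: str,
                                accesses: list, user: str, password: str
                                ) -> urllib.request.Request:
    # Policy body: one allow item granting `accesses` on database.table (all columns)
    policy = {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [{
            "groups": [group],
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        }],
    }
    # HTTP basic auth; on a Kerberized Ranger you would use SPNEGO instead
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=ranger_url.rstrip("/") + "/service/public/v2/api/policy",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

req = build_create_policy_request(
    "https://ranger.example.com:6182", "cm_hive", "sales_read",
    "sales", "*", "hadoop_analysts", ["select"], "admin", "secret")
```

Sending it is then a matter of `urllib.request.urlopen(req)`. Updates and deletes follow the same pattern with `PUT` and `DELETE` on `/service/public/v2/api/policy/{id}`.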
06-30-2021
07:34 AM
Make sure that you are using an Oracle JDBC driver version that is compatible with the Oracle Database version you are connecting to.
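To confirm which driver version is actually on the classpath, it can usually be read from the jar's manifest. The Python sketch below assumes that the jar's `META-INF/MANIFEST.MF` carries an `Implementation-Version` attribute, as Oracle's ojdbc jars typically do; the function names are made up for illustration. Compare the result against Oracle's JDBC driver and database compatibility information for your DB release.

```python
import zipfile
from typing import Dict, Optional

def read_manifest(jar_path: str) -> Dict[str, str]:
    """Parse the main section of a jar's META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs: Dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") and last_key:
            attrs[last_key] += line[1:]  # continuation of the previous value
        elif ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
    return attrs

def driver_version(jar_path: str) -> Optional[str]:
    # None if the manifest has no Implementation-Version attribute
    return read_manifest(jar_path).get("Implementation-Version")
```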
06-30-2021
07:26 AM
You can check the Kafka MirrorMaker documentation here: Set up MirrorMaker in Cloudera Manager. Also, if the two clusters are secured with Kerberos and reside in two different realms, you need to make sure there is a trust between these two Kerberos realms.
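For a cross-realm trust, both KDCs need matching `krbtgt` principals (same password and encryption types on both sides): `krbtgt/SERVICE_REALM@CLIENT_REALM` lets clients of CLIENT_REALM get tickets for services in SERVICE_REALM. This small Python helper (a hypothetical name, with placeholder realm names) just lists the principals needed for a one-way or two-way trust; creating them is done with your KDC's own admin tooling, for example `kadmin` with `addprinc` on MIT Kerberos.

```python
from typing import List

def cross_realm_principals(client_realm: str, service_realm: str,
                           two_way: bool = True) -> List[str]:
    """krbtgt principals needed so clients of client_realm reach service_realm.

    Each principal must exist in BOTH KDCs with the same key.
    With two_way=True the reverse direction is included as well.
    """
    principals = [f"krbtgt/{service_realm}@{client_realm}"]
    if two_way:
        principals.append(f"krbtgt/{client_realm}@{service_realm}")
    return principals

print(cross_realm_principals("A.EXAMPLE.COM", "B.EXAMPLE.COM"))
# ['krbtgt/B.EXAMPLE.COM@A.EXAMPLE.COM', 'krbtgt/A.EXAMPLE.COM@B.EXAMPLE.COM']
```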
06-30-2021
07:19 AM
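I assume you are using the Capacity Scheduler rather than the Fair Scheduler. That would explain why queues won't take available resources from other queues: with the Capacity Scheduler, a queue can only grow beyond its configured capacity up to its maximum capacity setting. You can read more about the differences here: Comparison of Fair Scheduler with Capacity Scheduler | CDP Public Cloud (cloudera.com).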
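Here is a simplified Python model of the Capacity Scheduler's elasticity: a queue is guaranteed its capacity and may borrow resources its sibling queues leave idle, but never beyond its maximum capacity (`yarn.scheduler.capacity.<queue-path>.maximum-capacity`). Percentages are relative to the parent queue and the numbers are illustrative; real scheduling also depends on user limits, preemption and other settings that this sketch ignores.

```python
def usable_share(capacity: float, maximum_capacity: float,
                 demand: float, idle_in_siblings: float) -> float:
    """Share of the parent queue (in %) that a queue can use right now.

    Simplified model: guaranteed capacity plus whatever siblings leave idle,
    capped by maximum-capacity and by the queue's own demand.
    """
    if maximum_capacity < capacity:
        raise ValueError("maximum-capacity must be >= capacity")
    ceiling = min(maximum_capacity, capacity + idle_in_siblings)
    return min(demand, ceiling)

# Queue with 40% capacity, siblings currently leaving 60% idle, demand 100%
print(usable_share(40, 40, 100, 60))   # 40: maximum-capacity blocks borrowing
print(usable_share(40, 100, 100, 60))  # 100: elastic up to the whole parent
```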
06-07-2021
11:09 PM
The following are the configurations for connecting Apache Ranger to LDAP/LDAPS. A useful tool for identifying some of these settings in your AD is AD Explorer - Windows Sysinternals | Microsoft Docs.
This configuration syncs LDAP users and links them to their LDAP groups every 12 hours, so that you can later grant permissions in Apache Ranger based on LDAP groups as well.
To connect over LDAPS, make sure the proper certificates are added on the same server that runs Ranger's UserSync service.

Configuration Name | Configuration Value | Comment |
---|---|---|
ranger.usersync.source.impl.class | org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder | |
ranger.usersync.sleeptimeinmillisbetweensynccycle | 43200000 (12 hours) | The value is in milliseconds |
ranger.usersync.ldap.url | ldaps://myldapserver.example.com | Use ldaps or ldap depending on your LDAP security |
ranger.usersync.ldap.binddn | myuser@example.com | |
ranger.usersync.ldap.ldapbindpassword | mypassword | |
ranger.usersync.ldap.searchBase | OU=hadoop,DC=example,DC=com | Browse your AD and choose which OU you want Ranger to sync |
ranger.usersync.ldap.user.searchbase | OU=hadoop2,DC=example,DC=com;OU=hadoop,DC=example,DC=com | Browse your AD and choose which OU you want Ranger to sync; you can also list two OUs separated by ; |
ranger.usersync.ldap.user.objectclass | user | Double-check this value in your AD |
ranger.usersync.ldap.user.searchfilter | (memberOf=CN=HADOOP_ACCESS,DC=example,DC=com) | Use this if you want to sync only specific users into Ranger rather than your entire AD |
ranger.usersync.ldap.user.nameattribute | sAMAccountName | Double-check this value in your AD |
ranger.usersync.ldap.user.groupnameattribute | memberOf | Double-check this value in your AD |
ranger.usersync.user.searchenabled | true | |
ranger.usersync.group.searchbase | OU=hadoop,DC=example,DC=com | Browse your AD and choose which OU you want Ranger to sync |
ranger.usersync.group.objectclass | group | Double-check this value in your AD |
ranger.usersync.group.searchfilter | (cn=hadoop_*) | Use this if you want to sync only specific groups rather than all AD groups |
ranger.usersync.group.nameattribute | cn | Double-check this value in your AD |
ranger.usersync.group.memberattributename | member | Double-check this value in your AD |
ranger.usersync.group.search.first.enabled | true | |
ranger.usersync.truststore.file | /path/to/truststore-file | |
ranger.usersync.truststore.password | TRUST_STORE_PASSWORD | |
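Before restarting UserSync, a quick sanity check of these values can save a failed sync cycle. The Python sketch below is a hypothetical helper, not part of Ranger: it takes the properties as a dict (for example copied from ranger-ugsync-site.xml) and flags a few common mistakes. The DN check is only a heuristic for AD-style DNs, and the interval check assumes the property is set as a raw number of milliseconds rather than through a UI that accepts units.

```python
from typing import Dict, List
from urllib.parse import urlparse

TWELVE_HOURS_MS = 12 * 60 * 60 * 1000  # 43200000

def check_usersync(props: Dict[str, str]) -> List[str]:
    """Return warnings for a few common UserSync LDAP misconfigurations."""
    problems: List[str] = []

    # The sync interval is a plain integer number of milliseconds
    sleep = props.get("ranger.usersync.sleeptimeinmillisbetweensynccycle", "")
    if not sleep.isdigit():
        problems.append("sync interval must be an integer number of milliseconds")

    scheme = urlparse(props.get("ranger.usersync.ldap.url", "")).scheme
    if scheme not in ("ldap", "ldaps"):
        problems.append(f"unexpected LDAP URL scheme: {scheme!r}")
    if scheme == "ldaps" and not props.get("ranger.usersync.truststore.file"):
        problems.append("ldaps:// needs a truststore with the LDAP server's certificate")

    # Search bases may list several DNs separated by ';'
    for key in ("ranger.usersync.ldap.user.searchbase",
                "ranger.usersync.group.searchbase"):
        for dn in filter(None, props.get(key, "").split(";")):
            if "DC=" not in dn.upper():  # heuristic for AD-style DNs
                problems.append(f"{key}: {dn!r} does not look like a full DN")
    return problems
```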
Here is a helpful link on how to construct complex LDAP search queries: Search Filter Syntax - Win32 apps | Microsoft Docs
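When search filters like the ones in the table are built programmatically, literal values must have the characters `*`, `(`, `)`, `\` and NUL escaped as described in RFC 4515; otherwise a stray `*` silently becomes a wildcard. A small Python sketch follows; the helper names are made up for illustration, and the group DN is the example one from the table.

```python
# RFC 4515 escapes for characters that are special inside an assertion value
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}

def escape_value(value: str) -> str:
    """Escape a value so that it is matched literally."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)

def eq(attr: str, value: str) -> str:
    # Equality match, e.g. (objectClass=user)
    return f"({attr}={escape_value(value)})"

def starts_with(attr: str, prefix: str) -> str:
    # Substring match with a deliberate trailing wildcard, e.g. (cn=hadoop_*)
    return f"({attr}={escape_value(prefix)}*)"

def and_(*filters: str) -> str:
    return "(&" + "".join(filters) + ")"

def or_(*filters: str) -> str:
    return "(|" + "".join(filters) + ")"

user_filter = and_(eq("objectClass", "user"),
                   eq("memberOf", "CN=HADOOP_ACCESS,DC=example,DC=com"))
print(user_filter)
# (&(objectClass=user)(memberOf=CN=HADOOP_ACCESS,DC=example,DC=com))
print(starts_with("cn", "hadoop_"))  # (cn=hadoop_*)
```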
Disclaimer from Cloudera: This article is contributed by an external user. Steps/content may not be technically verified by Cloudera and may not be applicable to all use cases or to a particular distribution. Follow with caution and at your own risk. If needed, raise a support case to get confirmation.