Member since: 02-09-2015
Posts: 95
Kudos Received: 8
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1974 | 08-23-2021 04:07 PM |
| | 668 | 06-30-2021 07:34 AM |
| | 695 | 06-30-2021 07:26 AM |
| | 8959 | 05-17-2019 10:27 PM |
| | 2016 | 04-08-2019 01:00 PM |
08-31-2021
01:58 PM
Hi, Apache Spark will initiate the connection to your database on that port only via JDBC, so you can open a firewall rule where the sources are your cluster nodes' IPs and the destination is your database server IP on the port you specified. Best Regards
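For illustration, here is a minimal PySpark sketch of such a JDBC read. The hostname, port, database, table, and credentials are placeholders, and the driver class shown is the common PostgreSQL one; substitute whatever matches your database. Every executor that reads a JDBC partition opens a TCP connection to that host and port, which is why the firewall rule needs all worker-node IPs as sources.

```python
from pyspark.sql import SparkSession

# Build a Spark session; in a real cluster the JDBC driver jar must be on the
# driver and executor classpaths (e.g. via --jars or spark.jars).
spark = SparkSession.builder.appName("jdbc-read-example").getOrCreate()

# Placeholder connection details: executors connect to
# db-server.example.com:5432 directly, not through the driver node.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-server.example.com:5432/mydb")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "public.my_table")
    .option("user", "my_user")
    .option("password", "my_password")
    .load()
)

df.show(5)
```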
... View more
08-31-2021
01:52 PM
Hi, do you have Apache Ranger installed? If yes, check that the right policies are added under the YARN service and that the Ranger UserSync service is configured and syncing AD users and groups. Best Regards
... View more
08-31-2021
01:31 PM
Hi, can you post the error please? Also, could you please clarify the below: is Kerberos enabled on your cluster? Did you enable the HDFS extension for Druid? What data type are you trying to read from HDFS? Best Regards
... View more
08-31-2021
01:15 PM
1 Kudo
Hi, with Hadoop 3 there is an intra-DataNode disk balancer in addition to the balancer across DataNodes, which can help you distribute and balance the data in your cluster. The recommended setup is certainly to have all DataNodes with the same number and size of disks, but it is possible to have different configurations per DataNode; you will just need to rebalance quite often, which consumes compute and network resources. Another thing to consider when you have disks of different sizes is the "DataNode volume choosing policy", which defaults to round robin; you should consider switching it to the available-space policy instead. I also suggest you read this article from Cloudera: https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ Best Regards
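As a rough sketch of the intra-DataNode balancing workflow described in that article, the small Python wrapper below drives the `hdfs diskbalancer` CLI. It assumes shell access on a host with the hdfs client configured; the hostname and plan file path are placeholders (use the plan location printed by the plan step in your environment).

```python
import subprocess

# Hypothetical DataNode hostname -- replace with one of your own nodes.
datanode = "datanode01.example.com"

# Step 1: generate a balancing plan for that DataNode. The plan is written as
# a JSON file and its location is printed by the command.
subprocess.run(["hdfs", "diskbalancer", "-plan", datanode], check=True)

# Step 2: execute the generated plan. The path below is a placeholder; use the
# plan file reported by the previous command.
plan_file = "/system/diskbalancer/<timestamp>/%s.plan.json" % datanode
subprocess.run(["hdfs", "diskbalancer", "-execute", plan_file], check=True)

# Step 3: check the progress of the data moves on that DataNode.
subprocess.run(["hdfs", "diskbalancer", "-query", datanode], check=True)
```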
... View more
08-23-2021
04:07 PM
Hi, can you use beeline, run the command below, and then recreate the table: set parquet.column.index.access=false; This should make Hive map the data in your files by column name instead of by the column positions from your CREATE TABLE statement. Hope this works for you. Best Regards
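If you would rather script the recreation than type it in beeline, here is a rough sketch using the PyHive client (assumed to be installed, with HiveServer2 reachable). The host, credentials, table name, columns, and location are placeholder examples only.

```python
from pyhive import hive

# Connect to HiveServer2 -- host/port/username are placeholders. In a
# Kerberized cluster you may also need auth='KERBEROS' and
# kerberos_service_name='hive'.
conn = hive.Connection(host="hiveserver2.example.com", port=10000,
                       username="myuser", database="default")
cur = conn.cursor()

# Session-level setting: map Parquet columns by name instead of by position.
cur.execute("SET parquet.column.index.access=false")

# Recreate the table definition on top of the existing Parquet files.
# Table name, columns, and location are hypothetical examples.
cur.execute("DROP TABLE IF EXISTS my_parquet_table")
cur.execute("""
    CREATE EXTERNAL TABLE my_parquet_table (
        id BIGINT,
        name STRING
    )
    STORED AS PARQUET
    LOCATION '/data/my_parquet_table'
""")
```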
... View more
06-30-2021
11:43 PM
You can replace the Sentry part of your script with the Apache Ranger REST API to create/update/delete Ranger policies; there is an example here: Ranger RestAPIs for Creating, Updating, Deleting, and Searching Policies in Big SQL - Hadoop Dev (ibm.com)
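For reference, a minimal Python sketch of creating a policy through Ranger's public REST API is below. The Ranger Admin URL, credentials, service name, and policy details are placeholders, and the exact payload fields may differ slightly between Ranger versions.

```python
import requests

# Placeholders -- point these at your Ranger Admin instance and service.
RANGER_URL = "https://ranger-admin.example.com:6182"
AUTH = ("admin", "admin_password")

# Example Hive policy granting SELECT on a database/table to a group.
policy = {
    "service": "cm_hive",              # name of your Ranger Hive service
    "name": "example_select_policy",
    "isEnabled": True,
    "resources": {
        "database": {"values": ["mydb"], "isExcludes": False, "isRecursive": False},
        "table":    {"values": ["mytable"], "isExcludes": False, "isRecursive": False},
        "column":   {"values": ["*"], "isExcludes": False, "isRecursive": False},
    },
    "policyItems": [
        {
            "groups": ["analysts"],
            "accesses": [{"type": "select", "isAllowed": True}],
            "delegateAdmin": False,
        }
    ],
}

# Create the policy via the v2 public API; use PUT .../policy/<id> to update
# and DELETE .../policy/<id> to remove it.
resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    json=policy,
    auth=AUTH,
    verify=False,  # configure proper CA verification in real use
)
resp.raise_for_status()
print(resp.json())
```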
... View more
06-30-2021
07:34 AM
Make sure that you are using an Oracle JDBC driver version that is compatible with the Oracle database version you are connecting to.
... View more
06-30-2021
07:26 AM
You can check Kafka MirrorMaker here: Set up MirrorMaker in Cloudera Manager. Also, if the 2 clusters are secured via Kerberos and reside in 2 different realms, you need to make sure there is trust between these 2 Kerberos realms.
... View more
06-30-2021
07:19 AM
I assume you are using the Capacity Scheduler, not the Fair Scheduler; that is why queues won't take available resources from other queues. You can read more about that here: Comparison of Fair Scheduler with Capacity Scheduler | CDP Public Cloud (cloudera.com).
... View more
06-07-2021
11:09 PM
Following are the configurations for connecting Apache Ranger with LDAP/LDAPS. There's an important tool that will help identify some settings in your AD: AD Explorer - Windows Sysinternals | Microsoft Docs.
This configuration will sync LDAP users and link them with their LDAP groups every 12 hours, so later from Apache Ranger, you can give permission based on LDAP groups as well.
For connecting using LDAPS, ensure you have the proper certificates added in the same server that contains the Ranger's UserSync service.
| Configuration Name | Configuration Value | Comment |
|---|---|---|
| ranger.usersync.source.impl.class | org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder | |
| ranger.usersync.sleeptimeinmillisbetweensynccycle | 12 hour | |
| ranger.usersync.ldap.url | ldaps://myldapserver.example.com | ldaps or ldap based on your LDAP security |
| ranger.usersync.ldap.binddn | myuser@example.com | |
| ranger.usersync.ldap.ldapbindpassword | mypassword | |
| ranger.usersync.ldap.searchBase | OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync |
| ranger.usersync.ldap.user.searchbase | OU=hadoop2,DC=example,DC=com;OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync; you can also add 2 OUs and separate them with ; |
| ranger.usersync.ldap.user.objectclass | user | Double-check this against your AD |
| ranger.usersync.ldap.user.searchfilter | (memberOf=CN=HADOOP_ACCESS,DC=example,DC=com) | If you want to filter specific users to be synced into Ranger rather than your entire AD |
| ranger.usersync.ldap.user.nameattribute | sAMAccountName | Double-check this against your AD |
| ranger.usersync.ldap.user.groupnameattribute | memberOf | Double-check this against your AD |
| ranger.usersync.user.searchenabled | true | |
| ranger.usersync.group.searchbase | OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync |
| ranger.usersync.group.objectclass | group | Double-check this against your AD |
| ranger.usersync.group.searchfilter | (cn=hadoop_*) | If you want to sync specific groups, not all AD groups |
| ranger.usersync.group.nameattribute | cn | Double-check this against your AD |
| ranger.usersync.group.memberattributename | member | Double-check this against your AD |
| ranger.usersync.group.search.first.enabled | true | |
| ranger.usersync.truststore.file | /path/to/truststore-file | |
| ranger.usersync.truststore.password | TRUST_STORE_PASSWORD | |
Here is a helpful link on how to construct complex LDAP search queries. Search Filter Syntax - Win32 apps | Microsoft Docs
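Before plugging these values into Ranger, it can save time to verify the bind account, search bases, and search filters directly against AD. Below is a rough Python sketch using the ldap3 library (assumed to be installed, with the LDAPS certificate trusted by the client); the server, bind account, bases, and filters are the placeholder values from the table above.

```python
from ldap3 import Server, Connection, ALL, SUBTREE

# Placeholder values matching the table above.
server = Server("ldaps://myldapserver.example.com", get_info=ALL)
conn = Connection(server, user="myuser@example.com",
                  password="mypassword", auto_bind=True)

# Try the same user search Ranger UserSync would perform.
conn.search(
    search_base="OU=hadoop,DC=example,DC=com",
    search_filter="(&(objectClass=user)(memberOf=CN=HADOOP_ACCESS,DC=example,DC=com))",
    search_scope=SUBTREE,
    attributes=["sAMAccountName", "memberOf"],
)
for entry in conn.entries:
    print(entry.sAMAccountName, entry.memberOf)

# And the group search.
conn.search(
    search_base="OU=hadoop,DC=example,DC=com",
    search_filter="(&(objectClass=group)(cn=hadoop_*))",
    search_scope=SUBTREE,
    attributes=["cn", "member"],
)
for entry in conn.entries:
    print(entry.cn)
```

If either search returns nothing here, it will not return anything for Ranger UserSync either, so adjust the base or filter before restarting the service.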
Disclaimer from Cloudera: This article is contributed by an external user. Steps/ Content may not be technically verified by Cloudera and may not be applicable for all use cases and specifically to a particular distribution. Follow with caution and own risk. If needed, raise a support case to get the confirmation.
... View more
05-26-2021
04:57 AM
Hi,
Below are the configurations for connecting Apache Ranger with LDAP/LDAPS. There's an important tool that will help identify some settings in your AD: AD Explorer - Windows Sysinternals | Microsoft Docs.
This configuration will sync LDAP users and link them with their LDAP groups every 12 hours, so later from Apache Ranger you can give permissions based on LDAP groups as well.
For connecting using LDAPS, make sure you have the proper certificates added on the same server that contains the Ranger's UserSync service.

| Configuration Name | Configuration Value | Comment |
|---|---|---|
| ranger.usersync.source.impl.class | org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder | |
| ranger.usersync.sleeptimeinmillisbetweensynccycle | 12 hour | |
| ranger.usersync.ldap.url | ldaps://myldapserver.example.com | ldaps or ldap based on your LDAP security |
| ranger.usersync.ldap.binddn | myuser@example.com | |
| ranger.usersync.ldap.ldapbindpassword | mypassword | |
| ranger.usersync.ldap.searchBase | OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync |
| ranger.usersync.ldap.user.searchbase | OU=hadoop2,DC=example,DC=com;OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync; you can also add 2 OUs and separate them with ; |
| ranger.usersync.ldap.user.objectclass | user | Double-check this against your AD |
| ranger.usersync.ldap.user.searchfilter | (memberOf=CN=HADOOP_ACCESS,DC=example,DC=com) | If you want to filter specific users to be synced into Ranger rather than your entire AD |
| ranger.usersync.ldap.user.nameattribute | sAMAccountName | Double-check this against your AD |
| ranger.usersync.ldap.user.groupnameattribute | memberOf | Double-check this against your AD |
| ranger.usersync.user.searchenabled | true | |
| ranger.usersync.group.searchbase | OU=hadoop,DC=example,DC=com | You can browse your AD and check which OU you want Ranger to sync |
| ranger.usersync.group.objectclass | group | Double-check this against your AD |
| ranger.usersync.group.searchfilter | (cn=hadoop_*) | If you want to sync specific groups, not all AD groups |
| ranger.usersync.group.nameattribute | cn | Double-check this against your AD |
| ranger.usersync.group.memberattributename | member | Double-check this against your AD |
| ranger.usersync.group.search.first.enabled | true | |
| ranger.usersync.truststore.file | /path/to/truststore-file | |
| ranger.usersync.truststore.password | TRUST_STORE_PASSWORD | |

Here is a helpful link on how to construct complex LDAP search queries: Search Filter Syntax - Win32 apps | Microsoft Docs
Best Regards,
... View more
03-30-2021
02:21 AM
Hi @Ninads, I am also using CDP 7.1.4 and having the same error when Spark connects to HBase. Did you manage to identify the issue? Best Regards,
... View more
02-26-2021
07:31 AM
Hi, can you check the MySQL driver version compatibility with your MySQL server version? See MySQL :: MySQL Connector/J 8.0 Developer Guide :: 2 Connector/J Versions, and the MySQL and Java Versions They Require. Given this particular error from your logs:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.
java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.
you might need to use a different version of the MySQL driver that is compatible with your MySQL server. Best Regards,
... View more
02-26-2021
07:13 AM
Hi, can you share the Ranger logs? They should contain the exact error messages. Best Regards,
... View more
02-26-2021
07:08 AM
2 Kudos
Hi, you can check your NiFi resources, specifically the Java heap size set in the "bootstrap.conf" file, and increase that. Please check this for NiFi performance best practices: HDF/CFM NIFI Best practices for setting up a high ... - Cloudera Community. Best Regards,
... View more
02-26-2021
07:02 AM
1 Kudo
Hi, as you previously had a version of Hive on the same machine and the error here refers to the Hive metastore, it is probably due to old config from the old Hive installation in "/etc/hive/conf". Best Regards,
... View more
05-26-2019
03:27 PM
Hi @gabriele ran, have you managed to make Spark read the JAAS file while using Oozie?
... View more
05-17-2019
10:27 PM
Also, for more documentation about how we found the solution: in this Tez JIRA ticket https://issues.apache.org/jira/browse/TEZ-3894 it is mentioned that Tez gets its intermediate file permissions from "fs.permissions.umask-mode". In our dev environment it was set to 022 but to 077 in prod, and it was the same for you as well, so that is how we figured this out. It was also tricky because file.out.index was created with the correct permissions but file.out was not, which made the map output unreadable by the yarn user.
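To make the effect of that setting concrete, here is a tiny Python illustration of how the umask changes the resulting file mode, assuming a nominal creation mode of 0660 for the intermediate file (the exact requested mode is an assumption for illustration):

```python
# Effective permission = requested mode & ~umask
for umask in (0o022, 0o077):
    mode = 0o660 & ~umask
    print(f"umask {umask:03o} -> file mode {mode:03o}")

# umask 022 -> file mode 640 (rw-r-----): readable by other members of the
#              hadoop group, such as the yarn user running the ShuffleHandler
# umask 077 -> file mode 600 (rw-------): only the owner (hive) can read it,
#              so the shuffle fetch of file.out fails
```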
... View more
05-17-2019
10:12 PM
Glad to work with you and your team to get this issue fixed.
... View more
05-17-2019
09:49 AM
Yeah, sure, I will happily work with you to get this fixed.
... View more
05-16-2019
06:49 PM
Hi, I am running HDP 3.1 (3.1.0.0-78) with 10 DataNodes, and the Hive execution engine is Tez. When I run a query I get this error:
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task task_1557754551780_1091_2_00_000001 failed after vertex succeeded.] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
INFO : Completed executing command(queryId=hive_20190516161715_09090e6d-e513-4fcc-9c96-0b48e9b43822); Time taken: 17.935 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex re-running, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00 Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE, Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task task_1557754551780_1091_2_00_000001 failed after vertex succeeded.] DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)
When I traced the logs (for example, the application id is application_1557754551780_1091), I checked the path where the map output is written (/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003), and the files below are created with these permissions:
-rw-------. 1 hive hadoop 28 May 16 16:17 file.out
-rw-r-----. 1 hive hadoop 32 May 16 16:17 file.out.index
Also, in the NodeManager logs I found this error:
2019-05-16 16:19:05,801 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,818 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,821 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,822 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,824 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
2019-05-16 16:19:05,826 INFO mapred.ShuffleHandler (ShuffleHandler.java:sendMapOutput(1268)) - /var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out not found
This means file.out is not readable by the yarn user, which causes the whole task to fail. I also checked the parent directory permissions and the umask for all users (0022), which means the files inside the output directory should be readable by other users in the same group:
drwx--x---. 3 hive hadoop 16 May 16 16:16 filecache
drwxr-s---. 3 hive hadoop 60 May 16 16:16 output
I reran the whole scenario on a different cluster (HDP version 3.0.1.0-187), and there file.out has the same permissions as file.out.index and the queries run fine without any problems. I also switched to the yarn user and used vi to confirm that the yarn user is able to read the content of file.out, and it was:
-rw-r-----. 1 hive hadoop 28 May 16 16:17 file.out
-rw-r-----. 1 hive hadoop 32 May 16 16:17 file.out.index
When I shut down all the NodeManagers except one, all the queries run fine, even though file.out is still created with the same permissions; I guess that is because everything is running on the same node.
N.B.: we upgraded from HDP 2.6.2 to HDP 3.1.0.0-78.
... View more
05-15-2019
04:07 PM
Hi guys, I am having the same problem. When I run a query (select count(*) from table_name) on a small table it runs successfully, but when the table is big I get this error. I checked the YARN logs and it seems the problem occurs during data shuffling, so I traced it to the node that received the task and found this error in /var/log/hadoop-yarn/yarn/hadoop-yarn-nodemanager-myhost.com.log:
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557491114054_0010/output/attempt_1557491114054_0010_1_03_000000_1_10002/file.out not found
Although for other attempts of the same application this file exists normally. In the YARN application log, after exiting the beeline session, this error appears:
2019-05-14 16:19:58,442 [WARN] [Fetcher_B {Map_1} #0] |shuffle.Fetcher|: copyInputs failed for tasks [InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]]
2019-05-14 16:19:58,442 [INFO] [Fetcher_B {Map_1} #0] |impl.ShuffleManager|: Map_1: Fetch failed for src: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]InputIdentifier: InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1], connectFailed: false
2019-05-14 16:19:58,443 [INFO] [Fetcher_B {Map_1} #1] |HttpConnection.url|: for url=http://myhost_name:13562/mapOutput?job=job_1557754551780_0155&dag=5&reduce=0&map=attempt_1557754551780_0155_5_00_000000_0_10003 sent hash and receievd reply 0 ms
2019-05-14 16:19:58,443 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1]. len=28, decomp=14. ExceptionMessage=Not a valid ifile header
2019-05-14 16:19:58,443 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0155_5_00_000000_0_10003, spillType=0, spillId=-1] from myhost_name
java.io.IOException: Not a valid ifile header
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am using HDP 3.1, so any suggestions on what the error might be? Thanks
... View more
05-15-2019
02:59 PM
Hi, I am having the same problem after upgrading from HDP 2.6.2 to HDP 3.1, although I have a lot of resources in the cluster. When I run a query (select count(*) from table), if the table is small (3k records) it runs successfully, but if the table is larger (50k records) I get the same vertex failure error. In the YARN application log for the failed query I see the error below:
2019-05-14 11:58:14,823 [INFO] [TezChild] |tez.ReduceRecordProcessor|: Starting Output: out_Reducer 2
2019-05-14 11:58:14,828 [INFO] [TezChild] |compress.CodecPool|: Got brand-new decompressor [.snappy]
2019-05-14 11:58:18,466 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: Routing events from heartbeat response to task, currentTaskAttemptId=attempt_1557754551780_0137_1_01_000000_0, eventCount=1 fromEventId=1 nextFromEventId=2
2019-05-14 11:58:18,488 [INFO] [Fetcher_B {Map_1} #1] |HttpConnection.url|: for url=http://myhost_name.com:13562/mapOutput?job=job_1557754551780_0137&dag=1&reduce=0&map=attempt_1557754551780_0137_1_00_000000_0_10002 sent hash and receievd reply 0 ms
2019-05-14 11:58:18,491 [INFO] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to read data to memory for InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0137_1_00_000000_0_10002, spillType=0, spillId=-1]. len=28, decomp=14. ExceptionMessage=Not a valid ifile header
2019-05-14 11:58:18,492 [WARN] [Fetcher_B {Map_1} #1] |shuffle.Fetcher|: Failed to shuffle output of InputAttemptIdentifier [inputIdentifier=0, attemptNumber=0, pathComponent=attempt_1557754551780_0137_1_00_000000_0_10002, spillType=0, spillId=-1] from myhost_name.com
java.io.IOException: Not a valid ifile header
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.verifyHeaderMagic(IFile.java:859)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.isCompressedFlagEnabled(IFile.java:866)
at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:616)
at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:121)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:950)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Both queries were working fine before the upgrade. The only change I made after the upgrade was increasing the heap size of the DataNodes. I also followed @Geoffrey Shelton Okot's configuration but still get the same error. Thanks
... View more
05-14-2019
01:06 PM
Hi @D G, have you fixed this problem?
... View more
05-13-2019
05:34 PM
Hi, we recently upgraded from HDP 2.6.2 to HDP 3.1. I am trying to run a Hive query in beeline, (select count(*) from big_table), where big_table is a table containing millions of records, and I get the error below:
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_0008_4_00, diagnostics=[Task failed, taskId=task_1557754551780_0008_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[attempt_1557754551780_0008_4_00_000000_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1557754551780_0008_4_00_000000_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1557754551780_0008_4_00_000000_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1557754551780_0008_4_00_000000_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1557754551780_0008_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1557754551780_0008_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1557754551780_0008_4_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : Completed executing command(queryId=hive_20190513174303_d7607f92-baaa-4fb2-825a-1af9a0287910); Time taken: 15.3 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1557754551780_0008_4_00, diagnostics=[Task failed, taskId=task_1557754551780_0008_4_00_000000, diagnostics=[TaskAttempt 0 failed, info=[attempt_1557754551780_0008_4_00_000000_0 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 1 failed, info=[attempt_1557754551780_0008_4_00_000000_1 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 2 failed, info=[attempt_1557754551780_0008_4_00_000000_2 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0], TaskAttempt 3 failed, info=[attempt_1557754551780_0008_4_00_000000_3 being failed for too many output errors. failureFraction=1.0, MAX_ALLOWED_OUTPUT_FAILURES_FRACTION=0.1, uniquefailedOutputReports=1, MAX_ALLOWED_OUTPUT_FAILURES=10, MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC=300, readErrorTimespan=0]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1557754551780_0008_4_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1557754551780_0008_4_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1557754551780_0008_4_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)
But when I execute (select count(*) from small_table), where the table contains only about 3000 records, it runs fine. Before the upgrade both queries were running fine, so do I have to do any tuning for Hive 3.1?
N.B.: Both tables are external tables. Thanks
... View more
05-13-2019
07:20 AM
Hi @Geoffrey Shelton Okot, thanks for your help. The document mentions that it is not required to have an ambari-agent running on the Ambari Server machine, but actually it is required: when we added the agent to the Ambari Server machine the error was gone. Thanks again for the help.
... View more
05-11-2019
08:02 AM
Hi @Rajesh Sampath @subhash parise, I am also having the same error while trying to read a Hive external table. Can you please tell me how you fixed it?
... View more
05-11-2019
07:56 AM
Hi @Pavel Stejskal @dbompart, I am facing the same problem, and I am not querying a Hive managed table; it is just an external table in Hive. I am able to read the metadata but not the data. Can you please tell me how you fixed it?
... View more
05-10-2019
07:07 AM
Hi, while upgrading from HDP 2.6.2 to HDP 3.1 we faced some issues that required manual steps: starting 2 DataNodes was giving an OOM exception, so we had to increase the Java heap size manually and start them manually as well. Now we are at the "Finalize Upgrade Pre-Check" and everything looks fine in the cluster, so we need to bypass this step in the upgrade, as it is giving us the error below:
Upgrade did not succeed on 2 hosts
Your options:
Pause Upgrade, delete the unhealthy hosts and return to the Upgrade Wizard to Proceed.
Perform a Downgrade, which will revert all hosts to the previous stack version.
thanks
... View more
05-08-2019
06:57 PM
Hi, I am upgrading to HDP 3.1; we have Ambari 2.7.3 and HDP 2.6.2 with Kerberos enabled. Ambari Server is installed, but I don't have an ambari-agent on that server, so it doesn't appear in (api/v1/clusters/clustername/hosts?fields=Hosts/ip,Hosts/host_name). In the step (Regenerate Missing Keytabs), creating the principals is failing with an error (Host not found). I also checked the ambari-server log and found the exception below:
Task #3611 failed to complete execution due to thrown exception: org.apache.ambari.server.HostNotFoundException: Host not found, hostname=myhost.mydomain.com
org.apache.ambari.server.HostNotFoundException: Host not found, hostname=myhost.mydomain.com
at org.apache.ambari.server.state.cluster.ClustersImpl.getHost(ClustersImpl.java:456)
at org.apache.ambari.server.state.ConfigHelper.getEffectiveDesiredTags(ConfigHelper.java:189)
at org.apache.ambari.server.state.ConfigHelper.getEffectiveDesiredTags(ConfigHelper.java:173)
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.findConfigurationTagsWithOverrides(AmbariManagementControllerImpl.java:2371)
at sun.reflect.GeneratedMethodAccessor424.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50)
at com.sun.proxy.$Proxy131.findConfigurationTagsWithOverrides(Unknown Source)
at org.apache.ambari.server.state.ConfigHelper.calculateExistingConfigurations(ConfigHelper.java:2158)
at org.apache.ambari.server.controller.KerberosHelperImpl.calculateConfigurations(KerberosHelperImpl.java:1725)
at org.apache.ambari.server.controller.KerberosHelperImpl.getActiveIdentities(KerberosHelperImpl.java:1800)
at org.apache.ambari.server.serveraction.kerberos.KerberosServerAction.calculateServiceIdentities(KerberosServerAction.java:511)
at org.apache.ambari.server.serveraction.kerberos.KerberosServerAction.processIdentities(KerberosServerAction.java:455)
at org.apache.ambari.server.serveraction.kerberos.CreatePrincipalsServerAction.execute(CreatePrincipalsServerAction.java:92)
at org.apache.ambari.server.serveraction.ServerActionExecutor$Worker.execute(ServerActionExecutor.java:550)
at org.apache.ambari.server.serveraction.ServerActionExecutor$Worker.run(ServerActionExecutor.java:466)
at java.lang.Thread.run(Thread.java:745)
I removed the short name from /etc/hosts, and the DNS server is pointing to the correct IPs, so when I try nslookup from any of the DataNodes it is able to resolve the Ambari Server hostname. Any suggestions please?
... View more