Member since
02-09-2015
95
Posts
8
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 8828 | 08-23-2021 04:07 PM | |
| 2363 | 06-30-2021 07:34 AM | |
| 3155 | 06-30-2021 07:26 AM | |
| 19430 | 05-17-2019 10:27 PM | |
| 4168 | 04-08-2019 01:00 PM |
05-16-2024
05:48 AM
1 Kudo
Because I ran into this thread when looking how to solve this error and because we found a solution, I thought it might still serve some people if I share what solution we found. We needed HWC to profile Hive managed + transactional tables from Ataccama (data quality solution). And we found someone who successfully got spark-submit working. We checked their settings and changed the spark-submit as follows: COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \ --files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \ --jars {{ hwc_jar_path }} \ --conf spark.security.credentials.hiveserver2.enabled=false \ --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }}" \ --conf spark.dynamicAllocation.enable=false \ --conf spark.hadoop.metastore.catalog.default=hive \ --conf spark.yarn.maxAppAttempts=1 \ --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \ --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \ --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \ --conf spark.sql.legacy.timeParserPolicy=LEGACY \ --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \ --conf spark.sql.parquet.int96TimestampConversion=true \ --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \ --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \ --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \ --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \ --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \ --class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*"; exec $COMMAND Probably the difference was in the spark.hadoop.metastore.catalog.default=hive setting. In the above example are some Ansible variables: hwc_jar_path: "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar" ad_realm is our LDAP realm. Hope it helps anyone.
... View more
12-22-2022
11:59 PM
where can i get that jar ?
... View more
08-23-2021
04:07 PM
Hi, can you use beeline and type the below command then recreate the table : set parquet.column.index.access=false; this should make hive not use the index of your create table statement to map the data in your files, but instead it will use the columns names . hope this works for you. Best Regards
... View more
08-06-2021
01:25 AM
Had the same issue on CDP 7.1.6, which comes with Tez 0.9.1. Looks like this: https://issues.apache.org/jira/browse/TEZ-4057 One workaround (probably not 100% secure) is to add the yarn user to the hive group: usermod -a -G hive yarn This needs to be done on all nodes and requires Yarn services restart. After that the issue has gone, no more random errors for Hive on Tez anymore.
... View more
07-29-2021
11:08 PM
@smkmuthu, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information as requested?
... View more
07-12-2021
01:59 AM
@tarekabouzeid91 wrote: I assume you are using Capacity scheduler not fair scheduler. that's why queues wont take available resources from other queues, you can read more regarding that here Comparison of Fair Scheduler with Capacity Scheduler | CDP Public Cloud (cloudera.com) . Yes I am using Capacity scheduler. yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
... View more
06-30-2021
07:34 AM
Make sure that you are using the oracle jdbc driver version which is compatible with the oracle db version you are connecting to
... View more
06-07-2021
11:09 PM
Following are the configurations for connecting Apache Ranger with LDAP/LDAPS. There's an important tool that will help identify some settings in your AD AD Explorer - Windows Sysinternals | Microsoft Docs.
This configuration will sync LDAP users and link them with their LDAP groups every 12 hours, so later from Apache Ranger, you can give permission based on LDAP groups as well.
For connecting using LDAPS, ensure you have the proper certificates added in the same server that contains the Ranger's UserSync service.
Configuration Name
Configuration Value
Comment
ranger.usersync.source.impl.class
org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder
ranger.usersync.sleeptimeinmillisbetweensynccycle
12 hour
ranger.usersync.ldap.url
ldaps://myldapserver.example.com
ldaps or ldap based on your LDAP security
ranger.usersync.ldap.binddn
myuser@example.com
ranger.usersync.ldap.ldapbindpassword
mypassword
ranger.usersync.ldap.searchBase
OU=hadoop,DC=example,DC=com
You can browse your AD and check which OU you want to make Ranger sync
ranger.usersync.ldap.user.searchbase
OU=hadoop2,DC=example,DC=com;OU=hadoop,DC=example,DC=com
You can browse your AD and check which OU you want to make Ranger sync, you can also add 2 OU and separate them with ;
ranger.usersync.ldap.user.objectclass
user
double-check the same
ranger.usersync.ldap.user.searchfilter
(memberOf=CN=HADOOP_ACCESS,DC=example,DC=com)
if you want to filter specific users to be synced in Ranger and not your entire AD
ranger.usersync.ldap.user.nameattribute
sAMAccountName
double-check the same
ranger.usersync.ldap.user.groupnameattribute
memberOf
double check the same
ranger.usersync.user.searchenabled
true
ranger.usersync.group.searchbase
OU=hadoop,DC=example,DC=com
You can browse your AD and check which OU you want to make Ranger sync
ranger.usersync.group.objectclass
group
double-check the same
ranger.usersync.group.searchfilter
(cn=hadoop_*)
if you want to sync specific groups not all AD groups
ranger.usersync.group.nameattribute
cn
double-check the same
ranger.usersync.group.memberattributename
member
double-check the same
ranger.usersync.group.search.first.enabled
true
ranger.usersync.truststore.file
/path/to/truststore-file
ranger.usersync.truststore.password
TRUST_STORE_PASSWORD
Here is a helpful link on how to construct complex LDAP search queries. Search Filter Syntax - Win32 apps | Microsoft Docs
Disclaimer from Cloudera: This article is contributed by an external user. Steps/ Content may not be technically verified by Cloudera and may not be applicable for all use cases and specifically to a particular distribution. Follow with caution and own risk. If needed, raise a support case to get the confirmation.
... View more
Labels:
05-26-2021
04:57 AM
Hi, Below are configuration for connecting Apache Ranger with LDAP/LDAPS. There's an important tool that will help to identify some settings in your AD AD Explorer - Windows Sysinternals | Microsoft Docs This configuration will sync LDAP users and link them with their LDAP groups every 12 hour, so you later from Apache Ranger you can give permission based on LDAP groups as well. For connecting using LDAPS, make sure you have the proper certificates added in the same server that contains the Ranger's UserSync service. Configuration Name Configuration Value Comment ranger.usersync.source.impl.class org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder ranger.usersync.sleeptimeinmillisbetweensynccycle 12 hour ranger.usersync.ldap.url ldaps://myldapserver.example.com ldaps or ldap based on your LDAP security ranger.usersync.ldap.binddn myuser@example.com ranger.usersync.ldap.ldapbindpassword mypassword ranger.usersync.ldap.searchBase OU=hadoop,DC=example,DC=com you can browse your AD and check which OU you want to make Ranger sync ranger.usersync.ldap.user.searchbase OU=hadoop2,DC=example,DC=com;OU=hadoop,DC=example,DC=com you can browse your AD and check which OU you want to make Ranger sync, you can also add 2 OU and separate them with ; ranger.usersync.ldap.user.objectclass user double check the same ranger.usersync.ldap.user.searchfilter (memberOf=CN=HADOOP_ACCESS,DC=example,DC=com) if you want to filter specific users to be synced in ranger and not your entire AD ranger.usersync.ldap.user.nameattribute sAMAccountName double check the same ranger.usersync.ldap.user.groupnameattribute memberOf double check the same ranger.usersync.user.searchenabled true ranger.usersync.group.searchbase OU=hadoop,DC=example,DC=com you can browse your AD and check which OU you want to make Ranger sync ranger.usersync.group.objectclass group double check the same ranger.usersync.group.searchfilter (cn=hadoop_*) if you want to sync specific groups not all AD groups ranger.usersync.group.nameattribute cn double check the same ranger.usersync.group.memberattributename member double check the same ranger.usersync.group.search.first.enabled true ranger.usersync.truststore.file /path/to/truststore-file ranger.usersync.truststore.password TRUST_STORE_PASSWORD There's some helpful links about how to construct complex LDAP search queries Search Filter Syntax - Win32 apps | Microsoft Docs Best Regards,
... View more
11-09-2019
04:57 PM
Same issue we are facing in our Pyspark streaming .. Could you please let me know is it possible to handle in python mode spark as well ? https://stackoverflow.com/questions/58755063/failed-to-find-leader-for-topics-java-lang-nullpointerexception-nullpointerexce?noredirect=1#comment103834465_58755063
... View more