Member since
02-09-2015
95
Posts
8
Kudos Received
9
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5748 | 08-23-2021 04:07 PM |
 | 1504 | 06-30-2021 07:34 AM |
 | 1821 | 06-30-2021 07:26 AM |
 | 14414 | 05-17-2019 10:27 PM |
 | 3158 | 04-08-2019 01:00 PM |
05-16-2024
05:48 AM
1 Kudo
Because I ran into this thread while looking for a way to solve this error, and because we did find a solution, I thought it might still help some people if I share what we found. We needed HWC to profile Hive managed + transactional tables from Ataccama (a data quality solution), and we found someone who had gotten spark-submit working successfully. We compared their settings with ours and changed our spark-submit as follows:

    COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \
        --files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \
        --jars {{ hwc_jar_path }} \
        --conf spark.security.credentials.hiveserver2.enabled=false \
        --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }} \
        --conf spark.dynamicAllocation.enabled=false \
        --conf spark.hadoop.metastore.catalog.default=hive \
        --conf spark.yarn.maxAppAttempts=1 \
        --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
        --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
        --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
        --conf spark.sql.legacy.timeParserPolicy=LEGACY \
        --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \
        --conf spark.sql.parquet.int96TimestampConversion=true \
        --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
        --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
        --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
        --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
        --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
        --class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*"
    exec $COMMAND

The difference was probably the spark.hadoop.metastore.catalog.default=hive setting. The example above contains two Ansible variables: hwc_jar_path is "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar" and ad_realm is our LDAP realm. Hope it helps someone.
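For anyone who wants to confirm the Direct Reader path before wiring it into a full job, a minimal spark-shell smoke test is sketched below. It is only a sketch under our setup's assumptions: the assembly jar path and the table name mydb.my_acid_table are placeholders, and the confs are copied from the spark-submit above.

```bash
# Smoke test for HWC Direct Reader V2; jar path and table name are placeholders.
spark-shell \
  --jars /opt/cloudera/parcels/CDH/jars/hive-warehouse-connector-assembly-*.jar \
  --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
  --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
  --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
  --conf spark.hadoop.metastore.catalog.default=hive <<'EOF'
// Should return actual rows from the managed/transactional table
spark.sql("SELECT * FROM mydb.my_acid_table LIMIT 10").show()
EOF
```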
04-21-2023
02:58 PM
I'm also getting the same error when my Spark application tries to connect to HBase: "Found no valid authentication method from options". @tarekabouzeid91 @Ninads, were you able to find a solution to this issue?
12-22-2022
11:59 PM
Where can I get that jar?
08-31-2021
01:58 PM
Hi, Apache Spark will initiate the connection to your database on that port only, via JDBC. So you can open a firewall rule where the sources are your cluster nodes' IPs and the destination is your database server's IP on the port you specified. Best Regards
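To illustrate, a rule on the database server might look like the sketch below; the subnet and port are placeholders (here a PostgreSQL-style 5432), so substitute your actual node IPs and DB port.

```bash
# Allow the cluster node subnet (placeholder 10.0.1.0/24) to reach the
# database port (placeholder 5432) on this server via TCP/JDBC.
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 5432 -j ACCEPT
```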
08-31-2021
01:52 PM
Hi, do you have Apache Ranger installed? If yes, check that the right policies are added under the YARN service and that the Ranger usersync service is configured and syncing AD users and groups; a quick check is sketched below. Best Regards
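For example, from one of the cluster nodes (the username jdoe is a placeholder):

```bash
# Check that the AD user resolves at the OS level (SSSD/LDAP) and that
# Hadoop's group mapping returns the expected AD groups.
id jdoe
hdfs groups jdoe
```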
08-31-2021
01:31 PM
Hi, can you post the error please? Also, could you clarify the following: Is Kerberos enabled on your cluster? Did you enable the HDFS extension for Druid? What data type are you trying to read from HDFS? Best Regards
08-31-2021
01:15 PM
1 Kudo
Hi, with Hadoop 3 there is an intra-DataNode disk balancer in addition to the balancer that works across DataNodes, and both can help you distribute and balance the data on your cluster. The recommended setup is certainly to have all DataNodes with the same number and size of disks, but a different configuration per DataNode is possible; you will just need to rebalance your DataNodes quite often, which consumes compute and network resources. Another thing to consider when you have disks of different sizes is the DataNode volume choosing policy, which is set to round robin by default; you should consider choosing available space instead. I also suggest reading this article from Cloudera: https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ (a sketch of the commands follows below). Best Regards
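A minimal sketch of the intra-DataNode balancer workflow, assuming dfs.disk.balancer.enabled is set to true in hdfs-site.xml; the hostname and plan path are placeholders:

```bash
# Generate a move plan for one DataNode; the plan JSON is written to HDFS
# under /system/diskbalancer/.
hdfs diskbalancer -plan datanode01.example.com

# Execute the generated plan (path is a placeholder) and check its progress.
hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/datanode01.example.com.plan.json
hdfs diskbalancer -query datanode01.example.com

# For the volume choosing policy, set dfs.datanode.fsdataset.volume.choosing.policy to
# org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
```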
08-23-2021
04:07 PM
Hi, can you use Beeline to run the command below and then recreate the table: set parquet.column.index.access=false; This should stop Hive from mapping the data in your files by column position (the order in your CREATE TABLE statement) and make it use the column names instead; a sketch of the full sequence follows below. Hope this works for you. Best Regards
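Put together, the sequence might look like this sketch; the connection URL, table name, columns, and location are all placeholders:

```bash
# All names in this session are placeholders for your environment.
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" <<'EOF'
-- Map Parquet data to columns by name rather than by position
set parquet.column.index.access=false;

DROP TABLE IF EXISTS my_parquet_table;
CREATE EXTERNAL TABLE my_parquet_table (
  id BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION '/data/my_parquet_table';

SELECT * FROM my_parquet_table LIMIT 10;
EOF
```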
08-06-2021
01:25 AM
Had the same issue on CDP 7.1.6, which ships with Tez 0.9.1. It looks like this one: https://issues.apache.org/jira/browse/TEZ-4057 One workaround (probably not 100% secure) is to add the yarn user to the hive group: usermod -a -G hive yarn This needs to be done on all nodes and requires a restart of the YARN services (one way to script it is sketched below). After that the issue was gone; no more random errors for Hive on Tez.
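One way to script the change across the cluster; cluster_hosts.txt is a placeholder file listing your node hostnames, and you could just as well use Ansible or similar:

```bash
# Add the yarn user to the hive group on every node (requires sudo).
while read -r host; do
  ssh "$host" 'sudo usermod -a -G hive yarn'
done < cluster_hosts.txt
# Afterwards, restart the YARN services (e.g. from Cloudera Manager) so the
# NodeManagers pick up the new group membership.
```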
07-29-2021
11:08 PM
@smkmuthu, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information as requested?