About tarekabouzeid91

marcel-jan · ‎05-16-2024

Because I ran into this thread while looking for a way to solve this error, and because we found a solution, I thought it might still help some people if I shared it. We needed HWC to profile Hive managed + transactional tables from Ataccama (a data quality solution). We found someone who had successfully got spark-submit working, checked their settings, and changed our spark-submit as follows:

COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \
  --files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \
  --jars {{ hwc_jar_path }} \
  --conf spark.security.credentials.hiveserver2.enabled=false \
  --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }}" \
  --conf spark.dynamicAllocation.enable=false \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.legacy.timeParserPolicy=LEGACY \
  --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \
  --conf spark.sql.parquet.int96TimestampConversion=true \
  --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
  --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
  --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
  --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
  --class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*";
exec $COMMAND

The difference was probably the spark.hadoop.metastore.catalog.default=hive setting.

The example above contains some Ansible variables:
hwc_jar_path: "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar"
ad_realm is our LDAP realm.

Hope it helps someone.
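The command above nests double quotes inside a quoted COMMAND string, which is easy to get wrong. As a minimal sketch (not the poster's actual launcher), the HWC-related options could instead be kept as separate arguments. The function name and the placeholder values for the jar path and realm are hypothetical, and only a few of the options from the post are repeated here.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own jar path and realm.
HWC_JAR_PATH="/path/to/hive-warehouse-connector-assembly.jar"
AD_REALM="EXAMPLE.COM"

# Collect the options as positional parameters so that each one stays a
# single argument, then print them one per line for inspection.
show_hwc_spark_options() {
  set -- \
    --jars "$HWC_JAR_PATH" \
    --conf "spark.security.credentials.hiveserver2.enabled=false" \
    --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@${AD_REALM}" \
    --conf "spark.hadoop.metastore.catalog.default=hive" \
    --conf "spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2"
  # A real launcher would pass "$@" straight to spark-submit at this point
  # instead of printing it.
  printf '%s\n' "$@"
}

show_hwc_spark_options
```

Passing "$@" rather than an unquoted string keeps values containing special characters intact without an extra layer of quoting.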

skasireddy · ‎04-21-2023

I'm also getting the same error when my Spark application tries to connect to HBase: "Found no valid authentication method from options". @tarekabouzeid91 @Ninads were you able to find a solution to this issue?

myzard · ‎12-22-2022

Where can I get that jar?

tarekabouzeid91 · ‎08-31-2021

Hi, Apache Spark will initiate connections to your database only over JDBC on that port, so you can open the firewall with your node IPs as the sources and your database server IP as the destination, on the port you specified. Best Regards
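As a rough illustration of such a rule set, the sketch below only prints one iptables rule per cluster node for the database host. It assumes the database server filters traffic with iptables; the node IPs (taken from a documentation address range) and the port are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical values: the JDBC port of the database and the IPs of the
# cluster nodes that run Spark drivers and executors.
DB_PORT=5432
NODE_IPS="192.0.2.11 192.0.2.12 192.0.2.13"

# Print (do not apply) one ACCEPT rule per node for the database server.
print_jdbc_firewall_rules() {
  for ip in $NODE_IPS; do
    echo "iptables -A INPUT -p tcp -s $ip --dport $DB_PORT -j ACCEPT"
  done
}

print_jdbc_firewall_rules
```

Every node that can host an executor needs a rule, because executors open their own JDBC connections.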

tarekabouzeid91 · ‎08-31-2021

Hi, do you have Apache Ranger installed? If so, check that the right policies have been added under the YARN service and that the Ranger usersync service is configured and syncing AD users and groups. Best Regards

tarekabouzeid91 · ‎08-31-2021

Hi, can you post the error, please? Could you also clarify the following: Does your cluster have Kerberos enabled? Did you enable the HDFS extension for Druid? What data type are you trying to read from HDFS? Best Regards

tarekabouzeid91 · ‎08-31-2021

Hi, Hadoop 3 has an intra-node (disk) balancer in addition to the balancer across data nodes, and together they can help you distribute and balance data across your cluster. The recommended setup is certainly for all data nodes to have the same number and size of disks. Different configurations are possible, but you will need to rebalance your data nodes quite often, which consumes computation and network resources. Another thing to consider when disks differ in size is the "data node volume choosing policy", which defaults to round robin; consider switching it to available space instead. I suggest you also read this article from Cloudera: https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ Best Regards
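The two kinds of balancing mentioned above could be driven roughly as follows. This is a dry-run sketch that prints the commands by default; the host name and plan file path are hypothetical, and it assumes the disk balancer is enabled on the cluster (dfs.disk.balancer.enabled). The available-space policy itself is set in hdfs-site.xml through dfs.datanode.fsdataset.volume.choosing.policy, not through these commands.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Balance data between data nodes; the threshold is a percentage.
balance_cluster() {
  run hdfs balancer -threshold "${1:-10}"
}

# Balance data between the disks of one data node.
# $1: data node host, $2: plan file written by the -plan step (its real
# path is printed by that step; the one used below is a placeholder).
balance_datanode_disks() {
  run hdfs diskbalancer -plan "$1"
  run hdfs diskbalancer -execute "$2"
  run hdfs diskbalancer -query "$1"
}

balance_cluster 10
balance_datanode_disks dn1.example.com /tmp/dn1.example.com.plan.json
```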

tarekabouzeid91 · ‎08-23-2021

Hi, can you use Beeline to run the command below and then recreate the table: set parquet.column.index.access=false; This should make Hive map the data in your files by column name instead of by the column positions in your CREATE TABLE statement. Hope this works for you. Best Regards
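One possible way to run the setting and the table re-creation in the same Beeline session is sketched below as a dry run that prints the command by default. The JDBC URL, table name, and columns are hypothetical placeholders, not taken from the original thread.

```shell
#!/bin/sh
# Hypothetical HiveServer2 URL; replace with your own.
JDBC_URL="jdbc:hive2://hiveserver.example.com:10000/default"

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1: the CREATE TABLE statement to run after the setting is applied.
recreate_with_name_mapping() {
  run beeline -u "$JDBC_URL" -e "set parquet.column.index.access=false; $1"
}

# Hypothetical table definition, for illustration only.
recreate_with_name_mapping "CREATE TABLE demo_tbl (id INT, name STRING) STORED AS PARQUET;"
```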

mzinal · ‎08-06-2021

Had the same issue on CDP 7.1.6, which comes with Tez 0.9.1. It looks like this one: https://issues.apache.org/jira/browse/TEZ-4057 One workaround (probably not 100% secure) is to add the yarn user to the hive group: usermod -a -G hive yarn This needs to be done on all nodes and requires a restart of the YARN services. After that the issue was gone: no more random errors for Hive on Tez.
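Applying the workaround on every node could look roughly like the dry-run sketch below, which prints the commands by default. The host names are hypothetical, and it assumes SSH access with sufficient privileges to run usermod on each node.

```shell
#!/bin/sh
# Hypothetical list of cluster nodes.
NODES="node1.example.com node2.example.com node3.example.com"

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Add yarn to the hive group on each node, then list yarn's groups to verify.
add_yarn_to_hive_group() {
  for host in $NODES; do
    run ssh "$host" "usermod -a -G hive yarn && id -nG yarn"
  done
}

add_yarn_to_hive_group
```

The YARN services still have to be restarted afterwards, as noted in the post.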

VidyaSargur · ‎07-29-2021

@smkmuthu, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. If you are still experiencing the issue, can you provide the information as requested?

Last Visited	‎10-12-2021 03:27 AM

Member Since	‎02-09-2015 12:35 AM
Posts	95
Kudos received	8

Cloudera Community

Re: Parquet schema error

Re: sqoop jdbc error sandbox hortonwork

Re: Kafka offsets in DR scenario

Re: Hive - tez , vertex failed error during reduc...

Re: Cannot read data using Spark - Hive Warehouse...

Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.ta...

Re: Connection error on Kerberose enabled environm...

Re: Cannot read data using Spark - Hive Warehouse...

Re: Spark ports to connect to rdbms source using j...

Re: Unable to execute job on Yarn after Cluster Ha...

Re: load data from hdfs in druid

Re: Is it safe to have nodes with different number...
