About mszurap

mszurap · ‎08-27-2019

You can use any JDBC client to check whether the JDBC connection URL is correct, for example http://jdbcsql.sourceforge.net/ or http://squirrel-sql.sourceforge.net/. Have you tried using the fully qualified domain name (FQDN) of the host instead of localhost? (Make sure that the sentry user is also allowed to connect from the FQDN on the MySQL side.)

mszurap · ‎08-26-2019

When you can't submit Hive on Spark queries, you need to review the HiveServer2 (HS2) logs. Unfortunately the cause is not obvious from the client side (beeline). In any case, make sure that:
- the Spark service has been enabled as a dependency under Hive service > Configuration
- the Spark-related settings under Hive service > Configuration have been reviewed
- you have enough resources on the cluster and can submit YARN jobs

Do you have error messages from the HS2 logs?

Thanks
Miklos

mszurap · ‎08-26-2019

Hi, The message "java.net.ConnectException: Connection refused" is quite clear: the backend database has refused the connection request. Check that your DB is up and running and that you can connect to it from the localhost. Have you moved the Sentry role or the database to another host? Or do you have SSL enabled/required for the backend database?
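A quick way to tell "nothing is listening" apart from a credentials or SSL problem is to test the TCP port directly. The sketch below is only an illustration: DB_HOST and DB_PORT are hypothetical placeholders (3306 is the usual MySQL port), and it assumes that bash with /dev/tcp support and the coreutils timeout command are installed.

```shell
#!/bin/sh
# check_port HOST PORT: succeed only if a TCP connection can be opened.
# Assumption: bash (built with /dev/tcp support) and coreutils `timeout` exist.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical values: replace with the host and port of the Sentry database.
DB_HOST="${DB_HOST:-127.0.0.1}"
DB_PORT="${DB_PORT:-3306}"

if check_port "$DB_HOST" "$DB_PORT"; then
    echo "open: $DB_HOST:$DB_PORT accepts TCP connections"
else
    echo "closed: $DB_HOST:$DB_PORT refused the connection or timed out"
fi
```

If the port is reported as closed on the host where the database should run, the database is either down or bound to a different address or port. If it is open, the refusal more likely comes from the host name used in the JDBC URL.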

mszurap · ‎08-23-2019

Hello Harsha,

Based on the stacktrace:

1. HMS fails to start up when it tries to initialize its token store ("Hive Metastore Delegation Token Store" -> org.apache.hadoop.hive.thrift.ZooKeeperTokenStore). If you do not need this, you may change it to DBTokenStore, but I would rather go with the next item.
2. The "java.lang.NoSuchMethodError" suggests that there is a jar conflict: multiple versions of a jar exist on the classpath, and org/apache/curator/framework/imps/CreateBuilderImpl was not compiled against the org/apache/curator/utils/ThreadUtils class that is actually loaded.

Please check which jars contain these classes:

    for i in $(find . -name '*.jar') ; do /usr/java/latest/bin/jar tf $i | grep org/apache/curator/framework/imps/CreateBuilderImpl.class && echo $i ; done
    for i in $(find . -name '*.jar') ; do /usr/java/latest/bin/jar tf $i | grep org/apache/curator/utils/ThreadUtils.class && echo $i ; done

Then try to find which of these are (mistakenly?) included in the HMS classpath. Use jinfo to get the classpath of HMS:

    <java_home_of_hms>/bin/jinfo <pid_of_hms> | grep java.class.path

What Hive/CDH/HDP version are you using?
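The jar search and the classpath check can also be wrapped into two small helpers. This is a minimal sketch and not a supported tool: the function names are made up for illustration, JAR_BIN defaults to the jar path used in the post, and the jinfo output is assumed to contain a line starting with "java.class.path" followed by "=".

```shell
#!/bin/sh
# Assumption: JAR_BIN points at the jar tool of an installed JDK.
JAR_BIN="${JAR_BIN:-/usr/java/latest/bin/jar}"

# find_class_in_jars DIR CLASS_FILE
# Print every jar under DIR whose listing contains CLASS_FILE.
# Reads file names line by line, so paths with spaces are handled.
find_class_in_jars() {
    find "$1" -name '*.jar' -print | while IFS= read -r jar_file; do
        if "$JAR_BIN" tf "$jar_file" </dev/null 2>/dev/null | grep -q -F "$2"; then
            printf '%s\n' "$jar_file"
        fi
    done
}

# classpath_entries
# Read jinfo output on stdin and print one classpath entry per line.
classpath_entries() {
    sed -n 's/^java\.class\.path *= *//p' | tr ':' '\n'
}

# Example calls (commented out: they need a real directory and a running HMS):
# find_class_in_jars . org/apache/curator/utils/ThreadUtils.class
# "$JAVA_HOME"/bin/jinfo "$HMS_PID" | classpath_entries | grep -i curator
```

Comparing the jars printed by the first helper with the curator entries printed by the second shows which of the conflicting versions HMS actually loads.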

mszurap · ‎01-04-2018

Hello Abhi, At the time of this writing (the latest versions are CDH 5.13.1 / Spark 2.2.x), Hive on Spark2 is not supported. See our documentation: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_hive_ki.html#hive_on_spark2 "Hive-on-Spark is a CDH component that has a dependency on Spark 1.6. Because CDH components do not have any dependencies on Spark 2, Hive-on-Spark does not work with the Cloudera Distribution of Apache Spark 2." I hope this answers your question. Regards Miklos Szurap Customer Operations Engineer

mszurap · ‎04-13-2017

Since CDH 5.8 you can use set PARQUET_FALLBACK_SCHEMA_RESOLUTION=1; to look up columns within Parquet files by column name. For more, see: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet_fallback_schema_resolution.html

mszurap · ‎01-23-2017

Hi, If you see the same for Hive/MapReduce jobs, could you please confirm whether the table in question has many partitions? We have seen similar problems with tables that contained more than a thousand, or even tens of thousands, of partitions.

mszurap · ‎01-23-2017

Hello Rakesh, Unfortunately the s3n filesystem scheme is not supported by Impala: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_s3.html#s3_restrictions If you set up s3a, please look at the recommended settings in the "Best Practices for Using Impala with S3" section on the same page. Regards Miklos Szurap Customer Operations Engineer

mszurap · ‎11-21-2016

Hi Wenbin, I hope I understood your use case correctly. You say that the data files are transferred to the correct HDFS location (with properly formatted partition directories, like partitionname=partitionvalue), but you want to make Hive aware that there is a new partition on HDFS. In this case you need the MSCK REPAIR TABLE table_name command, please see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE) With this command you don't need to execute ALTER TABLE ADD PARTITION for each new partition; Hive will recognize them. In newer Impala versions the same functionality exists as the command: ALTER TABLE table_name RECOVER PARTITIONS Regards Miklos Szurap Customer Operations Engineer
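As a rough illustration of that workflow, the sketch below builds a partition directory in the partitionname=partitionvalue format and prints the commands that would follow. All names in it (the table sales, the partition column dt, the warehouse path, the file data.parquet and the HS2_JDBC_URL variable) are hypothetical placeholders. The script only prints the commands, so they can be reviewed before anything runs on a cluster.

```shell
#!/bin/sh
# Hypothetical names: adjust table, partition column/value and table directory.
TABLE="${TABLE:-sales}"
PART_COL="${PART_COL:-dt}"
PART_VAL="${PART_VAL:-2016-11-21}"
TABLE_DIR="${TABLE_DIR:-/user/hive/warehouse/sales}"

# partition_dir BASE COLUMN VALUE
# Build the directory name in the partitionname=partitionvalue format.
partition_dir() {
    printf '%s/%s=%s\n' "$1" "$2" "$3"
}

# Print (do not run) the steps: copy the file, then let Hive or Impala
# discover the new partition directory.
print_recover_steps() {
    dir=$(partition_dir "$TABLE_DIR" "$PART_COL" "$PART_VAL")
    echo "hdfs dfs -mkdir -p $dir"
    echo "hdfs dfs -put data.parquet $dir/"
    echo "# Hive:"
    echo "beeline -u \"\$HS2_JDBC_URL\" -e \"MSCK REPAIR TABLE $TABLE\""
    echo "# or, in newer Impala versions:"
    echo "impala-shell -q \"ALTER TABLE $TABLE RECOVER PARTITIONS\""
}

print_recover_steps
```

Only one of the last two commands is needed, depending on whether the partition is recovered through Hive or through Impala.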

mszurap · ‎11-21-2016

Hi, The Thrift protocol is used for communication inside Impala, as well as from Impala (and the catalog daemon) to the Hive metastore. If you see socket timeouts within this context, most probably one of the services is not in good health. You should check whether the involved service instances are up and running: Hive metastore, Impala daemon, catalog daemon and Impala statestore. Regards Miklos Szurap Customer Operations Engineer

Member Since	‎11-04-2015 11:53 PM
Last Visited	‎12-10-2024 10:10 AM
Posts	260
Kudos received	44

Cloudera Community

Re: Hive fails to start with "Caused by: java.lang...

Re: The heap memory usage of NameNode is much high...

Re: Hue and Sqoop white spaces in query

Re: straight SELECT and SELECT via CTE produce dif...

Re: Best practices for partition tables in Impala ...

Re: Upgrade Sentry Database Tables fails after upg...

Re: cant use hive on spark engine cannot create cl...

Re: Upgrade Sentry Database Tables fails after upg...

Re: HiveMetastore not listening on port 9083

Re: Hive on spark2

Re: external table stored as parquet - can not use...

Re: Hive jobs are not running on s3n/impala failed...

Re: Hive jobs are not running on s3n/impala failed...

Re: Restore partitions in another Hive or Impala a...

Re: exercise 2 error timed out (code THRIFTSOCKET...