Member since 09-16-2021

421 Posts · 55 Kudos Received · 39 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 226 | 10-22-2025 05:48 AM |
| | 309 | 09-05-2025 07:19 AM |
| | 791 | 07-15-2025 02:22 AM |
| | 1344 | 06-02-2025 06:55 AM |
| | 1617 | 05-22-2025 03:00 AM |

09-29-2023 05:25 AM

The stack traces for Error 1 and Error 3 are incomplete. To gain a better understanding of the issue, please provide the complete stack traces; sharing the complete appLogs will give a comprehensive view of the situation.

Regarding Error 2, the job is attempting to create over 2000 dynamic partitions on a single node, which is unusual. Please review the partition column values for correctness. If everything appears to be in order, you can consider raising the following configurations (see the sketch after this list):

- hive.exec.max.dynamic.partitions
- hive.exec.max.dynamic.partitions.pernode
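A minimal sketch of raising those limits for a session, assuming the job runs through Spark with Hive support; in Beeline the same two SET statements work directly. The values 5000 and 2000 are placeholders, not recommendations:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Session-level overrides; depending on the engine these may instead need
# to be passed at submit time, e.g. as
# --conf spark.hadoop.hive.exec.max.dynamic.partitions=5000.
spark.sql("SET hive.exec.max.dynamic.partitions=5000")
spark.sql("SET hive.exec.max.dynamic.partitions.pernode=2000")
```

Raise these only after confirming the partition column values are correct; a bad partition key can silently fan out into thousands of directories.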

09-29-2023 05:17 AM

It appears that the Hive Metastore (HMS) is unable to establish a connection with the BackendDB, possibly due to an incorrect hostname or BackendDB configuration within the Hive service. Please validate the BackendDB configuration and attempt to start the service again.

```
Exception in thread "main" java.lang.RuntimeException: org.postgresql.util.PSQLException: The connection attempt failed.
	at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.countTables(HiveMetastoreDbUtil.java:203)
	at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.printTableCount(HiveMetastoreDbUtil.java:284)
	at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.main(HiveMetastoreDbUtil.java:354)
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:297)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:217)
	at org.postgresql.Driver.makeConnection(Driver.java:458)
	at org.postgresql.Driver.connect(Driver.java:260)
	at java.sql.DriverManager.getConnection(DriverManager.java:664)
	at java.sql.DriverManager.getConnection(DriverManager.java:247)
	at com.cloudera.enterprise.dbutil.SqlRunner.open(SqlRunner.java:193)
	at com.cloudera.enterprise.dbutil.SqlRunner.getDatabaseName(SqlRunner.java:264)
	at com.cloudera.cmf.service.hive.HiveMetastoreDbUtil.countTables(HiveMetastoreDbUtil.java:197)
	... 2 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at org.postgresql.core.PGStream.<init>(PGStream.java:81)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:93)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:197)
```
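Since the root cause is a plain `connect timed out`, a TCP-level check from the HMS host can quickly separate network/DNS problems from Hive configuration problems. A minimal sketch, assuming a PostgreSQL backend; the hostname and port are hypothetical placeholders for whatever the Hive service's database configuration actually points at:

```python
import socket

# Substitute the backend DB host/port configured for the Hive Metastore.
host, port = "backenddb.example.com", 5432

try:
    # Mirrors what the JDBC driver does first: open a TCP socket. If this
    # times out, suspect DNS, a firewall, or a wrong host/port.
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    print(f"TCP connection to {host}:{port} failed: {exc}")
```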

09-27-2023 07:43 AM

If the partition data exists in a layout like `<s3:bucket>/<some_location>/<part_column>=<part_value>/<filename>`, you can create an external table specifying the above location and run `MSCK REPAIR TABLE <table_name> SYNC PARTITIONS` to sync the partitions. Validate the data by running some sample SELECT statements. Once that's done, you can create a new external table on the other bucket and run an INSERT statement with dynamic partitioning, as sketched below.

Ref - https://cwiki.apache.org/confluence/display/hive/dynamicpartitions
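A minimal sketch of that flow through `spark.sql`; the table names, schema, and bucket paths are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# External table over the partition layout that already exists in S3.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events (id BIGINT, name STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION 's3a://source-bucket/some_location/'
""")

# Register the existing partition directories with the metastore.
# On engines without SYNC PARTITIONS support, plain MSCK REPAIR TABLE works.
spark.sql("MSCK REPAIR TABLE events SYNC PARTITIONS")

# Spot-check the data before copying anything.
spark.sql("SELECT * FROM events LIMIT 10").show()

# New external table on the other bucket, loaded with dynamic partitioning.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events_copy (id BIGINT, name STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET
    LOCATION 's3a://target-bucket/some_location/'
""")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("INSERT INTO events_copy PARTITION (dt) SELECT id, name, dt FROM events")
```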

09-05-2023 01:50 PM

Hey @Shivakuk, circling back to see if my response was helpful. I'm happy to help if you have follow-up questions. Thanks!

07-20-2023 03:10 AM

We verified the same in the CDP environment, as we are uncertain about the Databricks Spark environment. Since we have a mix of managed and external tables, we extracted the necessary information through HWC:

```
>>> database=spark.sql("show tables in default").collect()
23/07/20 10:04:45 INFO rule.HWCSwitchRule: Registering Listeners
23/07/20 10:04:47 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = e6f70006-0c2e-4237-9a9e-e1d19901af54
>>> desiredColumn="name"
>>> tablenames = []
>>> for row in database:
...  cols = spark.table(row.tableName).columns
...  listColumns= spark.table(row.tableName).columns
...  if desiredColumn in listColumns:
...   tablenames.append(row.tableName)
...
>>>
>>> print("\n".join(tablenames))
movies
tv_series_abc
cdp1
tv_series
spark_array_string_example
>>>
```
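The same lookup as a standalone sketch, for reuse outside the REPL; the `default` database and `name` column are the values from the session above, with the duplicate `columns` call folded into one:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

database = "default"     # database to scan
desired_column = "name"  # column to look for

# SHOW TABLES returns one row per table; keep the tables whose schema
# exposes the desired column.
matches = [
    row.tableName
    for row in spark.sql(f"SHOW TABLES IN {database}").collect()
    if desired_column in spark.table(f"{database}.{row.tableName}").columns
]

print("\n".join(matches))
```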

07-19-2023 08:21 PM

Hi ggangadharan,

Thanks for your feedback. Maybe I need to explain my case above in more detail. Both solutions you gave lean toward Sqoop/importing the data over JDBC. In my current situation, the source system does not give us permission to access their DB. They export the data and save it as a .sql file; I need to download that file and load it into a Hive table.

07-18-2023 07:27 AM

@Choolake, thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as that will make it easier for others to find the answer in the future.

07-15-2023 12:42 AM

@Sunanna Validate the job status using the commands below:

```
hadoop job -status <hadoop_job_id>
yarn application -status <hadoop_application_id>
```

Depending on the status, validate the logs using the command below; if needed, take a jstack of the child tasks for a better understanding.

```
yarn logs -applicationId <applicationId>
```

07-14-2023 02:39 AM

If my understanding is correct, the schema changes across input files, which implies the data itself lacks a fixed schema. Given the frequent schema changes, it is advisable to store the data in a column-oriented system such as HBase. The same HBase data can then be accessed through Spark using the HBase-Spark connector.

Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/accessing-hbase/topics/hbase-example-using-hbase-spark-connector.html
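A minimal read sketch with the connector, loosely following the Cloudera example linked above; the `person` table, column family `p`, and mapping string are hypothetical and depend on how the HBase rows are keyed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each DataFrame column maps to "family:qualifier"; ":key" is the row key.
df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping",
            "id STRING :key, name STRING p:name, age INT p:age")
    .option("hbase.table", "person")
    .option("hbase.spark.use.hbasecontext", False)
    .load()
)

df.filter(df.age > 30).show()
```

Because HBase stores columns per row independently, files whose schemas differ can land in the same table without migrations; the mapping above only has to cover the columns a given Spark job actually reads.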