About ggangadharan

ggangadharan · ‎10-27-2023

The error message "Unknown HS2 problem when communicating with Thrift server" typically indicates that there is an issue when trying to communicate with the Hive Server 2 (HS2) through its Thrift interface. This error can occur for various reasons, and troubleshooting it may require some investigation. Here are some common steps to help resolve this issue: Check Hive Server Status: Ensure that the Hive Server 2 is up and running and that it's reachable from your client. You can check its status and logs to see if there are any errors or issues reported. Network Connectivity: Verify that there are no network-related issues that might be preventing your client from connecting to the Hive Server. Check firewalls, network configurations, and any potential network interruptions. Hive Configuration: Review the Hive server's configuration to ensure it's correctly set up. Pay attention to security configurations, like authentication and authorization settings. Thrift Protocol Version: Ensure that the Thrift protocol version used by your client matches the version supported by the Hive Server. Mismatched protocol versions can lead to communication problems. Client-Side Issues: Check the client application or code you are using to interact with Hive. Ensure that it's properly configured and making the correct requests to the Hive Server. Logs and Error Messages: Examine the logs and error messages in more detail to get specific information about what might be causing the problem. This can help pinpoint the issue. Server Version Compatibility: Ensure that the client and server components (Hive client and Hive Server) are compatible in terms of versions. Incompatible versions can lead to communication issues. Authentication and Authorization: If your Hive server is configured with authentication and authorization, ensure that you have the necessary permissions and credentials to access it. Load and Resource Constraints: Check if the Hive Server is under heavy load or if there are resource constraints that might be affecting its ability to respond to client requests. Driver and Libraries: Ensure that you are using the correct driver or libraries for your client application. If you're using JDBC or ODBC, make sure the corresponding driver is installed and configured correctly. If you continue to face issues after performing these checks, it may be necessary to provide more specific error messages or details about your environment to diagnose the problem further.

ggangadharan · ‎10-13-2023

Verify the submission queue for application_1440861466017_0007 and ensure it has sufficient resources to launch the application.

ggangadharan · ‎10-10-2023

In Hive, you can achieve a similar result as the UNPIVOT operation in SQL Server by using the LATERAL VIEW and lateral VIEW OUTER explode functions to split the columns into rows. Here's how you can convert your SQL Server query to Hive: SELECT x, check AS y, split AS z FROM dbo.tbl1 LATERAL VIEW OUTER explode(array(1, y2, y3, y4, y5, y6, y7, y8, y9, y10)) tbl AS split; In this Hive query: LATERAL VIEW OUTER explode is used to split the values from columns y2 to y10 into separate rows. The AS clause assigns aliases to the columns, where split corresponds to the values from the UNPIVOTed columns, check corresponds to the column name (y), and x remains unchanged. This query will produce a result similar to the UNPIVOT operation in SQL Server, where the values from columns y2 to y10 are split into separate rows along with their corresponding x and y values. In Hive, you can achieve a similar result as the PIVOT operation in SQL Server by using conditional aggregation along with CASE statements. Here's how you can convert your SQL Server query to Hive: SELECT * FROM ( SELECT a, b, c, cbn_TYPE FROM tbl2 ) SRC LEFT JOIN ( SELECT a, SUM(CASE WHEN cbn_TYPE = 'ONE TQ FOUR' THEN TOTAL_AMOUNT ELSE 0 END) AS ONE_TQ_FOUR, SUM(CASE WHEN cbn_TYPE = 'going loss' THEN TOTAL_AMOUNT ELSE 0 END) AS going_loss, SUM(CASE WHEN cbn_TYPE = 'COSTS LEAVING team sales' THEN TOTAL_AMOUNT ELSE 0 END) AS COSTS_LEAVING_team_sales, SUM(CASE WHEN cbn_TYPE = 'profit' THEN TOTAL_AMOUNT ELSE 0 END) AS profit, SUM(CASE WHEN cbn_TYPE = 'check money' THEN TOTAL_AMOUNT ELSE 0 END) AS check_money FROM tbl2 GROUP BY a ) PIV ON SRC.a = PIV.a; In this Hive query: We first create an intermediate result set (PIV) that calculates the sums for each cbn_TYPE using conditional aggregation (SUM with CASE statements). The LEFT JOIN is used to combine the original source table (SRC) with the aggregated result (PIV) based on the common column a. The result will have columns a, b, c, and the pivoted columns ONE_TQ_FOUR, going_loss, COSTS_LEAVING_team_sales, profit, and check_money, similar to the PIVOT operation in SQL Server. This query essentially performs a manual pivot operation in Hive by using conditional aggregation to calculate the sums for each cbn_TYPE and then joining the results back to the original table. In Hive, you can use the CASE statement to achieve the same result as the SQL Server expression NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0). Here's the equivalent Hive query: SELECT CASE WHEN (abc.Tc IS NULL AND abc.YR IS NULL) OR (abc.Tc + abc.YR = 0) THEN NULL ELSE abc.Tc + abc.YR END AS result FROM your_table AS abc; In this Hive query: We use the CASE statement to conditionally calculate the result. If both abc.Tc and abc.YR are NULL, or if their sum is equal to 0, we return NULL. Otherwise, we return the sum of abc.Tc and abc.YR. This query replicates the behavior of the NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0) expression in SQL Server, providing a Hive-compatible solution for achieving the same result.

ggangadharan · ‎10-10-2023

Please share the complete stack-trace to get better context. To perform an INSERT OVERWRITE operation on a Hive ACID transactional table, you need to ensure that you have the right configuration and execute the query correctly. Here are the steps and configurations: Enable ACID Transactions: Make sure your table is created with ACID properties. You can specify it during table creation like this: CREATE TABLE my_table ( -- Your table schema here ) STORED AS ORC TBLPROPERTIES ('transactional'='true'); If your table is not already transactional, you may need to create a new transactional table with the desired schema. Set Hive ACID Properties: You should set some Hive configuration properties to enable ACID transactions if they are not already set: SET hive.support.concurrency=true; SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; SET hive.compactor.initiator.on=true; SET hive.compactor.worker.threads=1; -- Number of compactor threads depending on the number of managed tables and usgae Perform the INSERT OVERWRITE: Use the INSERT OVERWRITE statement to replace the data in the table: INSERT OVERWRITE TABLE my_table SELECT ... FROM ... Ensure that the SELECT statement fetches the data you want to overwrite with. You can use a WHERE clause or other filters to specify the data you want to replace. Enable Auto-Compaction : You can enable auto-compaction to periodically clean up small files created by ACID transactions. REF - https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/ch02s05s01.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_acid_operations.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_hive_data_compaction.html

ggangadharan · ‎10-10-2023

Basic spark-submit command with respect to HWC - JDBC_CLUSTER mode pyspark --master yarn --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.8.0-801.jar --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.8.0-801.zip --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://c3757-node2.coelab.cloudera.com:2181,c3757-node3.coelab.cloudera.com:2181,c3757-node4.coelab.cloudera.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --conf spark.datasource.hive.warehouse.read.mode='JDBC_CLUSTER' --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator To append data to an existing Hive ACID table, ensure that you specify the save mode as 'append'. Example Using Python version 2.7.5 (default, Jun 28 2022 15:30:04) SparkSession available as 'spark'. >>> from pyspark_llap import HiveWarehouseSession >>> hive = HiveWarehouseSession.session(spark).build() >>> df=hive.sql("select * from spark_hwc.employee") 23/10/10 17:20:00 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist 23/10/10 17:20:08 INFO rule.HWCSwitchRule: Registering Listeners >>> df.write.mode("append").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save() >>> >>> >>> hive.sql("select count(*) from spark_hwc.employee_new").show() 23/10/10 17:22:04 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist +---+ |_c0| +---+ | 5| +---+ >>> To overwrite data to an existing Hive ACID table, ensure that you specify the save mode as 'overwrite'. Example >>> df.write.mode("overwrite").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save() >>> To append or overwrite a new Hive ACID table, there's no need to specify the saveMode explicitly. The HWC will automatically create the new ACID table based on its structure and internally trigger the LOAD DATA INPATH command Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html

ggangadharan · ‎10-10-2023

Grafana is a popular open-source platform for monitoring and observability, and it is commonly associated with telemetry data visualization, especially when integrated with time-series databases like Prometheus, InfluxDB, or Elasticsearch. However, Grafana is not limited to telemetry data visualization, and it can be used for a wide range of data sources, including HDFS and Hive tables. Here are some options for using Grafana for data visualization beyond telemetry: Hive Data Sources: Grafana has built-in support for various data sources, and it offers plugins for connecting to databases and data lakes. You can configure Grafana to connect to Hive as a data source and visualize data stored in Hive tables. HDFS Data Sources: While Grafana primarily focuses on time-series data, you can still use it to visualize data stored in HDFS by connecting it to Hadoop-related data sources or by exporting HDFS data to another data store (e.g., Elasticsearch, InfluxDB) that Grafana supports. SQL Databases: Grafana can connect to traditional relational databases using SQL data sources. If you have data stored in SQL databases, you can use Grafana to create dashboards and visualizations. Log Data: Grafana can be used for log data analysis and visualization. You can integrate it with tools like Loki (for log aggregation) and explore log data in dashboards. Custom Plugins: If you have a unique data source or a specific format, you can develop custom data source plugins for Grafana to connect to your data and visualize it as needed. API Data: Grafana supports various data sources that expose data through APIs. You can connect to REST APIs, GraphQL APIs, and other web services to visualize data. Mixed Data Sources: Grafana allows you to create dashboards that combine data from multiple sources, making it versatile for various data visualization needs. While Grafana is flexible and can be used for a wide range of data sources, it's important to consider the nature of your data and the specific visualization requirements. Depending on your use case, you may need to choose the most suitable data source, data format, and visualization options within Grafana to achieve your desired results.

ggangadharan · ‎10-09-2023

Could you kindly provide the DDL and a sample dataset to facilitate a more in-depth explanation?

ggangadharan · ‎10-04-2023

In Hive, there is no specific built-in data type that directly corresponds to the SQL Image data type for retaining the original binary image data from an SQL source. Hive primarily deals with structured data types like strings, numbers, and complex types such as arrays, maps, and structs. To store binary data like images in Hive, you typically use the BINARY data type or store them as STRING data, especially if you want to represent them in base64-encoded format. However, neither of these data types inherently retains the original binary value as is. You would need to handle the encoding and decoding of the binary data yourself. Here's an example of how you might store binary image data in Hive using the BINARY data type: CREATE TABLE image_data ( image_id INT, image_content BINARY ); When you insert data into this table, you would need to encode the binary image data into a binary format suitable for storage in Hive. If retaining the original binary image data in its original format is crucial, you may want to consider other data storage solutions that are specifically designed for binary data, such as a distributed file system or binary data storage services. Hive, being primarily designed for structured data, may not be the best choice for this use case if you need to maintain the exact original binary data without encoding or modification.

ggangadharan · ‎10-03-2023

It seems that the query involves dynamic partitioning, but the dynamic partition column is not included in either the select statement or the Common Table Expression (CTE). Please add the dynamic partition column 'date' to the select statement and validate it in Beeline.

ggangadharan · ‎10-03-2023

It appears that you're currently following a two-step process: writing data to a Parquet table and then using that Parquet table to write to an ORC table. You can streamline this by directly writing the data into the ORC format table, eliminating the need to write the same data to a Parquet table before reading it. Ref - https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html

Online	Offline
Last Visited	‎05-10-2026 10:55 PM

Member Since	‎09-16-2021 02:45 AM
Last Visited	‎05-10-2026 10:55 PM
Posts	423
Kudos received	55

Cloudera Community

Re: HWC on CDP 7.3.1 with Spark 3.5

Re: Using Hadoop Iceberg catalog with Hive engine ...

Re: Where can I find the Maven repository for HDP ...

Re: Hive on TEZ memory footprint and Impala stats...

Re: Hive little query slow

Re: Unable to connect to HIVE from CLI though the ...

Re: Spark application stuck at ACCEPTED state at s...

Re: SQL to Hive Query Conversion

Re: [How?] Hive INSERT OVERWRITE to Transaction Ta...

Re: Unable to append overwrite Hive ACID table usi...

Re: data monitoring dashboard from hdfs and hive

Re: Why doesn't hive bucketing work at partition l...

Re: SQOOP import of "image" data type into hive

Re: Ozzie Workflow for CTE and Insert Statement

Re: Insert data from spark data frame to hive orc ...