Member since
09-16-2021
423
Posts
55
Kudos Received
39
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 1298 | 10-22-2025 05:48 AM | |
| 1378 | 09-05-2025 07:19 AM | |
| 2337 | 07-15-2025 02:22 AM | |
| 3198 | 05-22-2025 03:00 AM | |
| 2004 | 05-19-2025 03:02 AM |
10-27-2023
12:07 AM
The error message "Unknown HS2 problem when communicating with Thrift server" typically indicates that there is an issue when trying to communicate with the Hive Server 2 (HS2) through its Thrift interface. This error can occur for various reasons, and troubleshooting it may require some investigation. Here are some common steps to help resolve this issue: Check Hive Server Status: Ensure that the Hive Server 2 is up and running and that it's reachable from your client. You can check its status and logs to see if there are any errors or issues reported. Network Connectivity: Verify that there are no network-related issues that might be preventing your client from connecting to the Hive Server. Check firewalls, network configurations, and any potential network interruptions. Hive Configuration: Review the Hive server's configuration to ensure it's correctly set up. Pay attention to security configurations, like authentication and authorization settings. Thrift Protocol Version: Ensure that the Thrift protocol version used by your client matches the version supported by the Hive Server. Mismatched protocol versions can lead to communication problems. Client-Side Issues: Check the client application or code you are using to interact with Hive. Ensure that it's properly configured and making the correct requests to the Hive Server. Logs and Error Messages: Examine the logs and error messages in more detail to get specific information about what might be causing the problem. This can help pinpoint the issue. Server Version Compatibility: Ensure that the client and server components (Hive client and Hive Server) are compatible in terms of versions. Incompatible versions can lead to communication issues. Authentication and Authorization: If your Hive server is configured with authentication and authorization, ensure that you have the necessary permissions and credentials to access it. Load and Resource Constraints: Check if the Hive Server is under heavy load or if there are resource constraints that might be affecting its ability to respond to client requests. Driver and Libraries: Ensure that you are using the correct driver or libraries for your client application. If you're using JDBC or ODBC, make sure the corresponding driver is installed and configured correctly. If you continue to face issues after performing these checks, it may be necessary to provide more specific error messages or details about your environment to diagnose the problem further.
... View more
10-13-2023
04:35 AM
Verify the submission queue for application_1440861466017_0007 and ensure it has sufficient resources to launch the application.
... View more
10-10-2023
10:57 AM
1 Kudo
In Hive, you can achieve a similar result as the UNPIVOT operation in SQL Server by using the LATERAL VIEW and lateral VIEW OUTER explode functions to split the columns into rows. Here's how you can convert your SQL Server query to Hive: SELECT x, check AS y, split AS z
FROM dbo.tbl1
LATERAL VIEW OUTER explode(array(1, y2, y3, y4, y5, y6, y7, y8, y9, y10)) tbl AS split; In this Hive query: LATERAL VIEW OUTER explode is used to split the values from columns y2 to y10 into separate rows. The AS clause assigns aliases to the columns, where split corresponds to the values from the UNPIVOTed columns, check corresponds to the column name (y), and x remains unchanged. This query will produce a result similar to the UNPIVOT operation in SQL Server, where the values from columns y2 to y10 are split into separate rows along with their corresponding x and y values. In Hive, you can achieve a similar result as the PIVOT operation in SQL Server by using conditional aggregation along with CASE statements. Here's how you can convert your SQL Server query to Hive: SELECT *
FROM (
SELECT a, b, c, cbn_TYPE
FROM tbl2
) SRC
LEFT JOIN (
SELECT
a,
SUM(CASE WHEN cbn_TYPE = 'ONE TQ FOUR' THEN TOTAL_AMOUNT ELSE 0 END) AS ONE_TQ_FOUR,
SUM(CASE WHEN cbn_TYPE = 'going loss' THEN TOTAL_AMOUNT ELSE 0 END) AS going_loss,
SUM(CASE WHEN cbn_TYPE = 'COSTS LEAVING team sales' THEN TOTAL_AMOUNT ELSE 0 END) AS COSTS_LEAVING_team_sales,
SUM(CASE WHEN cbn_TYPE = 'profit' THEN TOTAL_AMOUNT ELSE 0 END) AS profit,
SUM(CASE WHEN cbn_TYPE = 'check money' THEN TOTAL_AMOUNT ELSE 0 END) AS check_money
FROM tbl2
GROUP BY a
) PIV
ON SRC.a = PIV.a; In this Hive query: We first create an intermediate result set (PIV) that calculates the sums for each cbn_TYPE using conditional aggregation (SUM with CASE statements). The LEFT JOIN is used to combine the original source table (SRC) with the aggregated result (PIV) based on the common column a. The result will have columns a, b, c, and the pivoted columns ONE_TQ_FOUR, going_loss, COSTS_LEAVING_team_sales, profit, and check_money, similar to the PIVOT operation in SQL Server. This query essentially performs a manual pivot operation in Hive by using conditional aggregation to calculate the sums for each cbn_TYPE and then joining the results back to the original table. In Hive, you can use the CASE statement to achieve the same result as the SQL Server expression NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0). Here's the equivalent Hive query: SELECT
CASE
WHEN (abc.Tc IS NULL AND abc.YR IS NULL) OR (abc.Tc + abc.YR = 0) THEN NULL
ELSE abc.Tc + abc.YR
END AS result
FROM your_table AS abc; In this Hive query: We use the CASE statement to conditionally calculate the result. If both abc.Tc and abc.YR are NULL, or if their sum is equal to 0, we return NULL. Otherwise, we return the sum of abc.Tc and abc.YR. This query replicates the behavior of the NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0) expression in SQL Server, providing a Hive-compatible solution for achieving the same result.
... View more
10-10-2023
10:50 AM
Please share the complete stack-trace to get better context. To perform an INSERT OVERWRITE operation on a Hive ACID transactional table, you need to ensure that you have the right configuration and execute the query correctly. Here are the steps and configurations: Enable ACID Transactions: Make sure your table is created with ACID properties. You can specify it during table creation like this: CREATE TABLE my_table (
-- Your table schema here
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true'); If your table is not already transactional, you may need to create a new transactional table with the desired schema. Set Hive ACID Properties: You should set some Hive configuration properties to enable ACID transactions if they are not already set: SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1; -- Number of compactor threads depending on the number of managed tables and usgae Perform the INSERT OVERWRITE: Use the INSERT OVERWRITE statement to replace the data in the table: INSERT OVERWRITE TABLE my_table
SELECT ...
FROM ... Ensure that the SELECT statement fetches the data you want to overwrite with. You can use a WHERE clause or other filters to specify the data you want to replace. Enable Auto-Compaction : You can enable auto-compaction to periodically clean up small files created by ACID transactions. REF - https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/ch02s05s01.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_acid_operations.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_hive_data_compaction.html
... View more
10-10-2023
10:29 AM
1 Kudo
Basic spark-submit command with respect to HWC - JDBC_CLUSTER mode pyspark --master yarn --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.8.0-801.jar --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.8.0-801.zip --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://c3757-node2.coelab.cloudera.com:2181,c3757-node3.coelab.cloudera.com:2181,c3757-node4.coelab.cloudera.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --conf spark.datasource.hive.warehouse.read.mode='JDBC_CLUSTER' --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator To append data to an existing Hive ACID table, ensure that you specify the save mode as 'append'. Example Using Python version 2.7.5 (default, Jun 28 2022 15:30:04)
SparkSession available as 'spark'.
>>> from pyspark_llap import HiveWarehouseSession
>>> hive = HiveWarehouseSession.session(spark).build()
>>> df=hive.sql("select * from spark_hwc.employee")
23/10/10 17:20:00 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
23/10/10 17:20:08 INFO rule.HWCSwitchRule: Registering Listeners
>>> df.write.mode("append").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
>>>
>>>
>>> hive.sql("select count(*) from spark_hwc.employee_new").show()
23/10/10 17:22:04 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
+---+
|_c0|
+---+
| 5|
+---+
>>> To overwrite data to an existing Hive ACID table, ensure that you specify the save mode as 'overwrite'. Example >>> df.write.mode("overwrite").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save()
>>> To append or overwrite a new Hive ACID table, there's no need to specify the saveMode explicitly. The HWC will automatically create the new ACID table based on its structure and internally trigger the LOAD DATA INPATH command Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html
... View more
10-10-2023
10:06 AM
Grafana is a popular open-source platform for monitoring and observability, and it is commonly associated with telemetry data visualization, especially when integrated with time-series databases like Prometheus, InfluxDB, or Elasticsearch. However, Grafana is not limited to telemetry data visualization, and it can be used for a wide range of data sources, including HDFS and Hive tables. Here are some options for using Grafana for data visualization beyond telemetry: Hive Data Sources: Grafana has built-in support for various data sources, and it offers plugins for connecting to databases and data lakes. You can configure Grafana to connect to Hive as a data source and visualize data stored in Hive tables. HDFS Data Sources: While Grafana primarily focuses on time-series data, you can still use it to visualize data stored in HDFS by connecting it to Hadoop-related data sources or by exporting HDFS data to another data store (e.g., Elasticsearch, InfluxDB) that Grafana supports. SQL Databases: Grafana can connect to traditional relational databases using SQL data sources. If you have data stored in SQL databases, you can use Grafana to create dashboards and visualizations. Log Data: Grafana can be used for log data analysis and visualization. You can integrate it with tools like Loki (for log aggregation) and explore log data in dashboards. Custom Plugins: If you have a unique data source or a specific format, you can develop custom data source plugins for Grafana to connect to your data and visualize it as needed. API Data: Grafana supports various data sources that expose data through APIs. You can connect to REST APIs, GraphQL APIs, and other web services to visualize data. Mixed Data Sources: Grafana allows you to create dashboards that combine data from multiple sources, making it versatile for various data visualization needs. While Grafana is flexible and can be used for a wide range of data sources, it's important to consider the nature of your data and the specific visualization requirements. Depending on your use case, you may need to choose the most suitable data source, data format, and visualization options within Grafana to achieve your desired results.
... View more
10-09-2023
06:27 AM
Could you kindly provide the DDL and a sample dataset to facilitate a more in-depth explanation?
... View more
10-04-2023
02:33 AM
1 Kudo
In Hive, there is no specific built-in data type that directly corresponds to the SQL Image data type for retaining the original binary image data from an SQL source. Hive primarily deals with structured data types like strings, numbers, and complex types such as arrays, maps, and structs. To store binary data like images in Hive, you typically use the BINARY data type or store them as STRING data, especially if you want to represent them in base64-encoded format. However, neither of these data types inherently retains the original binary value as is. You would need to handle the encoding and decoding of the binary data yourself. Here's an example of how you might store binary image data in Hive using the BINARY data type: CREATE TABLE image_data (
image_id INT,
image_content BINARY
); When you insert data into this table, you would need to encode the binary image data into a binary format suitable for storage in Hive. If retaining the original binary image data in its original format is crucial, you may want to consider other data storage solutions that are specifically designed for binary data, such as a distributed file system or binary data storage services. Hive, being primarily designed for structured data, may not be the best choice for this use case if you need to maintain the exact original binary data without encoding or modification.
... View more
10-03-2023
02:15 AM
It seems that the query involves dynamic partitioning, but the dynamic partition column is not included in either the select statement or the Common Table Expression (CTE). Please add the dynamic partition column 'date' to the select statement and validate it in Beeline.
... View more
10-03-2023
02:07 AM
It appears that you're currently following a two-step process: writing data to a Parquet table and then using that Parquet table to write to an ORC table. You can streamline this by directly writing the data into the ORC format table, eliminating the need to write the same data to a Parquet table before reading it. Ref - https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html
... View more