About ggangadharan

RangaReddy · ‎10-24-2023

I think you don't have sufficient resources to run the job for queue root.hdfs. Verify is there any pending running jobs/application in the root.hdfs queue from Resource Manager UI. If it is running kill those if it is not required. And also verify from spark side you have given less resource to test it.

VidyaSargur · ‎10-16-2023

@DaveNepal, Thank you for your participation in the Cloudera Community. I'm happy to see you resolved your issue. Could you please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?

ggangadharan · ‎10-10-2023

In Hive, you can achieve a similar result as the UNPIVOT operation in SQL Server by using the LATERAL VIEW and lateral VIEW OUTER explode functions to split the columns into rows. Here's how you can convert your SQL Server query to Hive: SELECT x, check AS y, split AS z FROM dbo.tbl1 LATERAL VIEW OUTER explode(array(1, y2, y3, y4, y5, y6, y7, y8, y9, y10)) tbl AS split; In this Hive query: LATERAL VIEW OUTER explode is used to split the values from columns y2 to y10 into separate rows. The AS clause assigns aliases to the columns, where split corresponds to the values from the UNPIVOTed columns, check corresponds to the column name (y), and x remains unchanged. This query will produce a result similar to the UNPIVOT operation in SQL Server, where the values from columns y2 to y10 are split into separate rows along with their corresponding x and y values. In Hive, you can achieve a similar result as the PIVOT operation in SQL Server by using conditional aggregation along with CASE statements. Here's how you can convert your SQL Server query to Hive: SELECT * FROM ( SELECT a, b, c, cbn_TYPE FROM tbl2 ) SRC LEFT JOIN ( SELECT a, SUM(CASE WHEN cbn_TYPE = 'ONE TQ FOUR' THEN TOTAL_AMOUNT ELSE 0 END) AS ONE_TQ_FOUR, SUM(CASE WHEN cbn_TYPE = 'going loss' THEN TOTAL_AMOUNT ELSE 0 END) AS going_loss, SUM(CASE WHEN cbn_TYPE = 'COSTS LEAVING team sales' THEN TOTAL_AMOUNT ELSE 0 END) AS COSTS_LEAVING_team_sales, SUM(CASE WHEN cbn_TYPE = 'profit' THEN TOTAL_AMOUNT ELSE 0 END) AS profit, SUM(CASE WHEN cbn_TYPE = 'check money' THEN TOTAL_AMOUNT ELSE 0 END) AS check_money FROM tbl2 GROUP BY a ) PIV ON SRC.a = PIV.a; In this Hive query: We first create an intermediate result set (PIV) that calculates the sums for each cbn_TYPE using conditional aggregation (SUM with CASE statements). The LEFT JOIN is used to combine the original source table (SRC) with the aggregated result (PIV) based on the common column a. The result will have columns a, b, c, and the pivoted columns ONE_TQ_FOUR, going_loss, COSTS_LEAVING_team_sales, profit, and check_money, similar to the PIVOT operation in SQL Server. This query essentially performs a manual pivot operation in Hive by using conditional aggregation to calculate the sums for each cbn_TYPE and then joining the results back to the original table. In Hive, you can use the CASE statement to achieve the same result as the SQL Server expression NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0). Here's the equivalent Hive query: SELECT CASE WHEN (abc.Tc IS NULL AND abc.YR IS NULL) OR (abc.Tc + abc.YR = 0) THEN NULL ELSE abc.Tc + abc.YR END AS result FROM your_table AS abc; In this Hive query: We use the CASE statement to conditionally calculate the result. If both abc.Tc and abc.YR are NULL, or if their sum is equal to 0, we return NULL. Otherwise, we return the sum of abc.Tc and abc.YR. This query replicates the behavior of the NULLIF(ISNULL(abc.Tc, 0) + ISNULL(abc.YR, 0), 0) expression in SQL Server, providing a Hive-compatible solution for achieving the same result.

ggangadharan · ‎10-10-2023

Please share the complete stack-trace to get better context. To perform an INSERT OVERWRITE operation on a Hive ACID transactional table, you need to ensure that you have the right configuration and execute the query correctly. Here are the steps and configurations: Enable ACID Transactions: Make sure your table is created with ACID properties. You can specify it during table creation like this: CREATE TABLE my_table ( -- Your table schema here ) STORED AS ORC TBLPROPERTIES ('transactional'='true'); If your table is not already transactional, you may need to create a new transactional table with the desired schema. Set Hive ACID Properties: You should set some Hive configuration properties to enable ACID transactions if they are not already set: SET hive.support.concurrency=true; SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; SET hive.compactor.initiator.on=true; SET hive.compactor.worker.threads=1; -- Number of compactor threads depending on the number of managed tables and usgae Perform the INSERT OVERWRITE: Use the INSERT OVERWRITE statement to replace the data in the table: INSERT OVERWRITE TABLE my_table SELECT ... FROM ... Ensure that the SELECT statement fetches the data you want to overwrite with. You can use a WHERE clause or other filters to specify the data you want to replace. Enable Auto-Compaction : You can enable auto-compaction to periodically clean up small files created by ACID transactions. REF - https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_data-access/content/ch02s05s01.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_acid_operations.html https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/managing-hive/content/hive_hive_data_compaction.html

ggangadharan · ‎10-10-2023

Basic spark-submit command with respect to HWC - JDBC_CLUSTER mode pyspark --master yarn --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.8.0-801.jar --py-files /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/pyspark_hwc-1.0.0.7.1.8.0-801.zip --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://c3757-node2.coelab.cloudera.com:2181,c3757-node3.coelab.cloudera.com:2181,c3757-node4.coelab.cloudera.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2' --conf spark.datasource.hive.warehouse.read.mode='JDBC_CLUSTER' --conf spark.datasource.hive.warehouse.load.staging.dir='/tmp' --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator To append data to an existing Hive ACID table, ensure that you specify the save mode as 'append'. Example Using Python version 2.7.5 (default, Jun 28 2022 15:30:04) SparkSession available as 'spark'. >>> from pyspark_llap import HiveWarehouseSession >>> hive = HiveWarehouseSession.session(spark).build() >>> df=hive.sql("select * from spark_hwc.employee") 23/10/10 17:20:00 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist 23/10/10 17:20:08 INFO rule.HWCSwitchRule: Registering Listeners >>> df.write.mode("append").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save() >>> >>> >>> hive.sql("select count(*) from spark_hwc.employee_new").show() 23/10/10 17:22:04 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist +---+ |_c0| +---+ | 5| +---+ >>> To overwrite data to an existing Hive ACID table, ensure that you specify the save mode as 'overwrite'. Example >>> df.write.mode("overwrite").format(HiveWarehouseSession().HIVE_WAREHOUSE_CONNECTOR).option("table", "spark_hwc.employee_new").save() >>> To append or overwrite a new Hive ACID table, there's no need to specify the saveMode explicitly. The HWC will automatically create the new ACID table based on its structure and internally trigger the LOAD DATA INPATH command Ref - https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/integrating-hive-and-bi/topics/hive-read-write-operations.html

ggangadharan · ‎10-10-2023

Grafana is a popular open-source platform for monitoring and observability, and it is commonly associated with telemetry data visualization, especially when integrated with time-series databases like Prometheus, InfluxDB, or Elasticsearch. However, Grafana is not limited to telemetry data visualization, and it can be used for a wide range of data sources, including HDFS and Hive tables. Here are some options for using Grafana for data visualization beyond telemetry: Hive Data Sources: Grafana has built-in support for various data sources, and it offers plugins for connecting to databases and data lakes. You can configure Grafana to connect to Hive as a data source and visualize data stored in Hive tables. HDFS Data Sources: While Grafana primarily focuses on time-series data, you can still use it to visualize data stored in HDFS by connecting it to Hadoop-related data sources or by exporting HDFS data to another data store (e.g., Elasticsearch, InfluxDB) that Grafana supports. SQL Databases: Grafana can connect to traditional relational databases using SQL data sources. If you have data stored in SQL databases, you can use Grafana to create dashboards and visualizations. Log Data: Grafana can be used for log data analysis and visualization. You can integrate it with tools like Loki (for log aggregation) and explore log data in dashboards. Custom Plugins: If you have a unique data source or a specific format, you can develop custom data source plugins for Grafana to connect to your data and visualize it as needed. API Data: Grafana supports various data sources that expose data through APIs. You can connect to REST APIs, GraphQL APIs, and other web services to visualize data. Mixed Data Sources: Grafana allows you to create dashboards that combine data from multiple sources, making it versatile for various data visualization needs. While Grafana is flexible and can be used for a wide range of data sources, it's important to consider the nature of your data and the specific visualization requirements. Depending on your use case, you may need to choose the most suitable data source, data format, and visualization options within Grafana to achieve your desired results.

ggangadharan · ‎10-09-2023

Could you kindly provide the DDL and a sample dataset to facilitate a more in-depth explanation?

DianaTorres · ‎10-03-2023

@yucai Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.@

ggangadharan · ‎10-03-2023

It seems that the query involves dynamic partitioning, but the dynamic partition column is not included in either the select statement or the Common Table Expression (CTE). Please add the dynamic partition column 'date' to the select statement and validate it in Beeline.

ggangadharan · ‎10-03-2023

It appears that you're currently following a two-step process: writing data to a Parquet table and then using that Parquet table to write to an ORC table. You can streamline this by directly writing the data into the ORC format table, eliminating the need to write the same data to a Parquet table before reading it. Ref - https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html

Online	Offline
Last Visited	‎05-10-2026 10:55 PM

Member Since	‎09-16-2021 02:45 AM
Last Visited	‎05-10-2026 10:55 PM
Posts	423
Kudos received	55

Cloudera Community

Re: HWC on CDP 7.3.1 with Spark 3.5

Re: Using Hadoop Iceberg catalog with Hive engine ...

Re: Where can I find the Maven repository for HDP ...

Re: Hive on TEZ memory footprint and Impala stats...

Re: Hive little query slow

Re: Spark application stuck at ACCEPTED state at s...

Re: SQOOP import of "image" data type into hive

Re: SQL to Hive Query Conversion

Re: [How?] Hive INSERT OVERWRITE to Transaction Ta...

Re: Unable to append overwrite Hive ACID table usi...

Re: data monitoring dashboard from hdfs and hive

Re: Why doesn't hive bucketing work at partition l...

Re: Hive on tez cannot execute custom hook program...

Re: Ozzie Workflow for CTE and Insert Statement

Re: Insert data from spark data frame to hive orc ...