Member since 09-16-2021
      
- Posts: 421
- Kudos Received: 55
- Solutions: 39
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 226 | 10-22-2025 05:48 AM |
| | 309 | 09-05-2025 07:19 AM |
| | 791 | 07-15-2025 02:22 AM |
| | 1344 | 06-02-2025 06:55 AM |
| | 1617 | 05-22-2025 03:00 AM |
			
    
	
		
		
Posted on 10-10-2023 10:15 AM
Please share the complete error stack trace. With respect to "The table doesn't have partitions": make sure the partition metadata in the Hive metastore is in sync with the data on HDFS.
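A minimal sketch of the sync check, assuming a hypothetical table name (my_partitioned_table):

-- List the partitions currently registered in the metastore
SHOW PARTITIONS my_partitioned_table;

-- Register partition directories that exist on HDFS but are missing from the metastore
MSCK REPAIR TABLE my_partitioned_table;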
						
					
			
    
	
		
		
Posted on 10-10-2023 10:06 AM
Grafana is a popular open-source platform for monitoring and observability, and it is commonly associated with telemetry data visualization, especially when integrated with time-series databases like Prometheus, InfluxDB, or Elasticsearch. However, Grafana is not limited to telemetry data, and it can be used with a wide range of data sources, including HDFS and Hive tables. Here are some options for using Grafana for data visualization beyond telemetry:

- Hive Data Sources: Grafana has built-in support for various data sources and offers plugins for connecting to databases and data lakes. You can configure Grafana to connect to Hive as a data source and visualize data stored in Hive tables (a sample panel query is sketched after this list).
- HDFS Data Sources: While Grafana primarily focuses on time-series data, you can still use it to visualize data stored in HDFS by connecting it to Hadoop-related data sources or by exporting HDFS data to another data store (e.g., Elasticsearch, InfluxDB) that Grafana supports.
- SQL Databases: Grafana can connect to traditional relational databases using SQL data sources. If you have data stored in SQL databases, you can use Grafana to create dashboards and visualizations.
- Log Data: Grafana can be used for log data analysis and visualization. You can integrate it with tools like Loki (for log aggregation) and explore log data in dashboards.
- Custom Plugins: If you have a unique data source or a specific format, you can develop custom data source plugins for Grafana to connect to your data and visualize it as needed.
- API Data: Grafana supports data sources that expose data through APIs. You can connect to REST APIs, GraphQL APIs, and other web services to visualize data.
- Mixed Data Sources: Grafana allows you to create dashboards that combine data from multiple sources, making it versatile for various visualization needs.

While Grafana is flexible and can be used with a wide range of data sources, it's important to consider the nature of your data and the specific visualization requirements. Depending on your use case, you may need to choose the most suitable data source, data format, and visualization options within Grafana to achieve your desired results.
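For illustration, this is the kind of aggregate query a Grafana panel backed by a Hive data source might run; the table and column names are hypothetical:

-- Daily event counts that a time-series panel could chart
SELECT
  event_date,
  COUNT(*) AS events
FROM app_events
GROUP BY event_date
ORDER BY event_date;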
						
					
			
    
	
		
		
Posted on 10-10-2023 09:58 AM
When users query a Hive table partitioned on a specific column (in your case, "source system name") but do not include a filter condition on that partition column, Hive may need to scan all partitions of the table to retrieve the relevant data. This can lead to less efficient query performance, as it requires reading unnecessary data from multiple partitions.

In your scenario, where you perform frequent insert overwrites to keep only the current data, the table may not grow drastically in terms of total data volume. However, if users frequently query the table without specifying the partition column condition, it can still result in increased query processing time and resource utilization. To improve query efficiency in this situation, you have a few options:

- Partition Pruning: Encourage users to include the partition column condition in their queries. Hive has built-in partition pruning optimization, which allows it to skip unnecessary partitions when the partition column condition is provided (see the sketch at the end of this reply).
- Materialized Views: If certain common query patterns exist, consider creating materialized views that pre-aggregate or pre-filter data based on those patterns. This can significantly speed up queries that align with the materialized views.
- Optimize Data Layout: Ensure that the data is stored efficiently, and consider using columnar storage formats like ORC or Parquet, which can improve query performance.

Ultimately, the choice of optimization strategy depends on the specific usage patterns and requirements of your users. It's essential to monitor query performance and understand your users' query behavior to determine which optimization approaches are most effective.
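As a hedged sketch of the difference, assuming a hypothetical table partitioned on source_system_name:

-- No filter on the partition column: Hive may scan every partition
SELECT * FROM sales_snapshot
WHERE amount > 100;

-- Filter on the partition column: Hive prunes to a single partition
SELECT * FROM sales_snapshot
WHERE source_system_name = 'crm'
  AND amount > 100;

-- EXPLAIN DEPENDENCY lists the partitions a query would actually read
EXPLAIN DEPENDENCY
SELECT * FROM sales_snapshot
WHERE source_system_name = 'crm';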
						
					
			
    
	
		
		
Posted on 10-09-2023 06:27 AM
							 Could you kindly provide the DDL and a sample dataset to facilitate a more in-depth explanation? 
						
					
			
    
	
		
		
Posted on 10-03-2023 02:07 PM
@yucai Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
						
					
			
    
	
		
		
Posted on 10-03-2023 02:15 AM
It seems that the query involves dynamic partitioning, but the dynamic partition column is not included in either the SELECT statement or the Common Table Expression (CTE). Please add the dynamic partition column 'date' to the SELECT statement and validate the query in Beeline.
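A minimal sketch of the expected shape, using hypothetical table and column names; 'date' is the dynamic partition column from the question:

-- Enable dynamic partitioning for the session
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE target_table PARTITION (`date`)
SELECT
  col1,
  col2,
  `date`  -- the dynamic partition column goes last in the SELECT list
FROM source_table;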
						
					
			
    
	
		
		
Posted on 10-03-2023 02:07 AM
It appears that you're currently following a two-step process: writing data to a Parquet table and then using that Parquet table to write to an ORC table. You can streamline this by writing the data directly into the ORC table, eliminating the need to write the same data to a Parquet table before reading it back.

References:
- https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html
- https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/developing-spark-applications/topics/spark-loading-orc-data-predicate-push-down.html
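A minimal sketch of the direct write, runnable from spark-sql or Beeline; the table and column names are hypothetical:

-- Target table stored directly as ORC
CREATE TABLE IF NOT EXISTS target_orc (
  id BIGINT,
  payload STRING
) STORED AS ORC;

-- Load straight into the ORC table, skipping the intermediate Parquet step
INSERT OVERWRITE TABLE target_orc
SELECT id, payload
FROM source_data;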
						
					
			
    
	
		
		
Posted on 10-03-2023 01:58 AM
You can use the following HQL query to find the sum of new cases for each continent and identify the country with the highest number of cases in each continent, along with the corresponding case count:

WITH ContinentSum AS (
  SELECT
    continent,
    SUM(new_cases) AS total_new_cases
  FROM
    sample_table_test
  GROUP BY
    continent
),
CountryMaxCases AS (
  SELECT
    continent,
    location AS country,
    MAX(total_case) AS max_cases
  FROM
    sample_table_test
  GROUP BY
    continent, location
),
RankedCountries AS (
  SELECT
    continent,
    country,
    max_cases,
    ROW_NUMBER() OVER (PARTITION BY continent ORDER BY max_cases DESC) AS rn
  FROM
    CountryMaxCases
)
SELECT
  cs.continent,
  cs.total_new_cases,
  cm.country,
  cm.max_cases
FROM
  ContinentSum cs
JOIN
  RankedCountries cm
ON
  cs.continent = cm.continent
WHERE
  cm.rn = 1;

This query first calculates the sum of new cases for each continent in the ContinentSum CTE (Common Table Expression). The CountryMaxCases CTE then computes each country's total case count, and RankedCountries orders the countries within each continent by that count. Finally, the query joins the top-ranked country per continent (rn = 1) to the continent totals to produce the desired output.

Sample result set for the shared data:

+---------------+---------------------+--------------+---------------+
| cs.continent  | cs.total_new_cases  |  cm.country  | cm.max_cases  |
+---------------+---------------------+--------------+---------------+
| Asia          | 25.0                | Afghanistan  | 25.0          |
+---------------+---------------------+--------------+---------------+
						
					
			
    
	
		
		
Posted on 09-30-2023 11:39 AM
To pinpoint the root cause, kindly provide a few samples of the data.
						
					
			
    
	
		
		
Posted on 09-29-2023 05:27 AM
To gain a better understanding of the issue, kindly provide jstack output from the HiveServer2 (HS2) process at 30-second intervals until the query completes.
						
					
					... View more