Hi everyone, I am Emmanuel Katto from Dubai, United Arab Emirates (UAE).
It appears you're encountering an issue when running an INSERT INTO query on a Hive table with the Tez execution engine. You mentioned that the query works when inserting into integer columns but fails when inserting into string columns. The logs also point to Kryo serialization: Hive is unable to deserialize the classes it needs. Here's a breakdown and a few suggestions for troubleshooting:
Error Analysis:
- Error Type: The main error appears to be a serialization issue. It indicates that Hive, during execution, is attempting to deserialize an object whose class has not been registered with the Kryo serializer. This may be related to how the Tez engine handles serialization while executing the query.
- YARN Logs: The error trace shows a failure in the map phase (Map 1) caused by ROOT_INPUT_INIT_FAILURE, meaning the input for that vertex could not be initialized. This typically points to a problem with data initialization or deserialization while the query is being set up.
- Working with INSERT INTO: You noted that inserting into integer columns works, while inserting into string columns fails. This discrepancy suggests a data type or serialization mismatch between what Hive expects and what Tez can handle for strings.
- LOAD DATA Command Works: Because LOAD DATA succeeds, your table and data are likely set up correctly, and the issue probably lies with the execution engine or with how Tez handles data in this particular insert query.
Suggestions for Resolution:
- Check Hive and Tez Compatibility: Ensure that your Hive and Tez versions are compatible. Version mismatches between Hive, Tez, and Hadoop can cause serialization errors.
- Serialization Configuration: The error message points to the Kryo serializer. To check whether the Tez code path is responsible, set hive.execution.engine to mr in hive-site.xml or at the session level (SET hive.execution.engine=mr;). This switches the execution engine to MapReduce instead of Tez, which might help identify whether Tez is indeed the cause of the problem.
- Insert Values Syntax: Double-check the syntax and the data being inserted. Although inserting integers works, make sure your string values are enclosed in quotes and that the column types in the table match the types of the values you insert.
Example of a valid insert: INSERT INTO test VALUES ('a', 'b');
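To make the quoting and type-matching point concrete, here is a minimal sketch on a scratch table. The table name insert_check and its columns are assumptions for illustration only and are not taken from your setup.

```sql
-- Hypothetical scratch table: one INT column and one STRING column.
CREATE TABLE IF NOT EXISTS insert_check (id INT, name STRING);

-- INT values are written unquoted, STRING values in quotes.
INSERT INTO insert_check VALUES (1, 'alice');

-- Hive also accepts double-quoted string literals.
INSERT INTO insert_check VALUES (2, "bob");
```

If these statements succeed under Tez while the insert into your own table fails, the table definition or its storage format is a more likely suspect than the string values themselves.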
- Recheck Table Definitions: Ensure the table's columns are actually defined as strings. You can check the table structure with DESCRIBE test;. If the columns are not defined as STRING, the mismatch with the values being inserted could lead to issues with serialization.
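The table check described above could look like the following sketch. It assumes the table is named test, as in the insert example later in this post.

```sql
-- Column names and types, plus storage details such as the SerDe and file format.
DESCRIBE FORMATTED test;

-- The full DDL Hive would use to recreate the table.
SHOW CREATE TABLE test;
```

Compare the column types and the SerDe in the output against what you expect for a table with STRING columns.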
- Tez Configuration and Debugging: Tez-specific issues can sometimes be resolved by adjusting the configuration in tez-site.xml. You can also raise the logging level for Tez and Hive to capture more detail about what fails during execution, which may give more insight into the root cause.
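As a rough sketch of raising the log level from within a Hive session, the settings below may help. Whether they take effect depends on your Hive and Tez versions, and the Application Master setting may only apply to a newly started Tez session, so treat this as an assumption to verify on your cluster.

```sql
-- Session-level settings; run them before the failing query.
SET hive.tez.log.level=DEBUG;   -- log level for Tez tasks launched by Hive
SET tez.am.log.level=DEBUG;     -- log level for the Tez Application Master

-- Re-run the failing statement so the more detailed logs are produced.
INSERT INTO test VALUES ('a', 'b');
```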
- Cluster Resource Availability: The error might also relate to resource allocation on YARN or to the configuration of the Tez AM (Application Master). Check that your cluster has enough resources available, and that Tez is configured to handle the workload.
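If resources turn out to be the bottleneck, the container and Application Master memory can be adjusted per session. The values below are purely illustrative assumptions, not recommendations; they need to fit within the limits your YARN scheduler allows (for example yarn.scheduler.maximum-allocation-mb).

```sql
-- Illustrative values only; size them for your own cluster.
SET hive.tez.container.size=4096;     -- memory in MB for each Tez task container
SET tez.am.resource.memory.mb=2048;   -- memory in MB for the Tez Application Master
```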
Testing the Query:
- Simple Insert with Strings (Test): To isolate the issue, try a statement such as INSERT INTO test VALUES ('a', 'b'); that inserts the simplest possible values (single-letter strings) into the table. This can help rule out problems with the data values used in the original query.
- Switch Execution Engine Temporarily: If you're unable to resolve the issue, consider temporarily switching the execution engine to mr (MapReduce) instead of tez to see if the issue persists:
SET hive.execution.engine=mr;
INSERT INTO test VALUES ('a', 'b');
- Hive and Tez Logs: The Tez logs provide important details on why the query fails. Examine the Tez application logs in the YARN ResourceManager UI to see whether any specific error during the map or reduce phase explains the failure.
By following these steps and troubleshooting the configuration, you should be able to pinpoint the issue and fix it.
Regards
Emmanuel Katto