Member since: 11-10-2024
Posts: 6
Kudos Received: 6
Solutions: 0
11-22-2024
04:05 AM
1 Kudo
Hi everyone, I am Emmanuel Katto from Dubai, United Arab Emirates (UAE).

It appears you're encountering an issue when running an INSERT INTO query on a Hive table with the Tez execution engine: the query works when inserting into integer columns but fails when inserting into string columns. Additionally, the logs point to Kryo serialization, with Hive unable to deserialize the required classes. Here's a breakdown and a few suggestions for troubleshooting the issue:
Error Analysis:
Error Type:
The main error appears to be a serialization failure:
KryoException: Encountered unregistered class ID: 112
This indicates that Hive, during execution, is attempting to deserialize an object that hasn't been registered with the Kryo serializer. This might be related to how the Tez engine handles serialization during the execution of the query.
YARN Logs:
The error trace shows a failure at the map phase (Map 1), caused by ROOT_INPUT_INIT_FAILURE. This is typically due to an issue with data initialization or deserialization, which is happening during the execution of the query.
Working with INSERT INTO:
You noted that inserting into integer columns works fine, while inserting into string columns fails. This discrepancy suggests that there might be a data type or serialization mismatch between what Hive expects and what Tez can handle when dealing with strings.
LOAD DATA Command Works:
Since loading data using the LOAD DATA command works fine, it suggests that your table and data are properly set up and that the issue may lie with the execution engine or how data is being handled by Tez in this particular insert query.
Suggestions for Resolution:
Check Hive and Tez Compatibility: Ensure that your Hive and Tez versions are compatible. Sometimes, there are serialization issues caused by mismatches between different versions of Hive, Tez, and Hadoop.
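As a quick sanity check (a hedged sketch; the Tez path below is an assumption based on an HDP-style layout and varies by distribution), you can print the versions from the command line:
hive --version
hadoop version
ls /usr/hdp/current/tez-client/lib    # lists the Tez jars, whose file names include the version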
Serialization Configuration: The error message points to the Kryo serializer used during Tez execution. A useful isolation step is to add the following to your hive-site.xml or set the equivalents at the session level:
<property>
  <name>hive.tez.container.size</name>
  <value>1024</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
The first property sets the Tez container size (in MB); the second switches the execution engine to MapReduce instead of Tez, which helps confirm whether Tez is indeed the cause of the problem.
Insert Values Syntax: Double-check the syntax and the data being inserted. While you mentioned that inserting integers works, ensure that your string values are properly quoted and the column types in the table match the types you're trying to insert.
Example of a valid insert:
INSERT INTO TABLE test VALUES ('a', 'b');
Recheck Table Definitions: Ensure the table's column types are defined correctly as strings. You can check the table structure with:
DESCRIBE test;
If the columns are not defined as STRING, this could lead to issues with serialization.
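For reference, a minimal definition with two string columns (a hypothetical table matching the inserts above) would look like:
CREATE TABLE test (col1 STRING, col2 STRING);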
Tez Configuration and Debugging: Sometimes, Tez-specific issues can be resolved by tweaking the configuration in tez-site.xml. You can increase the logging level for Tez and Hive to capture more detailed logs about what's failing during the execution. This might give you more insights into what's going wrong.
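As a sketch (these are standard Apache Tez property names, but verify them against your version), raising the Tez log level in tez-site.xml would look like:
<property>
  <name>tez.am.log.level</name>
  <value>DEBUG</value>
</property>
<property>
  <name>tez.task.log.level</name>
  <value>DEBUG</value>
</property>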
Cluster Resource Availability: The error might also relate to resource allocation on YARN or the configuration of the Tez AM (Application Master). Check if your cluster has enough resources allocated, and ensure that Tez is properly configured to handle the workload.
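For example (the values here are purely illustrative and depend on your cluster's capacity), the Tez AM and task memory can be raised in tez-site.xml:
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>2048</value>
</property>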
Testing the Query:
Simple Insert with Strings (Test): To isolate the issue, try inserting data with simpler values (single-letter strings) into the table:
INSERT INTO test VALUES ('x', 'y');
This can help rule out potential issues with the data values you're using in the original query.
Switch Execution Engine Temporarily: If you're unable to resolve the issue, consider temporarily switching the execution engine to mr (MapReduce) instead of tez to see if the issue persists:
SET hive.execution.engine=mr;
INSERT INTO test VALUES ('a', 'b');
Hive and Tez Logs: The Tez logs provide important details on why the query fails. Examine the Tez application logs in the YARN ResourceManager UI to identify if there are any specific issues during the map or reduce phase that could explain the failure.
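To pull the full aggregated logs for a failed run (the application ID below is a placeholder; use the one shown in the ResourceManager UI), run:
yarn logs -applicationId application_1234567890123_0001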
By following these steps and troubleshooting the configuration, you should be able to pinpoint the issue and fix it.
Regards
Emmanuel Katto
Labels: Apache Tez
11-17-2024
11:14 PM
1 Kudo
Hi everyone, I am Emmanuel Katto from Dubai, United Arab Emirates (UAE).

We encountered an issue on our production Kudu cluster where a tablet server failed due to a disk failure and its WAL directory was lost. After installing a new disk and clearing the data directory following the Kudu documentation (Rebuilding Kudu), we restarted the failing tablet server. However, after the restart, the kudu ksck command showed two tablet servers with different UUIDs for the same host, and one of them had a "WRONG SERVER_UUID" status.
Questions:
What could be the cause of this error?
How can we avoid this issue in the future?
Is there a way to resolve this problem without restarting the master server?
We also found the kudu tserver unregister command, which appears to be intended for removing tablet servers with stale UUIDs, but we didn't find it mentioned in the official documentation.
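For reference, the commands involved look roughly like this (a sketch; <master-addresses> and <stale-tserver-uuid> are placeholders, and the exact unregister syntax should be checked against your Kudu version's built-in help):
kudu cluster ksck <master-addresses>
kudu tserver list <master-addresses>
kudu tserver unregister <master-addresses> <stale-tserver-uuid>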
Regards
Emmanuel Katto
Labels: Apache Flink
11-13-2024
09:56 PM
1 Kudo
Hey everyone,
I’m encountering a "401 Unauthorized" error while configuring the SiteToSite HTTPS Provenance Reporting Task in NiFi. I’ve double-checked the credentials and the configuration, but it still seems to be giving me this error.
Has anyone else run into this issue or have suggestions on what might be causing it? Any guidance or troubleshooting tips would be much appreciated!
Looking forward to your insights!
Regards
Emmanuel Katto
Labels: Apache Ambari
11-13-2024
03:32 AM
1 Kudo
Hi team,
I'm trying to set up AES decryption in Apache NiFi using the DecryptContent processor for an encryption process based on AES-128 CTR mode. I've successfully implemented AES decryption locally with Node.js, but I’m running into some trouble replicating it in NiFi.
Here are the details of the encryption setup:
Encrypted Text: c6 c7 4b 49 0d cf 5c 20 87 0a e0 cd c4 a7 bf 94 d8
Key: 3E 9B 26 FE 46 4F 6D 2D 2F 69 5D 87 8A 07 93 74
IV: 2d 2c 83 42 00 74 1b 16 20 c0 7d 13 20 00 00 00
Correct Result: 14 25 79 ed a8 ff a7 00 00 e5 03 00 00 be 03 00 00
I've confirmed that my key and IV are correct. I’m using AES-128, CTR mode, and NoPadding for the encryption. The issue arises when I try to decrypt using NiFi’s DecryptContent processor. Here's what I've tried so far:
Cipher Algorithm Mode: Set to CTR
Cipher Algorithm Padding: Set to NoPadding
Key Specification Format: Set to RAW
For the incoming FlowFile content, I've set it as:
c6c74b490dcf5c20870ae0cdc4a7bf94d84E69466949562d2c834200741b1620c07d1320000000
(I also experimented with adding 4E6946694956 as the NiFi IV delimiter.)
Despite these settings, I get the following error: "Wrong IV length: must be 16 bytes long"
It seems like NiFi is interpreting the data as a regular string rather than HEX, which may be the source of the issue.
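For comparison, here is a minimal Node.js sketch of the decryption (a reconstruction equivalent to my working local setup, using the vectors above). The key point is that Buffer.from(..., 'hex') converts the hex text into raw bytes before the cipher sees them, which is the step NiFi appears to be skipping:
// Minimal AES-128-CTR decryption in Node.js, using the vectors above.
const crypto = require('crypto');

// Key, IV, and ciphertext decoded from hex strings into raw bytes.
const key = Buffer.from('3E9B26FE464F6D2D2F695D878A079374', 'hex');
const iv = Buffer.from('2d2c834200741b1620c07d1320000000', 'hex');
const ct = Buffer.from('c6c74b490dcf5c20870ae0cdc4a7bf94d8', 'hex');

// CTR is a stream mode, so NoPadding applies and any content length works.
const decipher = crypto.createDecipheriv('aes-128-ctr', key, iv);
const plain = Buffer.concat([decipher.update(ct), decipher.final()]);
console.log(plain.toString('hex')); // expected: 142579eda8ffa70000e5030000be030000
Whatever NiFi receives as FlowFile content needs to be these same raw bytes, not their hex-string representation.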
I would appreciate any suggestions or insights from the team:
Is there a specific way to input HEX data into NiFi to ensure the IV and content are correctly processed?
Should I be formatting the data differently, or is there a setting in the DecryptContent processor I might have missed?
Are there any additional configuration steps or pitfalls I should be aware of when dealing with AES decryption in CTR mode within NiFi?
Thanks in advance for your help!
Best regards, Emmanuel Katto
Labels: Apache NiFi
11-11-2024
08:24 PM
1 Kudo
Hi everyone, I'm Emmanuel Katto from Dubai, United Arab Emirates (UAE). I'm working on decrypting data using AES-128 in CTR mode in Apache NiFi, and I could really use some help or suggestions on how to configure it correctly. Here's what I've done so far:
Local Setup (Node.js)
Text to Decrypt: c6 c7 4b 49 0d cf 5c 20 87 0a e0 cd c4 a7 bf 94 d8
Key: 3E 9B 26 FE 46 4F 6D 2D 2F 69 5D 87 8A 07 93 74
Initialization Vector (IV): 2d 2c 83 42 00 74 1b 16 20 c0 7d 13 20 00 00 00
Correct Decryption Result: 14 25 79 ed a8 ff a7 00 00 e5 03 00 00 be 03 00 00
In my Node.js setup, this works perfectly, and I can decrypt the content using AES-128 CTR with no padding.
NiFi Setup (DecryptContent Processor)
I am trying to achieve the same decryption in Apache NiFi using the DecryptContent processor. I’ve configured it as follows:
Cipher Algorithm Mode: CTR
Cipher Algorithm Padding: NoPadding
Key Specification Format: RAW
For the incoming FlowFile content, I’ve set it to:
c6c74b490dcf5c20870ae0cdc4a7bf94d84E69466949562d2c834200741b1620c07d1320000000
However, I get an error: "Wrong IV length: must be 16 bytes long". This error suggests that NiFi is interpreting the content as a normal string and not as HEX values.
My Questions:
How do I correctly provide the IV and encrypted content in HEX format to the DecryptContent processor?
Is there any configuration I’ve missed to specify the content as HEX?
Is the IV delimiter (4E6946694956) necessary in this case, or should I be providing the IV as part of the content differently?
Would appreciate any guidance or suggestions from anyone who has worked with AES decryption in NiFi using CTR mode. Thanks in advance!
Regards
Emmanuel Katto
Labels: Apache NiFi
11-10-2024
11:21 PM
1 Kudo
Hello everyone,
I am Emmanuel Katto, currently working on evaluating the disk I/O of our CDH (Cloudera Distribution for Hadoop) cluster, which consists of several hundred bare-metal machines. I would like to obtain the following values for each application within a given period of time:
total_io_mb
mapreduce_inputBytes
mapreduce_outputBytes
I believe these values are logged in the YARN logs, but I'm not sure how to configure YARN or the logging system to ensure they are written to the log files.
So far, through Cloudera Manager, we’ve only been able to get metrics like the yarn_application_hdfs_bytes_read_rate, but that’s not enough for evaluating overall disk I/O.
Could anyone share any advice or alternatives on how to extract these specific I/O values for each application? Also, if there’s a way to configure YARN or Cloudera Manager to write these metrics into the logs, I’d appreciate your insights.
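For context, one route we're considering (a hedged sketch; the host, port, credentials, API version, and cluster/service names are placeholders) is the Cloudera Manager REST API's yarnApplications endpoint, which returns per-application attributes over a time window:
curl -u admin:admin \
  'http://cm-host:7180/api/v19/clusters/Cluster%201/services/yarn/yarnApplications?from=2024-11-01T00:00:00&to=2024-11-10T00:00:00&limit=1000'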
Thanks in advance!
Regards
Emmanuel Katto
Labels: Apache YARN