Error with socket timeout in CDP hive 3.1.3 when loading large dataset
Created ‎02-16-2023 04:26 AM
Below are the version details
CDP 7.1.8
CM : 7.8.1
HIVE : 3.1.3
We are trying to insert data into a partitioned table from an ORC file that contains approximately 50,000 rows.
We use the command below to load the data:
LOAD DATA INPATH '/user/test/r_14_5.orc' INTO TABLE va_offer_16;
We see that all the map and reducer tasks complete within 500 s.
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      7          7        0        0       0       0
Reducer 2 ...... container     SUCCEEDED     33         33        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 2601.60 s
----------------------------------------------------------------------------------------------
However, there is no further progress for the next 2000 s, after which the load fails with the following error:
ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
INFO : Completed executing command(queryId=hive_20230216163408_de7cb993-2086-4011-8788-50f46ed6e7f3); Time taken: 2605.485 seconds
INFO : OK
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out (state=08S01,code=40000)
Created ‎02-28-2023 07:38 AM
@dmharshit It is hard to tell what is causing this from the error message alone. If you could share the query's EXPLAIN plan as well as the HS2 logs, that would give us a better idea.
Otherwise, try the following configuration change in the Hive metastore; in my experience it should help:
CDP > Hive > Configuration > Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml
1#
Property: hive.metastore.event.listeners
Value: <leave it blank>
Note: do not add anything in the Value field
2#
Property: hive.metastore.transactional.event.listeners
Value: <leave it blank>
Note: do not add anything in the Value field
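If you prefer the XML view of the safety valve, the two overrides above could look roughly like the sketch below. This assumes the property names as they exist in Hive 3 (hive.metastore.event.listeners and hive.metastore.transactional.event.listeners) and that an empty value element is how your Cloudera Manager version stores a blank value, so check the result in the UI after saving.

```xml
<!-- Sketch for the Hive Metastore Server safety valve for hive-site.xml. -->
<!-- Both values are intentionally left empty, as described in the steps above. -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value></value>
</property>
<property>
  <name>hive.metastore.transactional.event.listeners</name>
  <value></value>
</property>
```

The Hive Metastore role normally has to be restarted before a safety valve change takes effect.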
Created ‎04-20-2023 05:31 AM
From the error, we can see that the query failed in MoveTask. Because the LOAD statement targets a partitioned table, MoveTask may also be loading the partitions. Along with the HS2 logs, the HMS logs for the corresponding time period would give a better idea of the root cause of the failure.
If it is just a timeout issue, increase the client socket timeout value.
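As a rough sketch of that last suggestion: the setting usually meant here is hive.metastore.client.socket.timeout, which controls how long a metastore client such as HiveServer2 waits on a Thrift call to HMS. The value below is only an illustration, not a recommendation. It assumes the override is placed in the HiveServer2 hive-site.xml safety valve and that the timeout is given in seconds.

```xml
<!-- Illustrative value only: pick something longer than the time MoveTask needs. -->
<!-- Assumed location: HiveServer2 safety valve for hive-site.xml (HS2 is the metastore client here). -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>3600s</value>
</property>
```

Raising the timeout only hides the symptom if the metastore call itself is slow, so the HMS logs are still worth checking.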
