
Error with socket timeout in CDP hive 3.1.3 when loading large dataset

Contributor

Below are the version details
CDP 7.1.8
CM : 7.8.1
HIVE : 3.1.3


We are trying to insert data into a partitioned table from an ORC file that contains approximately 50,000 rows.
We use the command below to load the data:
LOAD DATA INPATH '/user/test/r_14_5.orc' INTO TABLE va_offer_16;

We see that all the map and reducer tasks complete within 500 s.


----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      7          7        0        0       0       0
Reducer 2 ...... container     SUCCEEDED     33         33        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 2601.60 s
----------------------------------------------------------------------------------------------

However, there is no further progress for the next 2000 s, and the load then fails with the error below.

Error:

ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
INFO : Completed executing command(queryId=hive_20230216163408_de7cb993-2086-4011-8788-50f46ed6e7f3); Time taken: 2605.485 seconds
INFO : OK
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out (state=08S01,code=40000)

Master Collaborator

@dmharshit It is hard to tell what is causing this from the error message alone. If you could share the query's EXPLAIN plan as well as the HS2 logs, that would give us a better idea.

Otherwise, try the following configuration change in the Hive metastore; in my experience this should help:

CDP > Hive > Configuration > Hive Metastore Server Advanced Configuration Snippet (Safety Valve) for hive-site.xml

1#
Property: hive.metastore.event.listeners
Value: <leave it blank>
Note: do not add anything in the Value field

2#
Property: hive.metastore.transactional.event.listeners
Value: <leave it blank>
Note: do not add anything in the Value field
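
If you prefer to paste the change into the safety valve as XML rather than filling in the name/value fields, the two entries above could look roughly like the sketch below. It assumes the standard hive-site.xml property layout and the plural property name hive.metastore.event.listeners; the empty value elements are how "leave it blank" is expressed here.

```xml
<!-- Sketch for the Hive Metastore Server safety valve (hive-site.xml). -->
<!-- Both values are intentionally empty, as described in the steps above. -->
<property>
  <name>hive.metastore.event.listeners</name>
  <value></value>
</property>
<property>
  <name>hive.metastore.transactional.event.listeners</name>
  <value></value>
</property>
```

Blanking these properties removes whatever listener classes were configured before, so it is worth noting the previous values first and restarting the metastore after the change.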

Expert Contributor

From the error, we can see that the query failed in MoveTask. Because the LOAD statement targets a partitioned table, MoveTask may also be loading the partitions. Along with the HS2 logs, the HMS logs for the corresponding time period would give a better idea of the root cause of the failure.

If it is just a timeout issue, increase the client socket timeout value.
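
As a rough sketch of that last suggestion: the reply does not name the property, so this assumes it means hive.metastore.client.socket.timeout, which controls how long a metastore client such as HiveServer2 waits on a metastore call. The value of 1800s is purely illustrative, not a recommendation taken from this thread, and the assumption is that the entry goes into the hive-site.xml used by HiveServer2.

```xml
<!-- Sketch only: property choice and value are assumptions, see the note above. -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- Illustrative value; pick one longer than the stall observed before the failure. -->
  <value>1800s</value>
</property>
```

A longer timeout only hides a slow metastore call, so the HS2 and HMS logs are still needed to find out why MoveTask takes so long.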