Sparklyr job hang

New Member

We are running a sparklyr job that runs queries on a Cloudera CDP Hive cluster. The job sometimes stalls before a dbWriteTable call, doing nothing and running indefinitely. It does not always stall at the same point in the job, but it always happens during an invocation of the trywrite function below, and no error is caught:

trywrite = function(sc, new_name, df, log_obj, wait_sec = 600, max_wait = 3600)
{
    start_time = Sys.time()
    # Retry until max_wait seconds have elapsed since the first attempt
    while (as.numeric(difftime(Sys.time(), start_time, units = 'secs')) <= max_wait) {
        print(paste0('Attempt to write table: ', new_name, ' - ', Sys.time()))
        # Connection is valid?
        if (!DBI::dbIsValid(sc)) {
            error(log_obj, paste0('Connection not valid during write table: ', new_name))
            stop(paste0('Failed to write table: ', new_name))
        }
        tryCatch({
            print(paste0('Writing table: ', new_name))
            result = DBI::dbWriteTable(sc, new_name, df)
            print(paste0('Write completed table: ', new_name, ' - ', Sys.time()))
            return(result)
        }, error = function(e) {
            # Only R errors reach this handler; a call that blocks never does
            error(log_obj, paste0('Error during write table: ', new_name, ' - ', Sys.time()))
            print(paste0('Error message: ', e$message))
            print(paste0('Retrying in ', wait_sec, ' seconds: ', Sys.time()))
            Sys.sleep(wait_sec)
        })
    }
    stop(paste0('Failed to write table before max time: ', new_name))
}

 

2 REPLIES

Community Manager

@intersoldi Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our Spark experts @vafs, @Bharati, and @jagadeesan, who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Senior Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Master Collaborator

Hello @intersoldi

Thanks for being part of our community. 

This could be caused by blocked threads.
Do you see anything in the YARN application log?

In the Spark event log, does the job always hang on the same task?

Another thing you can try is to capture several jstack thread dumps and review them to see whether the job is stuck at a specific point or step, for example on a connection or a thread.
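
For reference, the jstack suggestion could be scripted from R roughly as follows. This is only a sketch under assumptions: collect_thread_dumps is a hypothetical helper (not part of sparklyr or DBI), the JDK's jstack binary is assumed to be on the PATH, and the PID of the JVM to inspect has to be looked up separately (for example with jps) on the host where that JVM runs.

```r
# Hypothetical helper: capture several JVM thread dumps, a few seconds apart.
# `pid`    : process ID of the JVM to inspect (assumed to run on this host)
# `jstack` : path to the JDK's jstack binary (assumed to be installed)
collect_thread_dumps <- function(pid, n = 3, interval_sec = 10,
                                 out_dir = tempdir(), jstack = "jstack") {
  pid <- as.integer(pid)
  files <- character(n)
  for (i in seq_len(n)) {
    # Run `jstack <pid>` and capture stdout and stderr as text lines
    dump <- system2(jstack, args = as.character(pid),
                    stdout = TRUE, stderr = TRUE)
    files[i] <- file.path(out_dir, sprintf("jstack_%d_%02d.txt", pid, i))
    writeLines(dump, files[i])
    # Pause between dumps so they can be compared over time
    if (i < n) Sys.sleep(interval_sec)
  }
  files
}
```

Calling collect_thread_dumps(<pid>) while the job is stalled would write three dumps to a temporary directory. A thread that shows the same stack in every dump is a likely candidate for the blocking point. Note that jstack generally has to be run as the same OS user that owns the target JVM.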


Regards,
Andrés Fallas