Cloudera Community: Support Questions

How to run independent spark sql queries in parallel (multiple insert scripts in my case)

We are writing a Spark application in Java. The DataFrame has been registered as a temporary table, and we run multiple queries against this temporary table inside a loop. The queries run sequentially, one after another, but they are independent of each other and we need them to run in parallel against the temporary table. Please find a code snippet below.

 

Thanks in advance for your cooperation.

 

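// Spark 1.x API (HiveContext / DataFrame)
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.storage.StorageLevel;

// sparkContext, mainQueryFilePath, subQueryFilePath, tempTable, logger and
// getFileContent(...) are defined elsewhere in our application.
HiveContext hiveContext = new HiveContext(sparkContext);

// Run the main query once and cache the result (spilling to disk if needed)
String mainQueryHql = getFileContent(mainQueryFilePath);
DataFrame df = hiveContext.sql(mainQueryHql).persist(StorageLevel.MEMORY_AND_DISK_SER());
df.show();
System.out.println("Total Records in Main Query " + df.count());

// Make the cached DataFrame visible to the sub-queries
df.registerTempTable(tempTable);

List<DataFrame> dataFrameList = new ArrayList<>();

// The sub-query file holds several statements separated by ';'
String subQueries = getFileContent(subQueryFilePath);
String[] queries = subQueries.split(";");

// Each iteration waits for its query to finish, so the queries run sequentially
for (int i = 0; i < queries.length; i++) {
    String query = queries[i].trim();
    if (query.isEmpty()) {
        continue; // skip blank fragments, e.g. the text after a trailing ';'
    }
    System.out.println("Query no " + i + " is : " + query);
    logger.debug("Query no " + i + " is : " + query);
    DataFrame dfSubQuery = hiveContext.sql(query);
    dfSubQuery.show();
    dataFrameList.add(dfSubQuery);
}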

Re: How to run independent spark sql queries in parallel (multiple insert scripts in my case)

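You'll want to use the JVM's concurrency features. Please have a look at the Future interface: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html

Each iteration of your loop waits for its query to finish before the next one starts, which is why they run sequentially. If you instead submit each query from a separate thread (for example, as a task on an ExecutorService, which hands you back a Future per task), Spark can run those jobs concurrently within the same application, because the SparkContext is thread-safe.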
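A minimal sketch of that approach, assuming Java 8+ and assuming the shared HiveContext can be called from several threads once the temp table is registered. ParallelQueries, runAll and the demo runner are hypothetical names for illustration, not part of the Spark API; the runner stands in for hiveContext::sql so the example runs without a cluster, and the table names t1, t2, t3 and tmp are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs independent statements concurrently, returns results in input order.
public class ParallelQueries {

    public static <R> List<R> runAll(List<String> statements, Function<String, R> runner, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Wrap each statement in a Callable so the pool can run it on one of its threads
            List<Callable<R>> tasks = new ArrayList<>();
            for (String stmt : statements) {
                tasks.add(() -> runner.apply(stmt));
            }
            // invokeAll blocks until every task has finished, normally or with an exception
            List<Future<R>> futures = pool.invokeAll(tasks);
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                // get() rethrows a failed statement's exception wrapped in an ExecutionException
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo stand-in for hiveContext::sql: sleeps 200 ms, then echoes the statement
        Function<String, String> fakeRunner = sql -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done: " + sql;
        };
        // Hypothetical INSERT statements reading from a temp table named tmp
        List<String> statements = Arrays.asList(
                "INSERT INTO TABLE t1 SELECT * FROM tmp",
                "INSERT INTO TABLE t2 SELECT * FROM tmp",
                "INSERT INTO TABLE t3 SELECT * FROM tmp");

        long start = System.nanoTime();
        List<String> results = runAll(statements, fakeRunner, 3);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        results.forEach(System.out::println);
        // With three threads this should be roughly 200 ms rather than about 600 ms sequentially
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

In the question's code this would be called as ParallelQueries.runAll(statements, hiveContext::sql, 4), where statements is a List<String> of the trimmed, non-empty queries, giving back a List<DataFrame>. The fixed pool size caps how many jobs are submitted at once. Spark schedules concurrent jobs FIFO by default, so setting spark.scheduler.mode to FAIR may help them share executors more evenly.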