Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to run Spark SQL queries in parallel?

We are writing Spark programs in Java. The DataFrame is registered as a temporary table, and we run multiple queries against this temporary table inside a loop. The queries currently run sequentially. We need them to run in parallel against the temporary table. Please find the code snippet below.

Thanks in advance for your cooperation.

HiveContext hiveContext = new HiveContext(sparkContext);

// Run the main query once and cache the result so the sub-queries can reuse it.
String mainQueryHql = getFileContent(mainQueryFilePath);
DataFrame df = hiveContext.sql(mainQueryHql).persist(StorageLevel.MEMORY_AND_DISK_SER());
df.show();
System.out.println("Total Records in Main Query " + df.count());

// Expose the cached DataFrame to SQL as a temporary table.
df.registerTempTable(tempTable);
ArrayList<DataFrame> dataFrameList = new ArrayList<>();

// The sub-query file holds several statements separated by ';'.
String subQuires = getFileContent(subQueryFilePath);
String[] alQuires = subQuires.split(";");

// Each sub-query starts only after the previous one has finished (sequential).
for (int i = 0; i < alQuires.length; i++) {
    System.out.println("Query no " + i + " is : " + alQuires[i]);
    logger.debug("Query no " + i + " is : " + alQuires[i]);
    DataFrame dfSubQuery = hiveContext.sql(alQuires[i]);
    dfSubQuery.show();
    dataFrameList.add(dfSubQuery);
}
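
One possible direction, offered as a sketch rather than a confirmed fix for this job: Spark's scheduler accepts jobs submitted from separate threads, so the loop body could be handed to a thread pool instead of being run inline. The example below shows only the threading pattern in plain Java 8. The class `ParallelQueries`, the method `runAll`, and the `runQuery` function are hypothetical names introduced for illustration, and the stand-in in `main` does not call Spark at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: not part of Spark or of the original post.
class ParallelQueries {

    // Submits every query to a fixed-size pool and returns the results
    // in the same order as the input list.
    static <R> List<R> runAll(List<String> queries, Function<String, R> runQuery, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String query : queries) {
                Callable<R> task = () -> runQuery.apply(query);
                futures.add(pool.submit(task)); // returns immediately; the task runs on a pool thread
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that query has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real work; an assumption made so the sketch runs without Spark.
        List<String> queries = Arrays.asList("select 1", "select 2", "select 3");
        List<String> results = runAll(queries, q -> "ran: " + q.trim(), 3);
        System.out.println(results);
    }
}
```

In the snippet above, `runQuery` would presumably wrap the existing loop body, for example `q -> { DataFrame d = hiveContext.sql(q); d.show(); return d; }`, with `Arrays.asList(alQuires)` as the query list. Whether the jobs actually overlap depends on available executors: under the default FIFO scheduling a large first job can occupy the cluster, and setting `spark.scheduler.mode` to `FAIR` is the usual way to let concurrent jobs share resources.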
