How to run spark sql in parallel?
Labels: Apache Spark
Created on 10-14-2016 07:25 AM - edited 09-16-2022 03:44 AM
We are writing Spark programs in Java. A DataFrame has been registered as a temporary table, and we run multiple queries against this temporary table inside a loop. The queries currently run sequentially, but we need them to run in parallel against the temporary table. Please find a code snippet below.
Thanks in advance for your cooperation.
HiveContext hiveContext = new HiveContext(sparkContext);
String mainQueryHql = getFileContent(mainQueryFilePath);
// Cache the main result so each sub-query reuses it instead of recomputing it
DataFrame df = hiveContext.sql(mainQueryHql).persist(StorageLevel.MEMORY_AND_DISK_SER());
df.show();
System.out.println("Total records in main query: " + df.count());
df.registerTempTable(tempTable);
ArrayList<DataFrame> dataFrameList = new ArrayList<>();
String subQueries = getFileContent(subQueryFilePath);
String[] allQueries = subQueries.split(";");
for (int i = 0; i < allQueries.length; i++) {
    System.out.println("Query no " + i + " is: " + allQueries[i]);
    logger.debug("Query no " + i + " is: " + allQueries[i]);
    // Each sql() call here starts only after the previous one finishes
    DataFrame dfSubQuery = hiveContext.sql(allQueries[i]);
    dfSubQuery.show();
    dataFrameList.add(dfSubQuery);
}
Created 10-22-2016 07:26 PM
There is nothing native within Spark to handle running queries in parallel. Instead, you can look at Java concurrency, in particular Future [1], which lets you start queries in parallel and check their status later.
1. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
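A minimal pure-JDK sketch of the pattern this answer describes: submit work to an executor, get a `Future` back immediately, and retrieve the result later. The `runQuery` method is a hypothetical stand-in for the real `hiveContext.sql(query)` call, which cannot run outside a Spark context.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureSketch {
    // Hypothetical stand-in: in the real program this body would be
    // hiveContext.sql(query) returning a DataFrame, not a String.
    static String runQuery(String query) {
        return "result:" + query;
    }

    public static String submitAndWait(String query) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // submit() returns immediately; the query runs on the pool thread
            Future<String> future = pool.submit(() -> runQuery(query));
            // ... the caller is free to do other work here ...
            return future.get(); // blocks until the result is ready
        } finally {
            pool.shutdown();
        }
    }
}
```

Spark jobs submitted from separate threads like this are scheduled concurrently by the driver, so independent queries can overlap.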
Created 10-24-2016 03:10 AM
Created 11-22-2016 10:25 AM
Hi Kamalakanta,
I also need a solution for this problem. Have you tried Java concurrency and Futures, or did you find another solution? Please share it — I also need to execute Spark SQL queries in parallel, similar to your program.
Thanks in advance.
Created 11-23-2016 03:01 AM
Please use the Future interface in your code. Explore the following classes and interfaces:
java.util.concurrent.ExecutionException
java.util.concurrent.ExecutorService
java.util.concurrent.Executors
java.util.concurrent.Future
You can find more information at the URL below.
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
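Putting those classes together in the shape of the original question's loop might look like the sketch below: submit every query first, then collect the results, so the queries overlap instead of running one after another. `runQuery` is again a hypothetical placeholder for `hiveContext.sql(...)`, and the pool size of 4 is an arbitrary assumption to tune.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {
    // Hypothetical stand-in for hiveContext.sql(query)
    static String runQuery(String query) {
        return "result:" + query;
    }

    public static List<String> runAll(List<String> queries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Phase 1: submit everything; each call returns immediately,
            // so no query waits for the previous one to start
            List<Future<String>> futures = new ArrayList<>();
            for (final String q : queries) {
                futures.add(pool.submit(() -> runQuery(q)));
            }
            // Phase 2: collect results; get() blocks until that query is done
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In the Spark program, the futures would hold `DataFrame`s (or whatever each query produces), and any query failure surfaces as an `ExecutionException` from `get()`.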
Created 11-23-2016 03:04 AM
Created on 11-23-2016 07:38 AM - last edited on 11-28-2016 04:54 AM by cjervis
That will be of great help to me. Thanks a lot, Kamalakanta 🙂
I have some more questions about running Spark SQL queries in parallel.
