
How to run spark sql in parallel?

Explorer

We are writing Spark programs in Java. A `DataFrame` has been registered as a temporary table, and we run multiple queries against this temporary table inside a loop. The queries run sequentially, but we need them to run in parallel against the temporary table. Please find a code snippet below.

 

Thanks in advance for your cooperation.

 

HiveContext hiveContext = new HiveContext(sparkContext);

String mainQueryHql = getFileContent(mainQueryFilePath);
DataFrame df = hiveContext.sql(mainQueryHql).persist(StorageLevel.MEMORY_AND_DISK_SER());
df.show();
System.out.println("Total Records in Main Query " + df.count());

df.registerTempTable(tempTable);
ArrayList<DataFrame> dataFrameList = new ArrayList<>();

String subQueries = getFileContent(subQueryFilePath);
String[] allQueries = subQueries.split(";");

for (int i = 0; i < allQueries.length; i++) {
    System.out.println("Query no " + i + " is : " + allQueries[i]);
    logger.debug("Query no " + i + " is : " + allQueries[i]);
    DataFrame dfSubQuery = hiveContext.sql(allQueries[i]);
    dfSubQuery.show();
    dataFrameList.add(dfSubQuery);
}

1 ACCEPTED SOLUTION

avatar
Expert Contributor

There is nothing native within Spark for running queries in parallel from the driver. Instead, take a look at Java concurrency, in particular Futures [1], which let you start queries in parallel and check their status later.

 

1.  https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
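To illustrate, here is a minimal, self-contained sketch of that Future pattern using a plain `ExecutorService`. The `runQuery` helper is a hypothetical stand-in for the `hiveContext.sql(...)` call in the original snippet; in real code you would submit the actual Spark SQL calls and collect the resulting DataFrames.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {
    // Hypothetical stand-in for hiveContext.sql(query);
    // replace with the real Spark SQL call.
    static String runQuery(String query) {
        return "result of " + query;
    }

    public static void main(String[] args) throws Exception {
        String[] queries = {"q1", "q2", "q3"};

        // One worker thread per query; each submit() starts running immediately.
        ExecutorService pool = Executors.newFixedThreadPool(queries.length);
        List<Future<String>> futures = new ArrayList<>();
        for (String q : queries) {
            futures.add(pool.submit(() -> runQuery(q)));
        }

        // get() blocks until that particular query has finished.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

Note that all tasks share the same SparkContext; the pool size only controls how many query submissions are in flight from the driver at once.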

 

 


6 REPLIES

Explorer
Thanks Hubbarja, will check and respond shortly.

New Contributor

Hi Kamalakanta,

 

I am also in need of a solution to this problem.

Have you tried Java concurrency and Futures, or have you found another solution? Please share it.

I also need to execute Spark SQL in parallel (similar to your program).

 

 

Thanks in advance

Explorer

Please use the Future interface in your code. Explore the following classes and interfaces:

java.util.concurrent.ExecutionException;
java.util.concurrent.ExecutorService;
java.util.concurrent.Executors;
java.util.concurrent.Future;

 

You can get more information from the URL below.

https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
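As a usage note on the classes listed above, here is a short runnable sketch of how `Future.get()` behaves when a submitted task fails, which matters once queries run in background threads: the task's exception is rethrown wrapped in an `ExecutionException`. The failing task and its message are purely illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureErrorHandling {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // One task that succeeds and one that fails, as parallel queries might.
        Future<Integer> ok = pool.submit(() -> 42);
        Callable<Integer> failing = () -> {
            throw new IllegalStateException("query failed");
        };
        Future<Integer> bad = pool.submit(failing);

        System.out.println("ok = " + ok.get()); // blocks until the task is done

        try {
            bad.get(); // the task's exception arrives wrapped in ExecutionException
        } catch (ExecutionException e) {
            System.out.println("failed: " + e.getCause().getMessage());
        }
        pool.shutdown();
    }
}
```

Checking each Future inside try/catch this way lets one failed query be logged without losing the results of the others.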

 

 


New Contributor

It will be of great help to me.
Thanks a lot Kamalakanta 🙂

I have some more questions about running Spark SQL queries in parallel.