New Contributor
Posts: 7
Registered: ‎08-26-2015

Intel MKL + Spark on CDH 5.9

Hello all

 

I'm trying to use Intel MKL to speed up computation on Spark. I'm using CDH 5.9 on a 3-node cluster. I have set up Intel MKL on all nodes, added /opt/intel/mkl/lib/intel64/ to /etc/ld.so.conf, and run ldconfig, but when I use the spark-shell, I still get the standard warnings:

 

17/01/02 19:59:09 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/01/02 19:59:09 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
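
For context, these two warnings mean netlib-java could not load either of its native backends, so it falls back to its pure-Java F2J implementation. A rough way to check whether the system loader can find a BLAS library at all is sketched below in Python. The function name `can_load` and the default symbol `dgemm_` are my own choices for illustration, and the result reflects the environment of the Python process, which is not necessarily the environment the Spark JVM runs in.

```python
import ctypes


def can_load(library_name, symbol="dgemm_"):
    """Return True if the dynamic loader finds library_name and it exports symbol.

    library_name is resolved the same way dlopen() resolves it: via
    LD_LIBRARY_PATH and the ld.so cache that ldconfig maintains.
    The default symbol (Fortran-style dgemm_) is an assumption made for
    illustration; pass another symbol name to check for something else.
    """
    try:
        lib = ctypes.CDLL(library_name)
    except OSError:
        # The loader could not find or open the library.
        return False
    # ctypes raises AttributeError for a missing symbol, so hasattr works here.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    for name in ("libblas.so.3", "liblapack.so.3", "libmkl_rt.so"):
        print(name, "loadable:", can_load(name))
```

If `libblas.so.3` is reported as not loadable on a node, the problem is probably at the loader level (ld.so.conf, ldconfig, symlinks) and not in Spark itself.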

 

I've used update-alternatives to point to Intel's implementation of BLAS & LAPACK:

 

0 lrwxrwxrwx 1 root root 28 Mar 27 2015 /usr/lib/libblas.so -> /etc/alternatives/libblas.so*
0 lrwxrwxrwx 1 root root 30 Mar 27 2015 /usr/lib/libblas.so.3 -> /etc/alternatives/libblas.so.3*
0 lrwxrwxrwx 1 root root 30 Jan 2 08:11 /usr/lib/liblapack.so -> /etc/alternatives/liblapack.so*
0 lrwxrwxrwx 1 root root 32 Jul 8 2015 /usr/lib/liblapack.so.3 -> /etc/alternatives/liblapack.so.3*

 

0 lrwxrwxrwx 1 root root 43 Jan 2 08:11 /etc/alternatives/libblas.so -> /opt/intel/mkl/lib/intel64_lin/libmkl_rt.so*
0 lrwxrwxrwx 1 root root 43 Jan 2 08:11 /etc/alternatives/libblas.so.3 -> /opt/intel/mkl/lib/intel64_lin/libmkl_rt.so*
0 lrwxrwxrwx 1 root root 43 Jan 2 08:11 /etc/alternatives/liblapack.so -> /opt/intel/mkl/lib/intel64_lin/libmkl_rt.so*
0 lrwxrwxrwx 1 root root 43 Jan 2 08:11 /etc/alternatives/liblapack.so.3 -> /opt/intel/mkl/lib/intel64_lin/libmkl_rt.so
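
To double-check a listing like the one above on every node, the symlink chain can be followed programmatically. This is only a sketch: `symlink_chain` is a hypothetical helper name, and the paths in the usage section are the ones from my listing.

```python
import os


def symlink_chain(path, max_hops=16):
    """Return [path, target, target-of-target, ...] by following symlinks.

    Stops at the first entry that is not a symlink (or after max_hops,
    which guards against symlink loops).
    """
    chain = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        # Relative link targets are relative to the directory holding the link.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(current), target)
        current = os.path.normpath(target)
        chain.append(current)
    return chain


if __name__ == "__main__":
    for lib in ("/usr/lib/libblas.so.3", "/usr/lib/liblapack.so.3"):
        chain = symlink_chain(lib)
        print(" -> ".join(chain))
        print("  final target exists:", os.path.exists(chain[-1]))
```

A chain that ends in a path that does not exist (for example because MKL is installed under a different directory on one node) would explain a load failure on that node only.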

 

Does anybody have experience setting up Intel MKL for Spark on CDH? Can you share it?

 

Thank you

 

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Intel MKL + Spark on CDH 5.9

Yes, for now, you'd have to enable MKL manually. However, there's significantly more to it than the steps you mention below. See https://github.com/fommil/netlib-java for a good writeup.

 

In practice, most of the other steps would be taken care of if you install the GPLEXTRAS parcel. This enables native acceleration using OpenBLAS (among other things) and sets up library paths, etc. If you set that up, then the steps below might be all that's left to get MKL working.

 

You may find just the OpenBLAS acceleration is enough and it isn't worth manually modifying the installation, which isn't supported.

 

 

Later, we just might have a better solution for getting MKL installed too.

New Contributor
Posts: 7
Registered: ‎08-26-2015

Re: Intel MKL + Spark on CDH 5.9

Thank you - installing GPLEXTRAS alone didn't help, so I will need to spend more time debugging this.

Contributor
Posts: 69
Registered: ‎01-24-2017

Re: Intel MKL + Spark on CDH 5.9

I recently had to recompile Spark to enable MKL acceleration. For some reason, it took many hours. However, that version of Spark was installed outside of Hadoop on a general purpose Slurm cluster.

 

I wonder if it is necessary to recompile the version of Spark that comes with Cloudera to enable MKL acceleration.

Also: is it easy to install under Cloudera a new version of software, like Spark or other Hadoop components, that did not come from Cloudera?

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Intel MKL + Spark on CDH 5.9

You recompiled to build with -Pnetlib-lgpl? You don't need to do that with CDH. If you install the GPLEXTRAS parcel you'll get this extra support added to Spark jobs. This enables acceleration via OpenBLAS, but if you have MKL on your machines and library path it would automatically use it.

 

You would never want to modify the CDH installation, no. Certainly not if you want support.

You can try whatever you like on your cluster if you're the one maintaining it, but I wouldn't recommend it.

Here you don't need to anyway.


(PS: we may be able to make MKL available directly on CDH in the future.)

Contributor
Posts: 69
Registered: ‎01-24-2017

Re: Intel MKL + Spark on CDH 5.9

> but if you have MKL on your machines and library path it would automatically use it.

Where does Hadoop pick up its environment from? If I run it as root, is it enough to set LD_LIBRARY_PATH and MKLROOT, or to source the whole mklvars.sh, in /root/.bashrc?

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Intel MKL + Spark on CDH 5.9

Right now if you install GPLEXTRAS it'll take care of that. I believe it causes java.library.path to be set appropriately for these jobs.
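
If you want to experiment with passing the paths explicitly for a single session, here is a hedged sketch in Python that builds spark-shell arguments. It assumes the MKL directory from the original post, uses Spark's `--driver-library-path`, `--driver-java-options`, `spark.executor.extraLibraryPath` and `spark.executor.extraJavaOptions` settings, and the system properties that the netlib-java README describes for selecting an implementation. The helper name `mkl_spark_args` is hypothetical, and I have not verified this on CDH 5.9.

```python
import shlex


def mkl_spark_args(lib_dir):
    """Build spark-shell/spark-submit arguments that expose lib_dir to the
    driver and executors and ask netlib-java for its system-native backends.

    lib_dir is whatever directory holds the BLAS/LAPACK shared libraries;
    the value used below is taken from the original post and is an assumption.
    """
    netlib_opts = " ".join(
        "-Dcom.github.fommil.netlib.%s=com.github.fommil.netlib.NativeSystem%s"
        % (name, name)
        for name in ("BLAS", "LAPACK")
    )
    return [
        "--driver-library-path", lib_dir,
        "--driver-java-options", netlib_opts,
        "--conf", "spark.executor.extraLibraryPath=" + lib_dir,
        "--conf", "spark.executor.extraJavaOptions=" + netlib_opts,
    ]


if __name__ == "__main__":
    args = mkl_spark_args("/opt/intel/mkl/lib/intel64_lin")
    # Print a command line that can be pasted into a terminal.
    print("spark-shell " + " ".join(shlex.quote(a) for a in args))
```

Settings in /root/.bashrc only affect shells started by root, so they would not necessarily reach executors launched on other nodes; passing the paths through Spark configuration avoids depending on that.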

 

I think that MKL is a drop-in replacement for the OpenBLAS libraries, at least according to https://github.com/fommil/netlib-java, so if you insert MKL that way, the rest of the config should already be correct because it will refer to the OpenBLAS .so files.