
Enabling Native Acceleration for MLlib

Hello,

 

I was tempted to just add a message to this other thread (https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Intel-MKL-Spark-on-CDH-5-9/m-p/489...), because I think the issue AlexOtt was having there is the same one I'm having now, but in the end I've decided to create a new thread.

 

I'm trying to use the MLlib LDA and LSA models in Spark 1.6.0 (on a CDH 5.11.x cluster running on RHEL 7.3 nodes). Both models seem to be able to take advantage of native math acceleration through BLAS native libraries, so I've been trying to enable that in Spark.
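
For context, this is roughly what I'm doing (a minimal sketch with a tiny made-up corpus; the real input is a term-count RDD built from our documents):

import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Tiny made-up corpus: each document is (id, term-count vector).
val corpus = sc.parallelize(Seq(
  (0L, Vectors.dense(1.0, 0.0, 3.0)),
  (1L, Vectors.dense(0.0, 2.0, 1.0))
))

// Training is where the BLAS-backed linear algebra happens,
// so this is the part I expect to benefit from native acceleration.
val ldaModel = new LDA().setK(2).setMaxIterations(10).run(corpus)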

 

Following the instructions given by Cloudera in https://www.cloudera.com/documentation/enterprise/5-11-x/topics/spark_mllib.html, supposedly you just have to install the libgfortran package on the nodes and then deploy and activate the GPL Extras parcel. I did that successfully, but I can't make it work: when I try to use the native BLAS libraries, Spark can't load them:

 

scala> import com.github.fommil.netlib.BLAS;
import com.github.fommil.netlib.BLAS

scala> System.out.println(BLAS.getInstance().getClass().getName());
18/05/25 12:18:09 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/05/25 12:18:09 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
com.github.fommil.netlib.F2jBLAS
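
As far as I understand netlib-java (my own reading, not something from the Cloudera docs), it is the static initializer of NativeSystemBLAS that loads the bundled native library, so forcing that class to initialize in a fresh spark-shell session (before calling BLAS.getInstance, since a failed initialization isn't retried) should surface the real cause instead of just the warning:

// Force the class, and with it the static initializer that loads the native .so,
// so the underlying error (e.g. an UnsatisfiedLinkError) is shown directly.
Class.forName("com.github.fommil.netlib.NativeSystemBLAS")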

I'm sure spark-shell can find these jars, since the classpath at runtime looks okay:

 

scala> import java.lang.ClassLoader
import java.lang.ClassLoader

scala> val cl = ClassLoader.getSystemClassLoader
cl: ClassLoader = sun.misc.Launcher$AppClassLoader@3479404a

scala> cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
file:/etc/spark/conf.cloudera.spark_on_yarn/
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/spark-assembly-1.6.0-cdh5.11.1-hadoop2.6.0-cdh5.11.1.jar
file:/etc/spark/conf.cloudera.spark_on_yarn/yarn-conf/
file:/etc/hive/conf.cloudera.hive/
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/ST4-4.0.4.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/accumulo-core-1.6.0.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/accumulo-fate-1.6.0.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/accumulo-start-1.6.0.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/accumulo-trace-1.6.0.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/activation-1.1.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/ant-1.9.1.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/ant-launcher-1.9.1.jar
file:/usr/PRODUCTO/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/jars/antlr-2.7.7.jar

<SNIP>

file:/usr/PRODUCTO/cloudera/parcels/GPLEXTRAS-5.11.1-1.cdh5.11.1.p0.4/lib/spark-netlib/lib/netlib-native_ref-linux-x86_64-1.1-natives.jar
file:/usr/PRODUCTO/cloudera/parcels/GPLEXTRAS-5.11.1-1.cdh5.11.1.p0.4/lib/spark-netlib/lib/netlib-native_system-linux-x86_64-1.1-natives.jar
file:/usr/PRODUCTO/cloudera/parcels/GPLEXTRAS-5.11.1-1.cdh5.11.1.p0.4/lib/spark-netlib/lib/native_system-java-1.1.jar
file:/usr/PRODUCTO/cloudera/parcels/GPLEXTRAS-5.11.1-1.cdh5.11.1.p0.4/lib/spark-netlib/lib/native_ref-java-1.1.jar

...

 

So it doesn't look like Spark is failing to find the jars; still, the native implementations just won't load when they're invoked.
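
What I plan to check next is whether the JVM in the spark-shell session can actually see libgfortran at runtime. The path below is an assumption on my part (it is where the RHEL 7 libgfortran package usually puts the runtime); it may differ on other nodes:

// Where the JVM looks for native libraries in this session.
println(System.getProperty("java.library.path"))

// Assumption: on RHEL 7 the libgfortran package installs the runtime under /usr/lib64.
println(new java.io.File("/usr/lib64/libgfortran.so.3").exists)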

 

So it seems that just following the instructions given in the documentation is not enough to make native acceleration work in Spark. Can anybody give me a hand with this?

 

Thank you very much.

 
