Member since: 10-02-2015
Posts: 76
Kudos Received: 80
Solutions: 8

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1976 | 11-15-2016 03:28 PM |
| | 3396 | 11-15-2016 03:15 PM |
| | 2113 | 07-25-2016 08:03 PM |
| | 1741 | 05-11-2016 04:10 PM |
| | 3611 | 02-02-2016 08:09 PM |
12-16-2015
12:51 AM
3 Kudos
@Nikolaos Stanogias This looks like a bug in 2.3.2. Make sure Atlas is started and out of maintenance mode; that should resolve it.
12-11-2015
04:22 AM
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import sqlContext.implicits._
val df = sqlContext.sql("select mnemonic, average, median, stddev from wellbook.curve_statistics")
val indexer = new StringIndexer()
.setInputCol("mnemonic")
.setOutputCol("mnemonicIndex")
.fit(df)
val indexed = indexer.transform(df)
val encoder = new OneHotEncoder()
  .setInputCol("mnemonicIndex")
  .setOutputCol("mnemonicVec")
val encoded = encoder.transform(indexed)
val data = encoded.select("mnemonicVec", "average", "median", "stddev")
val parsedData = data.map(row => LabeledPoint(row.getDouble(0), row.getAs[Vector](1)))
<console>:297: error: kinds of the type arguments (Vector) do not conform to the expected kinds of the type parameters (type T).
Vector's type parameters do not match type T's expected parameters:
type Vector has one type parameter, but type T has none
val parsedData = data.map(row => LabeledPoint(row.getDouble(0), row.getAs[Vector](1))
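The kind error happens because Vector here resolves to Scala's scala.collection.immutable.Vector[T], which takes a type parameter, rather than Spark's org.apache.spark.mllib.linalg.Vector. A minimal sketch of a likely fix, assuming average is the intended label and mnemonicVec the feature vector (in data, column 0 is mnemonicVec and column 1 is average):

import org.apache.spark.mllib.linalg.Vector // explicit import shadows scala.collection.immutable.Vector

// Take the label from the numeric "average" column (index 1) and the
// features from the one-hot "mnemonicVec" column (index 0).
val parsedData = data.map(row =>
  LabeledPoint(row.getDouble(1), row.getAs[Vector](0)))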
Labels:
- Apache Spark
12-11-2015
02:04 AM
Setting hadoop.proxyuser.hive.groups = * worked for me. Thanks @Ali Bajwa
12-09-2015
07:55 PM
@Ali Bajwa Should it be the python directory or the pyspark directory? As in /usr/loca/../python or /usr/hdp/2..../spark/python?
12-09-2015
07:39 PM
@Ofer Mendelevith I think it's an issue with LabeledPoint: it's expecting properly labeled data but not getting it.

val examples = MLUtils.loadLabeledData(sc, "hdfs:///user/zeppelin/las_demo/part-00000").cache()
val splits = examples.randomSplit(Array(0.8, 0.2))
val training = splits(0).cache()
val test = splits(1).cache()
val numTraining = training.count()
val numTest = test.count()
println(s"Training: $numTraining, test: $numTest.")
val updater = new SquaredL2Updater()
val model = {
  val algorithm = new LogisticRegressionWithSGD()
  algorithm.optimizer.setNumIterations(200).setStepSize(1.0).setUpdater(updater).setRegParam(0.1)
  algorithm.run(training).clearThreshold()
}
val rprediction = model.predict(test.map(_.features))
val rpredictionAndLabel = rprediction.zip(test.map(_.label)) // original had testRDD, which is undefined; the split above is named test
val rmetrics = new BinaryClassificationMetrics(rpredictionAndLabel)

ERROR is as follows:

warning: there were 1 deprecation warning(s); re-run with -deprecation for details
examples: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[52] at map at MLUtils.scala:214
splits: Array[org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]] = Array(PartitionwiseSampledRDD[53] at randomSplit at <console>:72, PartitionwiseSampledRDD[54] at randomSplit at <console>:72)
training: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = PartitionwiseSampledRDD[53] at randomSplit at <console>:72
test: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = PartitionwiseSampledRDD[54] at randomSplit at <console>:72
numTraining: Long = 19589
numTest: Long = 4889
Training: 19589, test: 4889.
updater: org.apache.spark.mllib.optimization.SquaredL2Updater = org.apache.spark.mllib.optimization.SquaredL2Updater@3b9284cd
org.apache.spark.SparkException: Input validation failed.
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:210)
at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:190)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:81)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:87)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:89)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:91)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:93)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:95)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:97)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:99)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:101)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:103)
at $iwC$$iwC$$iwC.<init>(<console>:105)
at $iwC$$iwC.<init>(<console>:107)
at $iwC.<init>(<console>:109)
at <init>(<console>:111)
at .<init>(<console>:115)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:655)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:620)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:613)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
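For what it's worth, GeneralizedLinearAlgorithm throws "Input validation failed." when its data validator rejects the input, and for LogisticRegressionWithSGD that validator requires every label to be 0.0 or 1.0. A quick diagnostic sketch, reusing the examples RDD loaded above:

// Print the distinct label values; anything besides 0.0 and 1.0 will fail
// binary logistic regression's input validation.
examples.map(_.label).distinct().collect().foreach(println)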
12-09-2015
03:20 PM
I had two versions of Python installed; Zeppelin was still using the older one.
12-09-2015
02:46 PM
I am able to import a library in the pyspark shell without any problems, but when I try to import the same library in Zeppelin, I get an error: ImportError: No module named xxxxx
Labels:
- Apache Zeppelin