Error in building the Generalized Linear Model in SparkR

mrizvi · 11-08-2016

Hi Experts,

I am using Spark 2.0.0 and I have an airline dataset. I created a SparkR DataFrame and was able to run some of the SparkR DataFrame API functions. However, I am running into exceptions while building a linear model with the Gaussian family. Here is my command:

model <- glm(train_data, ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

ERROR:

Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :

org.apache.spark.sql.AnalysisException: Cannot resolve column name "formula" among (YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, CARRIER, FL_NUM, ORIGIN, DEST, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, CANCELLED, CANCELLATION_CODE, AIR_TIME, DISTANCE, WEEKEND, DEP_HOUR, DELAY_LABELED);

For some reason it tries to fetch a column named "formula", so I replaced the above command with:

model <- glm(train_data, formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")

This time, I got this error:

ERROR Executor: Exception in task 0.0 in stage 45.0 (TID 95)

scala.MatchError: [null,1.0,[1.0,11.0,5.0,1.0,2475.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)

Has anyone seen this kind of behaviour? Thanks in advance.
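The SparkR documentation lists the signature as glm(formula, family = gaussian, data, ...), so the formula comes first and the data third. A minimal sketch of the same call with every argument named, so that nothing depends on positional order; it assumes an active SparkR session and the `train_data` SparkDataFrame from this post. It only addresses argument matching, not the scala.MatchError, which is a separate data issue.

```r
library(SparkR)

# Assumes an active SparkR session and the `train_data` SparkDataFrame
# described in the question (not created here).
model <- glm(
  formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE,
  family = "gaussian",
  data = train_data
)

# Print the fitted coefficients and related statistics.
summary(model)
```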

mrizvi · 11-11-2016

It would be a great help if someone could reply to this thread; I am kind of stuck here. Thanks.

mrizvi · 11-12-2016

Got this one working. There were some null values in the output (dependent) variable ARR_DELAY; I replaced them with the mean value of the column.
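The post does not show the code used for the replacement. One possible way to do it in SparkR is sketched below, assuming an active SparkR session, the `train_data` SparkDataFrame from the question, and a numeric ARR_DELAY column. The names `mean_row` and `mean_delay` are illustrative only.

```r
library(SparkR)

# Assumes an active SparkR session and the `train_data` SparkDataFrame
# from the question, with ARR_DELAY stored as a numeric column.

# avg() skips nulls, so this is the mean of the observed ARR_DELAY values.
mean_row <- collect(agg(train_data, mean_delay = avg(train_data$ARR_DELAY)))
mean_delay <- mean_row$mean_delay[[1]]

# Replace nulls in ARR_DELAY only; all other columns are left untouched.
train_data <- fillna(train_data, list(ARR_DELAY = mean_delay))
```

Passing a named list to fillna restricts the replacement to the listed column, so nulls elsewhere in the dataset are not silently overwritten.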

palwell · 04-25-2018

You could also drop the rows that have null values in that column with:

train_df <- dropna(train_df, cols = 'ARR_DELAY')
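To see how many rows the drop removes, the null count can be checked before and after. This is a rough sketch, assuming an active SparkR session and a SparkDataFrame named `train_df` as in the reply above; the variables `nulls_before`, `nulls_after`, and `rows_left` are illustrative.

```r
library(SparkR)

# Assumes an active SparkR session and a SparkDataFrame named `train_df`.

# Number of rows whose ARR_DELAY is null before the drop.
nulls_before <- count(filter(train_df, isNull(train_df$ARR_DELAY)))

# Drop only the rows where ARR_DELAY is null.
train_df <- dropna(train_df, cols = "ARR_DELAY")

# Re-check: no null ARR_DELAY values are expected to remain.
nulls_after <- count(filter(train_df, isNull(train_df$ARR_DELAY)))
rows_left <- count(train_df)
```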
