Support Questions

Error in building the Generalized Linear Model in SparkR


Super Collaborator

Created 11-08-2016 10:46 PM


Hi Experts,

I am using Spark 2.0.0 with an airline dataset. I created a SparkR DataFrame and was able to run some of the SparkR DataFrame API functions, but I am running into exceptions when building a linear model with the Gaussian family. Here is my command:

`model <- glm(train_data, ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")`

**ERROR:**

`Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :`

`org.apache.spark.sql.AnalysisException: Cannot resolve column name "formula" among (YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, CARRIER, FL_NUM, ORIGIN, DEST, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, CANCELLED, CANCELLATION_CODE, AIR_TIME, DISTANCE, WEEKEND, DEP_HOUR, DELAY_LABELED);`

For some reason, it tries to resolve a column named "formula", so I replaced the command above with one that passes the formula as a named argument:

`model <- glm(train_data, formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")`

This time, I got this error:

`ERROR Executor: Exception in task 0.0 in stage 45.0 (TID 95)`

`scala.MatchError: [null,1.0,[1.0,11.0,5.0,1.0,2475.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)`

Has anyone seen this kind of behaviour? Thanks in advance.

1 ACCEPTED SOLUTION

3 Replies

Re: Error in building the Generalized Linear Model in SparkR

Super Collaborator

Created 11-11-2016 08:37 PM


It would be a great help if someone could reply to this thread; I'm stuck here. Thanks

Super Collaborator

Created 11-12-2016 01:14 AM


Re: Error in building the Generalized Linear Model in SparkR

Explorer

Created 04-25-2018 04:47 AM


You could also drop rows with null values in your input columns, for example the label column, with:

`train_df <- dropna(train_df, cols = 'ARR_DELAY')`
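A minimal sketch that combines both fixes from this thread, assuming SparkR 2.x with an active Spark session and a SparkDataFrame that has the column names listed in the error message. It drops rows with nulls before fitting, then calls `glm` with every argument named so the data frame cannot be matched to the `formula` parameter. The leading `null` in the `scala.MatchError` row is most likely the null `ARR_DELAY` label. The helper name `fit_arr_delay_glm` is hypothetical, and also dropping nulls in the feature columns is an extra precaution that the thread does not confirm is required.

```r
# Sketch, assuming SparkR 2.x is installed and sparkR.session() has been called.
# fit_arr_delay_glm is a hypothetical helper, not part of SparkR.

label    <- "ARR_DELAY"
features <- c("MONTH", "DEP_HOUR", "DEP_DELAY", "WEEKEND", "DISTANCE")

# Builds ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE
delay_formula <- stats::reformulate(features, response = label)

fit_arr_delay_glm <- function(train_data) {
  # Remove rows where the label or any feature is null (extra precaution
  # for the features; the null label is what triggers the MatchError).
  cleaned <- SparkR::dropna(train_data, how = "any", cols = c(label, features))

  # Name every argument so train_data is never bound to `formula`.
  SparkR::glm(formula = delay_formula, family = "gaussian", data = cleaned)
}

# Usage (requires a running Spark session):
# model <- fit_arr_delay_glm(train_data)
# summary(model)
```

Building the formula with `reformulate` keeps the label and feature names in one place, so the columns passed to `dropna` always match the ones the model uses.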
