<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Error in building the Generalized Linear Model in SparkR in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169920#M132226</link>
    <description>&lt;P&gt;It would be a great help if someone could reply to this thread; I'm kind of stuck here. Thanks.&lt;/P&gt;</description>
    <pubDate>Sat, 12 Nov 2016 04:37:28 GMT</pubDate>
    <dc:creator>mrizvi</dc:creator>
    <dc:date>2016-11-12T04:37:28Z</dc:date>
    <item>
      <title>Error in building the Generalized Linear Model in SparkR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169919#M132225</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;&lt;P&gt;I am using Spark 2.0.0 and I have an airline dataset. I created a SparkR DataFrame and am able to run some of the functions of the SparkR DataFrame API. However, I am running into exceptions while building a linear model with the Gaussian family. Here is my command:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;model &amp;lt;- glm(train_data, ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ERROR:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt; org.apache.spark.sql.AnalysisException: Cannot resolve column name "formula" among (YEAR, MONTH, DAY_OF_MONTH, DAY_OF_WEEK, CARRIER, FL_NUM, ORIGIN, DEST, DEP_TIME, DEP_DELAY, ARR_TIME, ARR_DELAY, CANCELLED, CANCELLATION_CODE, AIR_TIME, DISTANCE, WEEKEND, DEP_HOUR, DELAY_LABELED);&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;For some reason, it tries to fetch a column named "formula", so I replaced the above command with one that names the formula argument explicitly:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;model &amp;lt;- glm(train_data, formula = ARR_DELAY ~ MONTH + DEP_HOUR + DEP_DELAY + WEEKEND + DISTANCE, family = "gaussian")&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This time, I got this error:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;ERROR Executor: Exception in task 0.0 in stage 45.0 (TID 95)&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;scala.MatchError: [null,1.0,[1.0,11.0,5.0,1.0,2475.0]] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Has anyone seen this kind of behaviour? Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Nov 2016 06:46:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169919#M132225</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-11-09T06:46:33Z</dc:date>
    </item>
    <item>
      <title>Re: Error in building the Generalized Linear Model in SparkR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169920#M132226</link>
      <description>&lt;P&gt;It would be a great help if someone could reply to this thread; I'm kind of stuck here. Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Nov 2016 04:37:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169920#M132226</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-11-12T04:37:28Z</dc:date>
    </item>
    <item>
      <title>Re: Error in building the Generalized Linear Model in SparkR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169921#M132227</link>
      <description>&lt;P&gt;Got this one working: there were some null values in the output (dependent) variable ARR_DELAY, which is the null in the first position of the row reported by the MatchError. I replaced those nulls with the mean value of the column.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Nov 2016 09:14:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169921#M132227</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-11-12T09:14:37Z</dc:date>
    </item>
    <item>
      <title>Re: Error in building the Generalized Linear Model in SparkR</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169922#M132228</link>
      <description>&lt;P&gt;Alternatively, you could drop the rows where ARR_DELAY is null before fitting the model:&lt;/P&gt;&lt;P&gt;train_data &amp;lt;- dropna(train_data, cols = 'ARR_DELAY')&lt;/P&gt;</description>
      <pubDate>Wed, 25 Apr 2018 11:47:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-in-building-the-Generalized-Linear-Model-in-SparkR/m-p/169922#M132228</guid>
      <dc:creator>palwell</dc:creator>
      <dc:date>2018-04-25T11:47:29Z</dc:date>
    </item>
  </channel>
</rss>

