<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question IllegalArgumentException: requirement failed: maxBins should be greater than max categories in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40219#M26328</link>
    <description>CDH 5.2.0, CentOS 6.4: DecisionTree.trainClassifier fails with "java.lang.IllegalArgumentException: requirement failed: maxBins (= 4) should be greater than max categories in categorical features (&gt;= 20)" even though maxBins was set to 32.</description>
    <pubDate>Fri, 16 Sep 2022 10:15:51 GMT</pubDate>
    <dc:creator>athtsang</dc:creator>
    <dc:date>2022-09-16T10:15:51Z</dc:date>
    <item>
      <title>IllegalArgumentException: requirement failed: maxBins should be greater than max categories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40219#M26328</link>
      <description>&lt;P&gt;CDH 5.2.0, CentOS 6.4&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The skeleton of &lt;FONT face="courier new,courier"&gt;decision_tree.scala&lt;/FONT&gt; looks like this:&lt;/P&gt;&lt;PRE&gt;...
val raw_data = sqlContext.parquetFile("/path/to/raw/data/")

raw_data.registerTempTable("raw_data")

val raw_rdd = sqlContext.sql("select ... from raw_data where rec_type=3")

val filtered_rdd = raw_rdd.map{ case Row(label: Int, ...) =&amp;gt;
  LabeledPoint(label.toDouble, Vectors.dense(...)) }

val splits = filtered_rdd.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val numClasses = 2
val categoricalFeaturesInfo = Map[Int, Int](0 -&amp;gt; 20, 1 -&amp;gt; 30) 
val impurity = "gini" 
val maxDepth = 12
val maxBins = 32

val model = DecisionTree.trainClassifier(trainingData, numClasses, 
  categoricalFeaturesInfo, impurity, maxDepth, maxBins)
...&lt;/PRE&gt;&lt;P&gt;When I invoke spark-shell with the command&lt;/P&gt;&lt;PRE&gt;$ spark-shell --executor-memory 2g --driver-memory 2g -deprecation -i decision_tree.scala&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The job fails with the following error, even though &lt;FONT face="courier new,courier"&gt;maxBins&lt;/FONT&gt; was set to 32:&lt;/P&gt;&lt;PRE&gt;java.lang.IllegalArgumentException: requirement failed: maxBins (= 4) should be greater than max categories in categorical features (&amp;gt;= 20)
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$$anonfun$buildMetadata$2.apply(DecisionTreeMetadata.scala:91)
	at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$$anonfun$buildMetadata$2.apply(DecisionTreeMetadata.scala:90)
	at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
	at org.apache.spark.mllib.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:90)
	at org.apache.spark.mllib.tree.DecisionTree.train(DecisionTree.scala:66)
	at org.apache.spark.mllib.tree.DecisionTree$.train(DecisionTree.scala:339)
	at org.apache.spark.mllib.tree.DecisionTree$.trainClassifier(DecisionTree.scala:368)
	at $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply$mcVI$sp(&amp;lt;console&amp;gt;:124)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at $iwC$$iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:22)
	at $iwC$$iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:160)
	at $iwC$$iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:162)
	at $iwC.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:164)
	at &amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:166)
	at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:170)
	at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)
	at .&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:7)
	at .&amp;lt;clinit&amp;gt;(&amp;lt;console&amp;gt;)
	at $print(&amp;lt;console&amp;gt;)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:846)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1119)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:672)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:703)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:667)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:819)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:864)
... (long chain of reallyInterpret$1 and interpretStartingWith)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:776)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:619)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:627)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:632)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:642)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:639)
	at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
	at scala.reflect.io.File.applyReader(File.scala:82)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:639)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:639)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:639)
	at org.apache.spark.repl.SparkILoop.savingReplayStack(SparkILoop.scala:153)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:638)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply(SparkILoop.scala:638)
	at org.apache.spark.repl.SparkILoop$$anonfun$interpretAllFrom$1.apply(SparkILoop.scala:638)
	at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:158)
	at org.apache.spark.repl.SparkILoop.interpretAllFrom(SparkILoop.scala:637)
	at org.apache.spark.repl.SparkILoop$$anonfun$loadCommand$1.apply(SparkILoop.scala:702)
	at org.apache.spark.repl.SparkILoop$$anonfun$loadCommand$1.apply(SparkILoop.scala:701)
	at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:695)
	at org.apache.spark.repl.SparkILoop.loadCommand(SparkILoop.scala:701)
	at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:311)
	at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:311)
	at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
	at org.apache.spark.repl.SparkILoop$$anonfun$loadFiles$1.apply(SparkILoop.scala:872)
	at org.apache.spark.repl.SparkILoop$$anonfun$loadFiles$1.apply(SparkILoop.scala:870)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:870)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:957)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:907)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1002)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)&lt;/PRE&gt;&lt;P&gt;If the filter (&lt;FONT face="courier new,courier"&gt;rec_type=3&lt;/FONT&gt;) is removed from the &lt;FONT face="courier new,courier"&gt;raw_rdd&lt;/FONT&gt; query, the job runs to completion.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:15:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40219#M26328</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2022-09-16T10:15:51Z</dc:date>
    </item>
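    <!--
      Illustrative diagnostic (not part of the original post): the reply below explains that maxBins
      is capped at the number of training rows, so counting how many rows survive the rec_type=3
      filter shows why the cap kicked in. The variable names come from the question's skeleton; the
      check itself is a hypothetical sketch.

      // Assumes the raw_rdd defined in the skeleton above (the "where rec_type=3" query)
      val filteredCount = raw_rdd.count()
      println(s"rows with rec_type=3: $filteredCount")
      // A very small count (the error implies 4) means maxBins is effectively 4, which is below
      // the 20 and 30 categories declared in categoricalFeaturesInfo.
    -->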
    <item>
      <title>Re: IllegalArgumentException: requirement failed: maxBins should be greater than max categories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40220#M26329</link>
      <description>&lt;P&gt;The problem is that you have very few input data points -- 4, I'm guessing. maxBins &amp;gt; size of input doesn't make sense, so maxBins is capped at the size of the input. But it also can't be less than the number of values of any categorical feature, since that would mean the tree can't try every possible value when splitting.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's not obvious from the error message (which is clearer in later versions than the Spark 1.1 you're using), but that's almost certainly the issue.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 10:27:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40220#M26329</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2016-04-27T10:27:24Z</dc:date>
    </item>
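    <!--
      A minimal sketch of the condition the reply describes (illustrative, not code from the thread):
      maxBins is capped at the number of training examples, and the capped value must still be at
      least the largest category count declared in categoricalFeaturesInfo. Variable names mirror the
      question; the pre-check itself is an assumption about how one might guard before training.

      val numExamples = trainingData.count()
      val maxCategories = categoricalFeaturesInfo.values.max         // 30 in the question
      val effectiveMaxBins = math.min(maxBins.toLong, numExamples)   // the cap the reply describes
      require(effectiveMaxBins >= maxCategories,
        s"need maxBins ($maxBins) and training rows ($numExamples) to both be >= $maxCategories")
      val model = DecisionTree.trainClassifier(trainingData, numClasses,
        categoricalFeaturesInfo, impurity, maxDepth, maxBins)
    -->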
    <item>
      <title>Re: IllegalArgumentException: requirement failed: maxBins should be greater than max categories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40249#M26330</link>
      <description>That's it. Thanks.</description>
      <pubDate>Thu, 28 Apr 2016 01:22:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/IllegalArgumentException-requirement-failed-maxBins-should/m-p/40249#M26330</guid>
      <dc:creator>athtsang</dc:creator>
      <dc:date>2016-04-28T01:22:31Z</dc:date>
    </item>
  </channel>
</rss>

