<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Data Analysis using Mahout in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193498#M76460</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/51330/gkupcovaite.html"&gt;Ginta&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;As you suggested, I'm using Spark now. Must I use MLlib?&lt;/P&gt;&lt;P&gt;I have loaded a CSV file in Spark: &lt;CODE&gt;val mk = sc.textFile("hdfs:///path/filename.csv")&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;Each row has 4 string fields and 3 double fields.&lt;/P&gt;&lt;P&gt;Now I need to detect outliers, apply a predictive model, and visualize the results.&lt;/P&gt;&lt;P&gt;What can I use for this? Any suggestions? Thanks.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Mar 2018 09:23:36 GMT</pubDate>
    <dc:creator>Gayathridevi</dc:creator>
    <dc:date>2018-03-28T09:23:36Z</dc:date>
    <item>
      <title>Data Analysis using Mahout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193496#M76458</link>
      <description />
      <pubDate>Fri, 16 Sep 2022 13:02:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193496#M76458</guid>
      <dc:creator>Gayathridevi</dc:creator>
      <dc:date>2022-09-16T13:02:18Z</dc:date>
    </item>
    <item>
      <title>Re: Data Analysis using Mahout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193497#M76459</link>
      <description>&lt;P&gt;If you are a beginner and want to start with ML, I'd suggest skipping Mahout and learning Spark instead. Mahout is an older project built on MapReduce. Spark, on the other hand, does in-memory processing and is far more actively developed.&lt;/P&gt;&lt;P&gt;Hardly anyone uses Mahout anymore; everyone is focusing on Spark:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hortonworks.com/apache/spark/" target="_blank"&gt;https://hortonworks.com/apache/spark/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Check out some nice Spark tutorials we have:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hortonworks.com/tutorial/hands-on-tour-of-apache-spark-in-5-minutes/" target="_blank"&gt;https://hortonworks.com/tutorial/hands-on-tour-of-apache-spark-in-5-minutes/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://hortonworks.com/hadoop-tutorial/interacting-with-data-on-hdp-using-scala-and-apache-spark/" target="_blank"&gt;https://hortonworks.com/hadoop-tutorial/interacting-with-data-on-hdp-using-scala-and-apache-spark/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 20:23:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193497#M76459</guid>
      <dc:creator>gkupcovaite</dc:creator>
      <dc:date>2018-03-27T20:23:32Z</dc:date>
    </item>
    <item>
      <title>Re: Data Analysis using Mahout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193498#M76460</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/51330/gkupcovaite.html"&gt;Ginta&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;As you suggested, I'm using Spark now. Must I use MLlib?&lt;/P&gt;&lt;P&gt;I have loaded a CSV file in Spark: &lt;CODE&gt;val mk = sc.textFile("hdfs:///path/filename.csv")&lt;/CODE&gt;&lt;/P&gt;&lt;P&gt;Each row has 4 string fields and 3 double fields.&lt;/P&gt;&lt;P&gt;Now I need to detect outliers, apply a predictive model, and visualize the results.&lt;/P&gt;&lt;P&gt;What can I use for this? Any suggestions? Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 09:23:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193498#M76460</guid>
      <dc:creator>Gayathridevi</dc:creator>
      <dc:date>2018-03-28T09:23:36Z</dc:date>
    </item>
    <item>
      <title>Re: Data Analysis using Mahout</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193499#M76461</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/39249/gayathrimtechcse.html" nodeid="39249"&gt;@Gayathri Devi&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;This prediction depends on the date you have. You may have labelled or unlabelled data based on which you have different algorithms. &lt;/P&gt;&lt;P&gt;Assuming your data is labelled, then you have to find if you are trying to solve a regression problem or a classification problem. Based on that you can choose the algorithms.&lt;/P&gt;&lt;P&gt;Since you have written that you want to find outliers , I'm assuming that it is a regression problem. Then you can use algorithms like Linear Regression, Support Vector Regression, Decision tree regression, Random forest regression etc. &lt;/P&gt;&lt;P&gt;If your data is unlabelled, you have to use a unsupervised learning method. You will have algorithms like K-Means clustering, Hierarchical clustering etc.&lt;/P&gt;&lt;P&gt;The main part of any solving machine learning problem is learning what your data is and choosing the right algorithm for your problem. So you may need to spend more time in analysing data and choosing the right algorithm.&lt;/P&gt;&lt;P&gt;Here are few links for the concepts mentioned above. 
You can find these algorithms in spark.&lt;/P&gt;&lt;P&gt;&lt;A href="https://spark.apache.org/docs/latest/ml-guide.html" target="_blank"&gt;https://spark.apache.org/docs/latest/ml-guide.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/" target="_blank"&gt;https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning" target="_blank"&gt;https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/" target="_blank"&gt;https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data" target="_blank"&gt;https://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Happy machine learning &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;-Aditya&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 13:06:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Data-Analysis-using-Mahout/m-p/193499#M76461</guid>
      <dc:creator>asirna</dc:creator>
      <dc:date>2018-03-28T13:06:09Z</dc:date>
    </item>
  </channel>
</rss>

