If you haven't already, see the first tutorial I made, which guides you through setting up RapidMiner to read from Hive.

We will pick up where it leaves off.

Read-from-hive-using-rapidminer

Add a "Set Role" operator next to your "Retrieve from Hive" operator, which is located in the "Radoop Nest".

This lets you select the column you want the model to predict.

In this case, I set the attribute name field to the category column in my dataset.

You can obtain this dataset here: data

This dataset is just for illustrative purposes, so if you have data that already has labels, feel free to use it instead.
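Conceptually, setting a role marks one column as the label to predict and leaves the rest as regular attributes. The following is a minimal Python sketch of that idea outside RapidMiner. The rows, the column names other than "category", and the helper `set_label_role` are hypothetical and are not taken from the sample dataset or from any RapidMiner API.

```python
# Hypothetical rows; "category" stands in for the column given the label role.
rows = [
    {"amount": 12.5, "channel": "web", "category": "a"},
    {"amount": 80.0, "channel": "store", "category": "b"},
    {"amount": 15.0, "channel": "web", "category": "a"},
]


def set_label_role(rows, label_column):
    """Split each row into its regular attributes and its label value."""
    features, labels = [], []
    for row in rows:
        labels.append(row[label_column])
        # Everything except the label column stays a regular attribute.
        features.append({k: v for k, v in row.items() if k != label_column})
    return features, labels


features, labels = set_label_role(rows, "category")
print(labels)
print(features[0])
```

Once the label is separated like this, a learner knows which values to predict and which columns to use as inputs.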

15528-screen-shot-2017-05-17-at-114707-am.png

Now add a "Split Validation" operator and connect the ports.

Then double-click the validation operator to open it.

15529-screen-shot-2017-05-17-at-115104-am.png
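As a rough illustration of what a split validation does, the sketch below shuffles a dataset and divides it into a training part and a testing part. The function name `split_validation`, the 0.7 ratio, and the fixed seed are assumptions chosen for the example, not settings read from the operator.

```python
import random


def split_validation(examples, split_ratio=0.7, seed=42):
    """Shuffle the examples and split them into training and testing parts."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]


examples = list(range(10))  # placeholder data standing in for table rows
train, test = split_validation(examples)
print(len(train), len(test))
```

The model is trained only on the first part and evaluated only on the second, so the reported performance reflects rows the model has not seen.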

Add a "Decision Tree" operator in the left pane, then add an "Apply Model" operator and a "Performance" operator, and connect them all.

15530-screen-shot-2017-05-17-at-115305-am.png

For the "Performance" operator, select accuracy or whichever metric you wish to check.
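The train, apply, and performance steps can be sketched in plain Python as follows. This is not RapidMiner's Decision Tree algorithm: it swaps in a one-level tree (a decision stump) on a single numeric attribute to keep the example short. The data and the function names `train_stump`, `apply_model`, and `accuracy` are hypothetical.

```python
from collections import Counter


def train_stump(train):
    """Learn a one-level tree: a threshold on x plus a label for each side."""
    label, count = Counter(y for _, y in train).most_common(1)[0]
    # Fallback: always predict the majority label if no split does better.
    best = (count, float("inf"), label, label)
    for threshold in sorted({x for x, _ in train}):
        left = [y for x, y in train if x <= threshold]
        right = [y for x, y in train if x > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        correct = left.count(left_label) + right.count(right_label)
        if correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    return best[1:]


def apply_model(model, xs):
    """Predict a label for each x using the learned threshold."""
    threshold, left_label, right_label = model
    return [left_label if x <= threshold else right_label for x in xs]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == t for p, t in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical (x, label) pairs standing in for the training and testing parts.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]
test = [(2.5, "low"), (8.5, "high"), (4.0, "low"), (6.0, "high")]

model = train_stump(train)
predictions = apply_model(model, [x for x, _ in test])
score = accuracy(predictions, [y for _, y in test])
print(model, score)
```

Accuracy is only one possible measure; the same predictions and labels could be fed into a different metric function.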

If you are using the sample data provided in this tutorial, you will see some errors.

Click the error icon on each operator, select the quick fix, and apply it.

Your panes should look like this:

15531-screen-shot-2017-05-17-at-115538-am.png

If you have Spark on your cluster, you can modify this process to run on Spark by using the "Spark Decision Tree" operator instead.

15532-screen-shot-2017-05-17-at-115722-am.png

That's how you can set up and train a model in RapidMiner.

Here we used only a decision tree, but there are several other algorithms to choose from.