Created on 05-17-2017 04:32 PM - edited 09-16-2022 01:40 AM
First, download and install RapidMiner on your machine:
https://my.rapidminer.com/nexus/account/index.html#downloads
Once installed, open RapidMiner and look at the list of operators.
At the bottom left there is a link, "Get More Operators".
Click the link, then search for "Radoop".
Select both packages and click Install.
After RapidMiner restarts, the new operators we downloaded appear in the Extensions folder.
Now we need to configure the connection.
In the toolbar, select "Connections", then "Manage Radoop Connections".
Select "+ New Connection".
If you have your Hadoop config files available, you can use those to set the properties; otherwise select "manual".
Select the Hadoop version you have (in my case "Hortonworks 2.x") and supply the master URL.
If you have multiple masters, select the check box and provide the details.
Click "OK"
Now click ">> Quick Test"
If the test succeeds, you are all set to read from Hive.
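If Quick Test fails, one thing worth ruling out first is plain network reachability from your machine to the cluster. The snippet below is only a minimal sketch of such a check and is not part of Radoop; the function name port_is_open is made up for illustration, and it assumes your HiveServer2 listens on its default port 10000 (your cluster may be configured differently).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves that something is listening on the port. It does not
    authenticate or run a Hive query, so treat it as a first sanity check.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, port_is_open("master.example.com", 10000) would check the HiveServer2 port on a hypothetical master node; replace the host name with the master URL you entered in the connection dialog.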
Drag a "Radoop Nest" operator onto the canvas.
Select the operator on the canvas and, on the right-hand side of the IDE, select the connection we created earlier.
Now double-click the Radoop Nest operator to enter the nested canvas.
Drag a "Retrieve from Hive" operator, located under Radoop --> Data Access, onto the canvas.
Click the operator and select the table you wish to read.
Connect the out port of the operator to the out port on the edge of the canvas by dragging from one to the other.
Now click the Play button and wait for the process to complete.
Click the out port and select show sample data.
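If you want to cross-check the sample outside RapidMiner (for example in Beeline), a plain HiveQL query with a LIMIT gives a roughly comparable view of the table. The helper below is just a sketch that builds such a query string; the name sample_query, the "default" database and the row limit are my own assumptions and not something Radoop exposes.

```python
import re

# Accept only simple identifiers so the generated statement cannot be
# altered by a stray quote or semicolon in a table name.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sample_query(table: str, limit: int = 100, database: str = "default") -> str:
    """Build a HiveQL statement that returns the first `limit` rows of a table."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Backticks are Hive's quoting style for identifiers.
    return f"SELECT * FROM `{database}`.`{table}` LIMIT {limit}"
```

Calling sample_query("sales", 10) with a hypothetical table named sales yields a statement you can paste into any Hive client.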
Hope this was helpful!
More to come on RapidMiner + Hortonworks ...