Explorer
Posts: 12
Registered: ‎09-11-2015

What is the difference between RHadoop and Pig?

 
Cloudera Employee
Posts: 435
Registered: ‎07-12-2013

Re: what is difference between Rhadoop and pig?

R and Pig Latin are both languages. RHadoop is a package for the R language that allows you to interact with Hadoop clusters from within that language. Pig programs are executed on a Hadoop cluster. R is going to be a stronger choice for more sophisticated statistical analysis, but Pig is better suited to when you need to perform transformations and simpler analysis on the data, and is better integrated with the rest of the Hadoop ecosystem.

Explorer
Posts: 12
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

In my project, we are using SSIS for transformation and SSAS for analytics.

So which tool should be used as a replacement for SSIS and SSAS? Could you please help me?

Explorer
Posts: 12
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

But you are saying that RHadoop is strong for data analytics and Pig is better suited for transformation.

Which one is better for data manipulation and processing, and which for data analytics?

Cloudera Employee Sue
Cloudera Employee
Posts: 44
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

Hi Indira,

Sean provided a great answer but I'll add just a little. You mentioned SSIS and SSAS - I think SSIS is "SQL Server Integration Services" and SSAS is "SQL Server Analysis Services" (both from Microsoft).

 

Focusing on SSAS, which is an OLAP tool that comes with BI support: You can use Pig (or Impala or Hive - both support SQL) to explore and transform your Big Data without having to write any code in Java, for example. As Sean said, for heavier statistics work, use RHadoop (which runs R in a distributed fashion in your Hadoop cluster). You can certainly use RHadoop with other Hadoop ecosystem tools to compose complete workflows.

 

One advantage of a Hadoop-based solution, rather than a SQL-based one like SSAS, is that with Hadoop you can run your code in a parallel fashion to process your Big Data faster. Another advantage is that you can retain all of your data; with data warehouses, it is common to aggregate or prune data in order to manage storage - this leads to flattening or, worse yet, total loss of the most interesting data points you have collected.
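To make the "parallel fashion" point concrete, here is a minimal sketch in plain R that imitates the idea on a single machine: the data is split into chunks, each chunk produces a partial result, and the partial results are combined. It is only an illustration under stated assumptions; the data is made up, the chunk count of 4 is arbitrary, and on a real cluster Hadoop (not `lapply`) would distribute the chunks across nodes.

```r
# Toy data: 1,000 hypothetical measurements (values are made up).
set.seed(42)
values <- runif(1000)

# "Map" step: split the data into 4 chunks and compute a partial result
# for each chunk. On a Hadoop cluster, each chunk would live on a
# different node and these calls would run at the same time.
chunks <- split(values, rep(1:4, length.out = length(values)))
partial_max <- lapply(chunks, max)

# "Reduce" step: combine the partial results into the final answer.
overall_max <- Reduce(max, partial_max)

# The chunked result matches the single-pass result.
stopifnot(identical(overall_max, max(values)))
print(overall_max)
```

The same split, partial result, combine pattern is what lets a cluster process data that is too large for one machine.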

 

I hope this is helpful.

 

Explorer
Posts: 12
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

Thank you for your suggestion.

I need a group-by in RHadoop.

Please help me.

Cloudera Employee Sue
Cloudera Employee
Posts: 44
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

It looks like the plyrmr RHadoop package will support group functions:

https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Eplyrmr%3EHome

 

For tutorials and examples, google 'rhadoop group'.

Explorer
Posts: 12
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

Thank you for the reply.

I have one small doubt about RHadoop.

Here I am loading data from a file using the data.table option. Can I use a group-by function with the data.table method?

In your last reply you said to use the plyrmr package to do the group-by.

I have read many articles saying that data.table is faster than other methods.

Please suggest which one is better.

Please check the script below; I am stuck at the group-by with the max value.

Please give me the exact syntax for a group-by followed by the max value.

 

 

 

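-----RHadoop script-----

# Read the raw CSV (assumes no header row, so columns are named V1, V2, ...).
batting <- read.table(file="/home/cloudera/Desktop/test/Batting.csv", header=FALSE, sep=",", fill=TRUE, quote="")

# Keep only columns 1, 2 and 10.
run <- batting[, c(1, 2, 10)]

# Stuck here in the original attempt (run[,ax(v2)]). One possible form, assuming
# the goal is to group by column 2 and take the maximum of column 10 per group:
max_runs <- aggregate(V10 ~ V2, data = run, FUN = max)

write.table(max_runs, "8.txt", quote=FALSE, row.names=FALSE, col.names=TRUE, sep=",")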

Cloudera Employee Sue
Cloudera Employee
Posts: 44
Registered: ‎09-11-2015

Re: what is difference between Rhadoop and pig?

Hello Indira, I found two links that may help you (see below). The second link is a tutorial that shows a grouping example.


(A high level article with some good pointers to more details.)
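
For the data.table question asked above, a group-by with a maximum can be sketched as follows. This is a minimal example, not a drop-in replacement for the script: it assumes the data.table package is installed, and the sample rows and the column names `player`, `year`, and `runs` are hypothetical stand-ins for columns 1, 2, and 10 of Batting.csv.

```r
library(data.table)  # assumes the data.table package is installed

# Made-up sample rows; the column names are hypothetical stand-ins for
# columns 1, 2 and 10 of Batting.csv.
batting <- data.table(
  player = c("a01", "b02", "c03", "d04", "e05"),
  year   = c(2001, 2001, 2002, 2002, 2002),
  runs   = c(10, 25, 7, 31, 18)
)

# Group by year and keep the largest runs value in each group.
max_runs <- batting[, .(max_runs = max(runs)), by = year]

# Same grouping, but keep the whole row that holds each group's maximum.
top_rows <- batting[, .SD[which.max(runs)], by = year]

print(max_runs)
print(top_rows)
```

If the data has already been loaded with `read.table`, `as.data.table()` converts the data frame, or `fread()` can read the file directly into a data.table. Without the package, base R's `aggregate(runs ~ year, data = batting, FUN = max)` gives the same per-group maximum.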