What is the difference between RHadoop and Pig?

Explorer
 
9 REPLIES 9

Re: what is difference between Rhadoop and pig?

Master Collaborator

R and Pig Latin are both languages. RHadoop is a package for the R language that allows you to interact with Hadoop clusters from within that language. Pig programs are executed on a Hadoop cluster. R is going to be a stronger choice for more sophisticated statistical analysis, but Pig is better suited to when you need to perform transformations and simpler analysis on the data, and is better integrated with the rest of the Hadoop ecosystem.

Re: what is difference between Rhadoop and pig?

Explorer

In my project, we are using SSIS for transformation and SSAS for analytics.

So which tool should be used as a replacement for SSIS and SSAS? Could you please help me?

Re: what is difference between Rhadoop and pig?

Explorer

But you are saying that RHadoop is strong for data analytics and Pig is better suited for transformation.

Which one is better for data manipulation, processing, and data analytics together?

Re: what is difference between Rhadoop and pig?

Rising Star

Hi Indira,

Sean provided a great answer but I'll add just a little. You mentioned SSIS and SSAS - I think SSIS is "SQL Server Integration Services" and SSAS is "SQL Server Analysis Services" (both from Microsoft).

 

Focusing on SSAS, which is an OLAP tool that comes with BI support: You can use Pig (or Impala or Hive - both support SQL) to explore and transform your Big Data without having to write any code in Java, for example. As Sean said, for heavier statistics work, use RHadoop (which runs R in a distributed fashion in your Hadoop cluster). You can certainly use RHadoop with other Hadoop ecosystem tools to compose complete workflows.

 

One advantage of a Hadoop-based solution, rather than a SQL-based one like SSAS, is that with Hadoop you can run your code in a parallel fashion to process your Big Data faster. Another advantage is that you can retain all of your data; with data warehouses, it is common to aggregate or prune data in order to manage storage - this leads to flattening or, worse yet, total loss of the most interesting data points you have collected.

 

I hope this is helpful.

 

Re: what is difference between Rhadoop and pig?

Explorer

Thank you for your suggestion.

I need group by in RHadoop.

Please help me.

Re: what is difference between Rhadoop and pig?

Rising Star

It looks like the plyrmr RHadoop package will support group functions:

https://github.com/RevolutionAnalytics/RHadoop/wiki/user%3Eplyrmr%3EHome

 

For tutorials and examples, google 'rhadoop group'.

Re: what is difference between Rhadoop and pig?

Explorer

Thank you for the reply.

I have one small doubt about RHadoop.

Here I am loading data from a file using the data.table option. Can I use a group by function with the data.table method?

In the last reply you said to use the plyrmr method to do group by.

I have read many articles saying that data.table is faster compared to other methods.

Please suggest which one is good.

Please check the script below; I am stuck at group by with max value.

Please give me the exact syntax for group by followed by max value.

 

 

 

----- RHadoop script -----

batting <- read.table(file="/home/cloudera/Desktop/test/Batting.csv", header=FALSE, sep=",", fill=TRUE, quote=NULL)

run <- batting[, c(1, 2, 10)]

max_runs <- run[,ax(v2)]  # stuck here

write.table(max_runs, "8.txt", quote=F, row.names=F, col.names=T, sep=",")

Re: what is difference between Rhadoop and pig?

Rising Star

Hello Indira, I found two links that may help you (see below). The second link is a tutorial that shows a grouping example.


(A high level article with some good pointers to more details.)
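
To make the grouping step concrete, here is a minimal base-R sketch of "group by, then max". It is not taken from either link. The toy data frame and the column names player, year, and runs are assumptions standing in for columns 1, 2, and 10 of your Batting.csv, and grouping by year is only a guess at what you need.

```r
# Toy stand-in for the three selected Batting.csv columns.
# Column names and values are made up for illustration.
run <- data.frame(
  player = c("a", "b", "c", "d", "e"),
  year   = c(2000, 2000, 2001, 2001, 2001),
  runs   = c(10, 25, 7, 31, 18),
  stringsAsFactors = FALSE
)

# Group by year and take the maximum of runs within each group.
max_runs <- aggregate(runs ~ year, data = run, FUN = max)

# Join back to the original rows to see which player holds each yearly maximum.
top_players <- merge(run, max_runs, by = c("year", "runs"))

print(max_runs)
print(top_players)
```

If you load the file with data.table instead, the same grouping is usually written as DT[, .(max_runs = max(runs)), by = year]. Both forms run locally in your R session; as far as I know, plyrmr is the option to look at when the grouping itself has to run on the Hadoop cluster.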
 