
Difference between Hive and Pig? Why have both?

New Contributor

Hi,

 

My experience: I'm about a month into the Hadoop world. I've experimented a bit with Hive, Pig, and Hadoop using Cloudera's Hadoop VM, and I've read Google's papers on MapReduce and GFS (PDF link).

I understand that:

 

 

  • Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), whereas Hive's query language closely resembles SQL.
  • Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I may be wrong, but Hive is closely coupled to Hadoop.
  • Both Pig Latin and Hive commands compile to Map and Reduce jobs.

 

Thanks and Regards,

Sireesha.

3 REPLIES

Community Manager

Hey Siri,

 

You may be interested in this community article:

 

The Differences between Pig, Hive, and HBase

 

I hope it helps.


Cy Jervis, Manager, Community Program

Champion

To put it simply:

 

Hive - does not verify the data when it is loaded, but rather when a query is issued, otherwise known as schema on read. So the initial load of data is fast compared to schema on write, i.e. traditional database systems.
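
A rough way to picture schema on read versus schema on write, as a minimal Python sketch rather than anything Hive actually runs. The sample rows and the function names are made up for illustration:

```python
import csv
import io

# Hypothetical raw file contents; the third row has a non-numeric age.
RAW = "1,alice,34\n2,bob,41\n3,carol,unknown\n"


def load_schema_on_write(raw):
    """Validate every row at load time; one bad row makes the load fail."""
    table = []
    for rid, name, age in csv.reader(io.StringIO(raw)):
        table.append((int(rid), name, int(age)))  # raises ValueError on "unknown"
    return table


def load_schema_on_read(raw):
    """Loading only stores the raw text; nothing is parsed or checked."""
    return raw


def query_ages(stored):
    """Apply the schema when the query runs; unparseable fields become None."""
    ages = []
    for _rid, _name, age in csv.reader(io.StringIO(stored)):
        try:
            ages.append(int(age))
        except ValueError:
            ages.append(None)  # similar in spirit to a NULL in a query result
    return ages


stored = load_schema_on_read(RAW)  # cheap: no validation happens here
ages = query_ages(stored)          # the schema is applied only now
```

The load step in the schema-on-read path does no work per row, which is why the initial load is fast; the cost of parsing (and of discovering bad data) moves to query time.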

 

Pig - This is more of a data flow programming language: you have the freedom to specify how you want your data to be transformed, based on the input relation. You also define your schema at runtime.
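
To give a feel for the data flow style, here is a small Python sketch (not Pig Latin) where each step takes a relation and produces a new named one, loosely mirroring Pig's LOAD, FILTER, GROUP, and FOREACH ... GENERATE operators. The data and field names are invented for the example:

```python
from collections import defaultdict

# Hypothetical input relation: (user, url, seconds_on_page)
visits = [
    ("ann", "/home", 12),
    ("bob", "/home", 3),
    ("ann", "/docs", 40),
    ("cat", "/docs", 25),
]

# Step 1: attach field names as the data is read (schema supplied at runtime).
fields = ("user", "url", "seconds")
records = [dict(zip(fields, row)) for row in visits]

# Step 2: keep only visits of at least 10 seconds.
long_visits = [r for r in records if r["seconds"] >= 10]

# Step 3: group the remaining records by URL.
grouped = defaultdict(list)
for r in long_visits:
    grouped[r["url"]].append(r)

# Step 4: for each group, generate one aggregate value.
totals = {url: sum(r["seconds"] for r in rows) for url, rows in grouped.items()}
```

You decide the order of the transformations yourself, one relation at a time, instead of describing the final result in a single declarative statement.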

 

Hope this suffices.

Contributor

Siri,

 

Thank you for getting involved in big data.

 

  • Pig's language, Pig Latin, is a shift away from the SQL-like declarative style of programming (it suits the way programmers think), whereas Hive's query language closely resembles SQL.

These are both just tools to shield people from having to be Java programmers writing raw MapReduce applications.  While Pig may be more expressive in some ways than SQL, far more people in the IT industry know SQL than Pig Latin, so Hive has certainly had much more uptake.  More importantly, Hive fits very nicely into many existing workloads that used to be run on traditional databases.

 

  • Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I may be wrong, but Hive is closely coupled to Hadoop.

 

The storage mechanism underneath the processing is not what defines these products.  They are simply translation tools that take something humans can understand and turn it into a MapReduce application.  The storage mechanisms are changing all the time.  For example, Hive can generate MapReduce applications that run against a local file system, HDFS, or Amazon S3 buckets.

 

  • Both Pig Latin and Hive commands compile to Map and Reduce jobs.

Exactly.  That's the ticket.
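
As a rough picture of what "compiles to Map and Reduce jobs" means, here is a toy Python sketch of a per-department count, the kind of result a SQL GROUP BY with COUNT would give. It only imitates the map, shuffle, and reduce phases in memory; the jobs Hive or Pig actually generate are far more involved, and the data and function names here are made up:

```python
from collections import defaultdict


def map_phase(rows):
    """Emit one (key, 1) pair per input row, keyed by department."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def reduce_phase(buckets):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in buckets.items()}


# Hypothetical input rows: (employee, department)
employees = [("ann", "eng"), ("bob", "ops"), ("cat", "eng")]
counts = reduce_phase(shuffle(map_phase(employees)))
```

With Hive or Pig you write the one-line query or script, and the tool produces the equivalent of these phases for you.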

 

 

 

Moving forward, there is Spark, which is changing how we do big data processing.  CDH Hive can currently be configured to generate Spark applications instead of MapReduce.  There is some work going on now in the Pig ecosystem to do the same.

 

...And just to be confusing, Spark programmers can submit SQL statements directly to the Spark framework from within their application code, without any contact with Hive.

 

If you're interested in the SQL arena, also be sure to check out Apache Impala.  It is a high-speed compute engine that accepts SQL statements.  It serves the same role as Hive (SQL on Big Data), but there are obviously some trade-offs/overhead for achieving a high speed, so it is not currently a drop-in replacement.
