Purpose of inverted indexes

New Contributor

Hello,

I am just starting my education in the world of big data, Hadoop, and MapReduce. While I understand the concept of inverted indexes, I'm not sure I understand the purpose. What problem is being solved by creating an inverted index? What valuable information does such an approach provide?

Thanks,

Kevin

Thanks in advance to all who reply.

Kevin
Accepted Solution

Re: Purpose of inverted indexes

Master Collaborator

Kevin,

The Hadoop ecosystem is a lot more complex than a simple key-value store, but a key-value store is enough to answer your question. Let's say you have data of the form "Key => Value1" in one location and "Key => Value2" in another. If you know only one value, finding all the related values is not trivial: the data is organized for lookups by key, so without help you would have to scan every entry to find which key holds that value. That is, unless you have an inverted index that lets you look up the key for any given value, and then use that key to look up the other values. For instance, say I have a database that lists the mailing address for each person.

e.g. Kevin => 1 Apple St, Sean => 2 Zebra Ln
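A minimal Python sketch of the name-keyed map above, assuming a plain dictionary stands in for the key-value store (the names `forward`, `address_of`, and `residents_by_scan` are illustrative, not from any Hadoop or Cloudera API). Looking up an address by name is a single key access, but without an inverted index, finding who lives at a given address means checking every entry:

```python
from typing import List, Optional

# Illustrative forward index from the example above: person -> mailing address.
forward = {
    "Kevin": "1 Apple St",
    "Sean": "2 Zebra Ln",
}


def address_of(person: str) -> Optional[str]:
    """Lookup by key: one dictionary access, however many entries there are."""
    return forward.get(person)


def residents_by_scan(address: str) -> List[str]:
    """Lookup by value without an inverted index: every entry has to be checked."""
    return [person for person, addr in forward.items() if addr == address]


print(address_of("Kevin"))              # 1 Apple St
print(residents_by_scan("2 Zebra Ln"))  # ['Sean']
```

The scan is fine for two entries, but its cost grows linearly with the number of entries, which is the cost an inverted index avoids.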

This is great if you just want to see where specific people live, but what if your question starts with an address and you need to know everyone who lives there? Instead of the key being the name and the value being the address, you create a second index that inverts this:

e.g. 2 Zebra Ln => Sean, 1 Apple St => Kevin

Now it's easy to see everyone who shares an address, because they all share the same key. (Some key-value stores don't allow the same key to appear more than once; in that case, you would change the value field to hold a sequence of values, such as all the names at that address.)
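A sketch of building the inverted index from the forward one, again assuming plain Python dictionaries stand in for the key-value store. Each address maps to a list of names, which is the "sequence of values" approach for stores that don't allow repeated keys. The entry for "Alex" is hypothetical, added only so that two people share an address, and `invert` and `housemates` are illustrative names. `housemates` shows one way to combine the two indexes: name to address through the forward index, then address to everyone there through the inverted one.

```python
from collections import defaultdict
from typing import Dict, List

# Forward index: person -> address. "Alex" is a hypothetical extra entry
# (not in the original example) so that one address has two residents.
forward: Dict[str, str] = {
    "Kevin": "1 Apple St",
    "Sean": "2 Zebra Ln",
    "Alex": "1 Apple St",
}


def invert(index: Dict[str, str]) -> Dict[str, List[str]]:
    """Build address -> [people]. Each value is a list, so a shared
    address maps to every person who lives there."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for person, address in index.items():
        inverted[address].append(person)
    return dict(inverted)


by_address = invert(forward)


def housemates(person: str) -> List[str]:
    """Two lookups: person -> address (forward), address -> people (inverted)."""
    address = forward[person]
    return [p for p in by_address[address] if p != person]


print(by_address["1 Apple St"])  # ['Kevin', 'Alex']
print(housemates("Kevin"))       # ['Alex']
```

Building the inverted index costs one pass over the data, after which each "who lives here?" question is a single key lookup instead of a full scan.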

I know it's been a while since you asked your question, but I hope this helps!

Re: Purpose of inverted indexes

New Contributor

Hi Sean,

Yes, that does help. I took the Developer course, and we discussed this there. I was surprised to learn (and it's obvious to me now) that book indices are actually inverted indices! Learning to think in Hadoop and M/R will be the first challenge to overcome as I begin working in the big data arena.
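Since a book index is an inverted index (word to page numbers), here is a sketch of how an inverted-index job is commonly structured in MapReduce, simulated in plain Python on a made-up three-page "book". The functions `map_phase`, `shuffle`, and `reduce_phase` are hypothetical stand-ins for the three phases, not Hadoop's Java API: the mapper emits (word, page) pairs, the shuffle groups pairs by word, and the reducer emits each word's sorted page list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical "book": page number -> page text.
pages = {
    1: "hadoop stores data in hdfs",
    2: "mapreduce jobs read data from hdfs",
    3: "an inverted index maps words to pages",
}


def map_phase(page: int, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, page) once for each distinct word on the page."""
    for word in set(text.split()):
        yield word, page


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, page in pairs:
        groups[word].append(page)
    return groups


def reduce_phase(word: str, page_list: List[int]) -> Tuple[str, List[int]]:
    """Reducer: produce the sorted list of pages for one word."""
    return word, sorted(page_list)


mapped = (pair for page, text in pages.items() for pair in map_phase(page, text))
index = dict(reduce_phase(w, ps) for w, ps in shuffle(mapped).items())

print(index["hdfs"])  # [1, 2]
print(index["data"])  # [1, 2]
```

In a real Hadoop job, the framework performs the shuffle (and also sorts by key) across machines; you would write only the map and reduce logic.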

Thanks for your help.

Kevin
