Created on 08-24-2014 11:25 AM - edited 09-16-2022 02:05 AM
Hello,
I am just starting my education in the world of big data, Hadoop, and MapReduce. While I understand the concept of inverted indexes, I'm not sure I understand the purpose. What problem is being solved by creating an inverted index? What valuable information does such an approach provide?
Thanks,
Kevin
Created 10-06-2014 02:49 PM
Kevin,
The Hadoop ecosystem is a lot more complex than a simple key-value store, but a key-value store is enough to answer your question. Let's say you have data of the form "Key => Value1" in one location and "Key => Value2" in another. If you know only one value, it's not trivial to find the related values, unless you have an inverted index that lets you look up the key for any given value and then use that key to look up the other values. For instance, say I have a database that lists the mailing address for each person.
e.g. Kevin => 1 Apple St, Sean => 2 Zebra Ln
This is great if you just want to see where specific people live, but what if your question starts with having an address and needing to know all the people who live there? Instead of the key being the name and the value being the address, you create a different index that inverts this:
e.g. 2 Zebra Ln => Sean, 1 Apple St => Kevin
Now it's easy to see everyone who shares an address, because they all share the same key. (Some key-value stores don't allow more than one value per key; in that case you would have the value field encode a list of values.)
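Here is a rough sketch of the same idea in Python, using plain dictionaries in place of a real key-value store. The third person ("Dana") and the function name `invert` are made up for illustration so that one address has more than one resident; each inverted entry holds a list of names, which is the "encode a sequence of values" approach mentioned above.

```python
from collections import defaultdict

# Forward index: person -> mailing address.
# "Dana" is a hypothetical extra entry, added only to show a shared address.
forward = {
    "Kevin": "1 Apple St",
    "Sean": "2 Zebra Ln",
    "Dana": "2 Zebra Ln",
}


def invert(index):
    """Build an inverted index: value -> sorted list of keys that map to it."""
    inverted = defaultdict(list)
    for key, value in index.items():
        inverted[value].append(key)
    # Sort each list so the result does not depend on insertion order.
    return {value: sorted(keys) for value, keys in inverted.items()}


by_address = invert(forward)
print(by_address["2 Zebra Ln"])  # ['Dana', 'Sean']
print(by_address["1 Apple St"])  # ['Kevin']
```

Without the inverted index, answering "who lives at 2 Zebra Ln?" means scanning every entry in the forward index; with it, the answer is a single lookup.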
I know it's been a while since you asked your question, but I hope this helps!
Created 10-13-2014 06:41 AM
Hi, Sean -
Yes, that does help. I took the Developer course and we discussed this. I was surprised to learn (and it's obvious to me now) that book indices are actually inverted indices! Learning to think in Hadoop and M/R will be the first challenge to overcome as I begin my efforts in working in the Big Data arena.
Thanks for your help.
Kevin