Purpose of inverted indexes
Labels: Apache Hadoop, MapReduce
Created on ‎08-24-2014 11:25 AM - edited ‎09-16-2022 02:05 AM
Hello,
I am just starting my education in the world of big data, Hadoop, and MapReduce. While I understand the concept of inverted indexes, I'm not sure I understand the purpose. What problem is being solved by creating an inverted index? What valuable information does such an approach provide?
Thanks,
Kevin
Created ‎10-06-2014 02:49 PM
Kevin,
The Hadoop ecosystem is a lot more complex than a simple key-value store, but a key-value store is sufficient to answer your question. Let's say you have data of the form "Key => Value1" in one location and "Key => Value2" in another. If you only know one of the values, it's not trivial to find the related values, because lookups go from key to value and you would have to scan every entry to find the matching key. Unless... you have an inverted index that lets you look up the key for any given value, and then use that key to look up the other values. For instance, say I have a database that lists the mailing address for each person.
e.g. Kevin => 1 Apple St, Sean => 2 Zebra Ln
This is great if you just want to see where specific people live, but what if you start with an address and need to know everyone who lives there? Instead of the key being the name and the value being the address, you create a different index that inverts this:
e.g. 2 Zebra Ln => Sean, 1 Apple St => Kevin
Now it's easy to see everyone who shares an address, because they would also share a key. (Some key-value stores don't allow more than one value per key; in that case you would change the value field to hold a sequence of values, such as a list of names.)
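If it helps to see it concretely, here is a rough sketch of the address example in plain Python. It is only an illustration of the idea, not how Hadoop or any particular key-value store does it. The `invert` helper and the extra resident "Dana" are made up for the example, and each address maps to a list of names so that a shared address can hold more than one person.

```python
from collections import defaultdict

# Forward index: person => address.
# "Dana" is a made-up extra entry so that one address is shared.
addresses = {
    "Kevin": "1 Apple St",
    "Sean": "2 Zebra Ln",
    "Dana": "2 Zebra Ln",
}


def invert(forward):
    """Build an inverted index: each value maps to the sorted list of keys that had it."""
    inverted = defaultdict(list)
    for key, value in forward.items():
        inverted[value].append(key)
    # Return a plain dict with sorted name lists for predictable lookups.
    return {value: sorted(keys) for value, keys in inverted.items()}


# Inverted index: address => list of people.
residents = invert(addresses)
print(residents["2 Zebra Ln"])
```

Once the inverted index is built, "who lives at this address?" is a single lookup by key rather than a scan over every person.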
I know it's been a while since you asked your question, but I hope this helps!
Created ‎10-13-2014 06:41 AM
Hi, Sean -
Yes, that does help. I took the Developer course and we discussed this. I was surprised to learn (and it's obvious to me now) that book indices are actually inverted indices! Learning to think in Hadoop and M/R will be the first challenge to overcome as I begin my efforts in working in the Big Data arena.
Thanks for your help.
Kevin
