Support Questions

Kevin_DElia · ‎08-24-2014

Hello,

I am just starting my education in the world of big data, Hadoop, and MapReduce. While I understand the concept of inverted indexes, I'm not sure I understand the purpose. What problem is being solved by creating an inverted index? What valuable information does such an approach provide?

Thanks,

Kevin

Sean · ‎10-06-2014

Kevin,

The Hadoop ecosystem is a lot more complex than a simple key-value store, but a key-value store is enough to answer your question. Say you have data of the form "Key => Value1" in one location and "Key => Value2" in another. If you know only one value, it's not trivial to find all the related values, unless you have an inverted index that lets you look up the key for any given value and then use that key to look up the other values. For instance, say I have a database that lists the mailing address for each person.

e.g. Kevin => 1 Apple St, Sean => 2 Zebra Ln

This is great if you just want to see where specific people live, but what if you start with an address and need to know everyone who lives there? Instead of the key being the name and the value being the address, you create a different index that inverts this:

e.g. 2 Zebra Ln => Sean, 1 Apple St => Kevin

Now it's easy to see everyone who shares an address, because they would also share a key. (Some key-value stores don't allow several entries under the same key; in that case you would make the value field hold a sequence of values.)
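If it helps to see it in code, here is a minimal Python sketch of the same idea using plain dictionaries as the "key-value store". The names `address_by_person`, `invert`, and `people_by_address` are made up for illustration, and "Dana" is an extra made-up entry added only so that two people share an address. This is not how Hadoop itself does it, just the inversion step on a small scale.

```python
from collections import defaultdict

# Hypothetical forward index: person -> mailing address.
address_by_person = {
    "Kevin": "1 Apple St",
    "Sean": "2 Zebra Ln",
    "Dana": "1 Apple St",  # made-up entry to show a shared address
}


def invert(forward):
    """Build address -> sorted list of people from person -> address."""
    inverted = defaultdict(list)
    for person, address in forward.items():
        # Each value in the inverted index is a list, so several
        # people can live under the same address key.
        inverted[address].append(person)
    return {address: sorted(people) for address, people in inverted.items()}


people_by_address = invert(address_by_person)
print(people_by_address["1 Apple St"])  # ['Dana', 'Kevin']
```

Each value here is a list, which is the "sequence of values" workaround for stores that allow only one entry per key.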

I know it's been a while since you asked your question, but I hope this helps!

Kevin_DElia · ‎10-13-2014

Hi, Sean -

Yes, that does help. I took the Developer course and we discussed this. I was surprised to learn (and it's obvious to me now) that book indices are actually inverted indices! Learning to think in Hadoop and M/R will be the first challenge to overcome as I begin my efforts in working in the Big Data arena.

Thanks for your help.

Kevin

Cloudera Community

Purpose of inverted indexes