What is structured data and what is unstructured data, more precisely?

Sorry for the silly question, but I am new to Hive and the big data world. Can anyone explain, with a neat example, what is considered structured and what is considered unstructured, compared with an RDBMS?

4 REPLIES

Rising Star

Hi @Himanshu Rawat,

Welcome to HCC!

Whether we class data as structured or unstructured is related to its degree of organization. For example, consider the content and metadata of email.

The metadata associated with the emails I have sent would be structured. It needs to be very organized so the email servers know the sender, recipient(s), CC, BCC, time sent/received, etc. For example, the time received can easily be compared to the time on other emails. I could easily sort my emails based on time and find the most recent or something from a particular date.

The content or body, on the other hand, would be considered unstructured. I could put anything in there. How would I organize emails if I only considered the content? By number of words? Spaces? Positivity of the message? What would any of those mean?

Hope that helps

I agree with your answer @Carroll, but it raises one more question: before big data came into the picture, how were Facebook and other social media sites processing big data and unstructured data with an RDBMS?

Rising Star

There were (and still are) a number of methods, including:

  • Throw data away
    • Down Sample - Decide what you think is important up front and throw the rest away
    • Age Off - Periodically delete old data
  • Warehouse - write old data to tapes and delete off the disks
  • Buy specialised hardware - Very large, expensive dedicated database machines which don't scale
  • Don't use a traditional database - keep everything in files and distribute manually to a cluster
  • Traditional database horizontal scaling - never done it but heard it's difficult
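As a rough illustration of the first two "throw data away" options, here is a small Python sketch. The event records, field names, and the `down_sample` and `age_off` helpers are all hypothetical; real systems would do this inside the database or an ETL job rather than on in-memory lists.

```python
from datetime import datetime, timedelta

# Made-up click events; field names are illustrative only.
events = [
    {"user": "u1", "page": "/home", "agent": "Mozilla/5.0", "ts": datetime(2016, 1, 1)},
    {"user": "u2", "page": "/cart", "agent": "curl/7.47", "ts": datetime(2017, 1, 1)},
]

def down_sample(records, keep_fields):
    """Keep only the fields decided to be important up front; drop the rest."""
    return [{k: r[k] for k in keep_fields if k in r} for r in records]

def age_off(records, now, max_age):
    """Drop every record older than max_age relative to now."""
    cutoff = now - max_age
    return [r for r in records if r["ts"] >= cutoff]

slim = down_sample(events, ["user", "page", "ts"])
recent = age_off(slim, now=datetime(2017, 1, 10), max_age=timedelta(days=30))
```

Both steps are lossy: once the `agent` field or the older event is gone, no later question can be asked of it. That is the trade-off these approaches accept.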

Apparently, Facebook still uses MySQL "with a complex sharding and caching strategy", according to Gigaom.
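For a feel of what sharding plus caching means in general, here is a minimal Python sketch. This is not Facebook's actual scheme, which I don't know in detail. The shard names, the `shard_for` and `get_profile` helpers, and the in-memory dictionaries standing in for databases and a cache are all assumptions made for illustration.

```python
import hashlib

# Hypothetical shard names; each would be a separate MySQL server in practice.
SHARDS = ["db0", "db1", "db2", "db3"]

# In-memory stand-ins for the per-shard databases and for a cache layer.
databases = {name: {} for name in SHARDS}
cache = {}

def shard_for(user_id):
    """Pick a shard from a stable hash of the key, so the same user always
    lands on the same server."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

def save_profile(user_id, profile):
    """Write to the owning shard and drop any stale cached copy."""
    databases[shard_for(user_id)][user_id] = profile
    cache.pop(user_id, None)

def get_profile(user_id):
    """Read-through cache: serve from the cache, else read the shard and cache it."""
    if user_id in cache:
        return cache[user_id]
    profile = databases[shard_for(user_id)].get(user_id)
    if profile is not None:
        cache[user_id] = profile
    return profile

save_profile("himanshu", {"name": "Himanshu"})
first_read = get_profile("himanshu")   # goes to the shard, then fills the cache
second_read = get_profile("himanshu")  # served from the cache
```

The hard parts this sketch skips are the ones that make it "complex" in real life: adding or removing shards without moving most of the data, queries that span shards, and keeping caches consistent.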

Thanks Carroll