Support Questions

himanshu_rawat · 07-28-2016

Sorry for the silly question, but I am new to Hive and the big data world. Can anyone explain, with a neat example, what is considered structured and what is considered unstructured, compared to an RDBMS?

scarroll · 07-28-2016

Hi @Himanshu Rawat,

Welcome to HCC!

Whether we class data as structured or unstructured is related to its degree of organization. For example, consider the content and metadata of email.

The metadata associated with the emails I have sent would be structured. It needs to be very organized so the email servers know the sender, recipient(s), CC, BCC, time sent/received, etc. For example, the time received can easily be compared to the time on other emails. I could easily sort my emails based on time and find the most recent or something from a particular date.

The content or body, on the other hand, would be considered unstructured. I could put anything in there. How would I organize emails if I only considered the content? By number of words? Spaces? Positivity of the message? What would it mean?
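To make the email example concrete, here is a minimal Python sketch. The `Email` class, the addresses, and the sample messages are all made up for illustration; the point is only that the metadata fields can be sorted and filtered directly, while the body needs some invented measure (here, a word count) before it can be compared at all.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    # Structured part: fixed, typed fields that every email has.
    sender: str
    recipient: str
    sent_at: datetime
    # Unstructured part: free text with no fixed fields.
    body: str


# Hypothetical sample data, not taken from any real mailbox.
emails = [
    Email("alice@example.com", "bob@example.com",
          datetime(2016, 7, 26, 9, 30), "Lunch on Friday?"),
    Email("carol@example.com", "bob@example.com",
          datetime(2016, 7, 28, 14, 5),
          "Here are the slides, sorry for the delay!!"),
    Email("alice@example.com", "bob@example.com",
          datetime(2016, 7, 27, 17, 45), "re: lunch. Sounds good."),
]

# Structured metadata: sorting and filtering need no interpretation.
most_recent = max(emails, key=lambda e: e.sent_at)
from_alice = [e for e in emails if e.sender == "alice@example.com"]

# Unstructured body: we have to pick a measure ourselves, and it is
# not obvious that the measure means anything.
word_counts = [len(e.body.split()) for e in emails]

print(most_recent.sender)
print(len(from_alice))
print(word_counts)
```

In RDBMS terms, `sender`, `recipient`, and `sent_at` map naturally onto typed columns, whereas `body` would just be a text blob that the database stores but cannot reason about.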

Hope that helps

himanshu_rawat · 07-28-2016

I agree with your answer @Carroll, but it raises one more question: before big data came into the picture, how did Facebook or any other social media company process big data and unstructured data with an RDBMS?

scarroll · 07-28-2016

There were (and still are) a number of methods, including:

- Throw data away
  - Down Sample - Decide what you think is important up front and throw the rest away
  - Age Off - Periodically delete old data
- Warehouse - write old data to tapes and delete it off the disks
- Buy specialised hardware - Very large, expensive dedicated database machines which don't scale
- Don't use a traditional database - keep everything in files and distribute manually to a cluster
- Traditional database horizontal scaling - never done it but heard it's difficult
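As a rough illustration of the two "throw data away" options, here is a small Python sketch. The event records, field names, and the 30-day retention window are assumptions chosen for the example, not anything a particular company is known to use.

```python
from datetime import datetime, timedelta

# Hypothetical raw events; "raw_payload" stands in for bulky detail
# that was judged unimportant up front.
events = [
    {"user": "u1", "action": "like", "at": datetime(2016, 1, 15),
     "raw_payload": "ua=Mozilla/5.0; ref=home"},
    {"user": "u2", "action": "share", "at": datetime(2016, 7, 20),
     "raw_payload": "ua=Mozilla/5.0; ref=feed"},
    {"user": "u1", "action": "comment", "at": datetime(2016, 7, 27),
     "raw_payload": "ua=Mozilla/5.0; ref=post"},
]

KEPT_FIELDS = ("user", "action", "at")  # decided up front


def down_sample(event):
    """Keep only the fields chosen in advance; everything else is lost."""
    return {field: event[field] for field in KEPT_FIELDS}


def age_off(records, now, max_age):
    """Drop records older than max_age, as a periodic cleanup job might."""
    return [r for r in records if now - r["at"] <= max_age]


now = datetime(2016, 7, 28)
kept = age_off([down_sample(e) for e in events], now, timedelta(days=30))

print(len(kept))
print([e["user"] for e in kept])
```

Both steps are irreversible: once the payload or the old rows are gone, any question that needs them later cannot be answered.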

Apparently, Facebook still uses MySQL "with a complex sharding and caching strategy" - Gigaom
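For a feel of what sharding plus caching means in general, here is a toy Python sketch. It is not Facebook's actual scheme: the shard names, the hash-based routing, the in-memory `cache` dict, and the `fetch_from_db` callback are all hypothetical stand-ins.

```python
import hashlib

# Hypothetical shard names; each would be a separate MySQL server.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id):
    """Route a user to one shard using a stable hash of the ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


cache = {}  # stand-in for a separate caching tier


def load_profile(user_id, fetch_from_db):
    """Serve from the cache when possible; otherwise query the owning shard."""
    if user_id not in cache:
        cache[user_id] = fetch_from_db(shard_for(user_id), user_id)
    return cache[user_id]


# Usage with a fake database call that records which shard was hit.
calls = []


def fake_fetch(shard, user_id):
    calls.append((shard, user_id))
    return {"id": user_id}


load_profile("user-42", fake_fetch)
load_profile("user-42", fake_fetch)  # second read is served from the cache
print(len(calls))
```

Simple modulo routing like this is also a hint at why horizontal scaling is considered difficult: changing the number of shards changes where most keys live, so data has to be moved.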

himanshu_rawat · 07-28-2016

Thanks Carroll

Cloudera Community

What is structured data and unstructured data, more precisely?