What are structured data and unstructured data, more precisely?
Labels: Hortonworks Data Platform (HDP)
Created ‎07-28-2016 07:51 AM
Sorry for the silly question, but I am new to Hive and the big data world. Can anyone explain, with a neat example, what is considered structured and what is considered unstructured data, compared with an RDBMS?
Created ‎07-28-2016 09:18 AM
Hi @Himanshu Rawat,
Welcome to HCC!
Whether we class data as structured or unstructured is related to its degree of organization. For example, consider the content and metadata of email.
The metadata associated with the emails I have sent would be structured. It needs to be very organized so the email servers know the sender, recipient(s), CC, BCC, time sent/received, etc. For example, the time received can easily be compared to the time on other emails. I could easily sort my emails based on time and find the most recent or something from a particular date.
The content, or body, on the other hand, would be considered unstructured. I could put anything in there. How would I organize emails if I only considered the content? By number of words? Spaces? Positivity of the message? What would it mean?
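To make the email example concrete, here is a minimal Python sketch. The sample messages, field names, and helper functions are all made up for illustration. It shows that the metadata can be sorted and filtered directly, while the body is just free text that you have to pick some arbitrary measure for.

```python
from datetime import datetime

# Hypothetical sample emails: the metadata fields are structured,
# the body is free text with no fixed layout.
emails = [
    {"sender": "a@example.com", "recipient": "b@example.com",
     "received": datetime(2016, 7, 28, 9, 18),
     "body": "Welcome! Structured vs unstructured depends on organization."},
    {"sender": "b@example.com", "recipient": "a@example.com",
     "received": datetime(2016, 7, 28, 12, 30),
     "body": "Thanks!"},
    {"sender": "c@example.com", "recipient": "a@example.com",
     "received": datetime(2016, 7, 27, 17, 5),
     "body": "lunch tomorrow?? also see attached, tell me what you think"},
]


def most_recent(messages):
    """Structured field: timestamps compare directly, so this is trivial."""
    return max(messages, key=lambda m: m["received"])


def received_on(messages, day):
    """Structured field: filter by calendar date, much like a SQL WHERE clause."""
    return [m for m in messages if m["received"].date() == day]


def word_count(message):
    """Unstructured field: one arbitrary measure of the body among many."""
    return len(message["body"].split())


print(most_recent(emails)["sender"])
print(len(received_on(emails, datetime(2016, 7, 28).date())))
print([word_count(m) for m in emails])
```

The first two helpers map naturally onto columns in an RDBMS table. The third one works, but a word count says very little about what the email means, which is the point about unstructured content.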
Hope that helps
Created ‎07-28-2016 09:28 AM
I agree with your answer @Carroll, but it raises one more question: before big data came into the picture, how did Facebook or any other social media company process big data and unstructured data with an RDBMS?
Created ‎07-28-2016 10:06 AM
There were (and still are) a number of methods, including:
- Throw data away
- Down Sample - Decide what you think is important up front and throw the rest away
- Age Off - Periodically delete old data
- Warehouse - write old data to tapes and delete off the disks
- Buy specialised hardware - Very large, expensive dedicated database machines which don't scale
- Don't use a traditional database - keep everything in files and distribute manually to a cluster
- Traditional database horizontal scaling - never done it but heard it's difficult
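As a rough illustration of two of the items above, down sampling and age off, here is a small Python sketch. The record layout, function names, and retention window are assumptions for the example, not taken from any particular system.

```python
from datetime import datetime, timedelta


def down_sample(records, important_fields):
    """Keep only the fields chosen up front; everything else is discarded."""
    return [{k: r[k] for k in important_fields} for r in records]


def age_off(records, now, max_age):
    """Drop records older than max_age (the periodic delete of old data)."""
    cutoff = now - max_age
    return [r for r in records if r["created"] >= cutoff]


# Hypothetical data: ten records, one per day going back from 'now'.
now = datetime(2016, 7, 28)
records = [
    {"id": i, "created": now - timedelta(days=i), "payload": "x" * 100}
    for i in range(10)
]

recent = age_off(records, now, timedelta(days=3))   # keeps ids 0 to 3
slim = down_sample(recent, ["id", "created"])       # drops the payload field
```

Both approaches keep the data small enough for a single database, at the cost of permanently losing whatever was thrown away.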
Apparently, Facebook still uses MySQL "with a complex sharding and caching strategy" - Gigaom
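For anyone unfamiliar with sharding, the basic idea is to split rows across several independent databases by a key. The sketch below is a generic hash-based version in Python, using plain dictionaries as stand-ins for separate database servers. The shard count and function names are hypothetical, and this is not a description of Facebook's actual setup.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database servers

# Each dict stands in for one independent database instance.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash, so lookups find the same shard."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(user_id, row):
    """Write the row to the one shard that owns this key."""
    shards[shard_for(user_id)][user_id] = row


def get(user_id):
    """Read from the owning shard only; other shards are never touched."""
    return shards[shard_for(user_id)].get(user_id)


put("alice", {"name": "Alice"})
put("bob", {"name": "Bob"})
print(get("alice"))
```

The hard parts that this leaves out are the ones that make it "complex" in practice: queries that span shards, adding or removing shards without moving most of the data, and keeping caches consistent.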
Created ‎07-28-2016 12:30 PM
Thanks Carroll
