Hey guys! Is there any website or other way to get a large volume of raw data (approx. 50-100 TB) for a Hadoop POC? Can we get historical data from the Twitter feed? There's no specific use case as of now; I just wanted to check whether we can download this much data.
https://github.com/caesar0301/awesome-public-datasets should fit the bill - it's got large amounts of data from almost every field you can think of. 50-100TB is an awful lot of data though, especially to download. Have you considered writing a script to automatically generate the data locally given a smaller set of input data?
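For a sense of scale: even at a sustained 1 Gbit/s, 50-100 TB takes roughly 4.6 to 9.3 days to transfer. Here's a rough sketch of the "generate it locally" idea in Python. It just resamples rows from a small seed set until a target size is reached. The function name `write_synthetic`, the tab-separated output layout, and the sample rows in the usage note are all made up for illustration, so adapt them to whatever your seed data actually looks like.

```python
import random


def write_synthetic(seed_rows, target_bytes, out, rng_seed=0):
    """Write randomly resampled seed rows to `out` until at least
    `target_bytes` bytes (UTF-8) have been written.

    `out` is any text file-like object. Returns the number of records written.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(rng_seed)  # fixed seed so a run is reproducible
    written = 0
    record_id = 0
    while written < target_bytes:
        row = rng.choice(seed_rows)
        # Prefix a running id so repeated rows are still distinct records.
        line = f"{record_id}\t{row}\n"
        out.write(line)
        written += len(line.encode("utf-8"))
        record_id += 1
    return record_id
```

To use it, open a file and call something like `write_synthetic(["alice,click,home", "bob,view,cart"], 10**9, f)` for about 1 GB per file. To get anywhere near TB scale you would run many copies in parallel with a different `rng_seed` and output file each, then load the files into HDFS. Keep in mind that resampled data is very repetitive, so it compresses far better than real data and won't behave like a real workload in every respect.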