I would like to create the following test case: get a sizeable amount of JSON files, for instance related to web site click events, load them into HDP using some sort of ETL, and then do an exploratory analysis with R. To do this I obviously need JSON files ... is there a way to get a large number of JSON events (let's say 30 GB), and what is the best way to feed all of them into the Hadoop of my sandbox?
Thanks.

The most interesting point is number 2. I know it is also possible with Python (by triggering an hdfs command). It would be interesting to have a shortlist of the possible ways to put JSON files into Hadoop automatically.
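To make the Python option concrete, here is a minimal sketch of the kind of thing I have in mind: generate a batch of synthetic click events as JSON Lines (all field names and paths here are made up for illustration), then push the file into HDFS by shelling out to the `hdfs dfs -put` CLI, the way a scheduled ETL script could do automatically. This assumes the `hdfs` client is on the PATH of the sandbox.

```python
import json
import random
import subprocess
import time
import uuid


def write_events(path, n_events=100_000):
    """Write n_events synthetic click events as JSON Lines (hypothetical schema)."""
    pages = ["/home", "/products", "/cart", "/checkout"]
    with open(path, "w") as f:
        for _ in range(n_events):
            event = {
                "event_id": str(uuid.uuid4()),
                "user_id": random.randint(1, 50_000),
                "page": random.choice(pages),
                "timestamp": int(time.time() * 1000),
            }
            f.write(json.dumps(event) + "\n")


def put_into_hdfs(local_path, hdfs_dir="/user/sandbox/clickstream"):
    """Push a local file into HDFS by invoking the hdfs CLI (target dir is an assumption)."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)


if __name__ == "__main__":
    local_file = "clicks_batch_001.json"
    write_events(local_file)
    put_into_hdfs(local_file)
```

Looping something like this over many batches (and increasing the event count) is how I would get to roughly the 30 GB I mentioned, but I would like to know what the other recommended ingestion options are.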