Generate and load JSON for analyzing with R


I would like to set up the following test case: obtain a substantial volume of JSON files, for instance website click events, load them into HDP with some kind of ETL process, and then run an exploratory analysis with R. For this I obviously need JSON files. Is there a way to get a large number of JSON events (say 30 GB), and what is the best way to feed them all into Hadoop on my sandbox?

Any ideas?


Re: Generate and load JSON for analyzing with R

@Stefano Cislaghi

Step 1) You need JSON source files. If you don't already have any, here are a few links to sample JSON data.

https://adobe.github.io/Spry/samples/data_region/JSONDataSetSample.html

https://catalog.data.gov/dataset?res_format=JSON

http://jsonstudio.com/resources/
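If the sample datasets are too small for the volume you want, another option is to generate synthetic click events yourself. The following is only a minimal sketch: the field names (`event_id`, `user_id`, `event_type`, `page`, `timestamp`), the page list, and the output file name are all made up for illustration and are not taken from any real dataset. It writes one JSON object per line, a layout that is generally easy to load later.

```python
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical values, invented for this example only.
PAGES = ["/", "/products", "/cart", "/checkout", "/help"]
EVENT_TYPES = ["page_view", "click", "add_to_cart", "purchase"]


def make_event(rng, start):
    """Return one synthetic click event as a dict."""
    offset = timedelta(seconds=rng.randint(0, 86_400))
    return {
        "event_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "user_id": rng.randint(1, 100_000),
        "event_type": rng.choice(EVENT_TYPES),
        "page": rng.choice(PAGES),
        "timestamp": (start + offset).isoformat(),
    }


def write_events(path, count, seed=0):
    """Write `count` events to `path`, one JSON object per line."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    start = datetime(2016, 1, 1, tzinfo=timezone.utc)
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(count):
            out.write(json.dumps(make_event(rng, start)) + "\n")
```

Calling `write_events("clicks.json", 1000)` produces a small file for a first test. To approach a larger volume, raise `count` or write several files with different seeds.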

Step 2) Read & Load data into HDFS

You may need to write Java code to read the data and load it into Hive.

Step 3) Install R

Step 4) Install RHive to access Hive data from R (an ODBC/JDBC driver can be installed as well)

Step 5) Install RStudio for a web interface to R (optional)

I hope this will help you.

Re: Generate and load JSON for analyzing with R


Thanks. The most interesting point is number 2. I know it is also possible with Python (a script that triggers an hdfs command). It would be interesting to have a shortlist of the options for putting JSON files into Hadoop automatically.

I'll try. Thanks.
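The Python route mentioned above (a script that triggers an hdfs command) might look roughly like the sketch below. It assumes the `hdfs` command-line client is on the PATH of the machine running the script. The function names and the directory arguments are hypothetical. The `run` parameter exists only so the command execution can be swapped out, for example for a dry run.

```python
import subprocess
from pathlib import Path


def build_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` command for one file, without running it."""
    # -f overwrites the destination file if it already exists
    return ["hdfs", "dfs", "-put", "-f", str(local_path), hdfs_dir]


def upload_json_files(local_dir, hdfs_dir, run=subprocess.run):
    """Copy every *.json file in local_dir into hdfs_dir.

    Returns the names of the uploaded files. Raises
    subprocess.CalledProcessError if an hdfs command fails.
    """
    # Create the target directory (and any parents) if it is missing
    run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.json")):
        run(build_put_command(path, hdfs_dir), check=True)
        uploaded.append(path.name)
    return uploaded
```

A call such as `upload_json_files("./events", "/user/sandbox/clicks")` would then push a local folder of JSON files in one go. Both paths here are placeholders.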