I am quite new to Hadoop, with only 3 weeks of learning so far. I have the questions below and would appreciate some guidance.
Things done so far:
1. I have Cloudera Hadoop running on AWS (Ubuntu 14).
2. I have successfully installed Cloudera Manager and created a cluster. The web links for CM and Hue are working.
Questions:
1. I want to load some sample .csv data into HDFS. How can I do that? Are there any tutorials for loading data into HDFS?
2. Can I use Hue to load the data and Hive to build tables on top of it?
One way to get started is to use Hue's examples. Log into Hue as a superuser and the front page will list a series of steps. "Step 2" is Examples. Clicking on Hive will install example data with Hive tables. The other options install their own examples as well.
You can also click Data Browsers --> Metastore Tables.
From there you can import files to create Hive tables.
Depending on what you want to test, there may be other useful resources, but the options above are quick and simple.
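If you prefer the command line to the Hue import screen, the same result can be sketched as two steps: copy the CSV into an HDFS directory, then define an external Hive table over that directory. The Python helper below only builds the shell commands and the HiveQL text; it does not talk to the cluster. The file name, HDFS path, and table name are hypothetical, and every column is assumed to be STRING for simplicity, so treat it as a starting point rather than a definitive recipe.

```python
import csv
import io
import shlex


def hdfs_put_commands(local_csv, hdfs_dir):
    """Return the shell commands that copy a local CSV into an HDFS directory."""
    return [
        "hdfs dfs -mkdir -p " + shlex.quote(hdfs_dir),
        "hdfs dfs -put " + shlex.quote(local_csv) + " " + shlex.quote(hdfs_dir),
    ]


def hive_external_table_ddl(table, header_line, hdfs_dir):
    """Build a CREATE EXTERNAL TABLE statement from a CSV header line.

    Assumption: every column is typed STRING; adjust the types by hand.
    """
    columns = next(csv.reader(io.StringIO(header_line)))
    column_defs = ",\n  ".join("`%s` STRING" % c.strip() for c in columns)
    return (
        "CREATE EXTERNAL TABLE %s (\n  %s\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        "LOCATION '%s'\n"
        # Tells Hive to ignore the header row when reading the file.
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    ) % (table, column_defs, hdfs_dir)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in hdfs_put_commands("movies.csv", "/user/demo/movies"):
        print(cmd)
    print(hive_external_table_ddl("movies", "id,title,year", "/user/demo/movies"))
```

Run the printed hdfs commands on a cluster node, then paste the generated statement into the Hive editor in Hue. Plain comma-delimited parsing like this does not handle quoted fields that contain commas.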
Others may have example data they use. Many datasets are available online.
One we commonly use in training is movie data:
Other GroupLens datasets: https://grouplens.org/datasets/
Many thanks. This was really useful for a beginner like me.
Do you have any idea how I can access this data from other reporting tools? In our case, we use the BIRST reporting tool. However, I am not able to establish a connection to the Hive tables.
Is there any link I can refer to for setting up the JDBC connection?
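For reference, tools that connect to Hive over JDBC generally need two things: the HiveServer2 driver class and a connection URL. The sketch below assembles both, assuming HiveServer2 is listening on its default port 10000 and no Kerberos or SSL options are required. The host name is hypothetical, and how BIRST itself takes these values is not covered here.

```python
# Driver class shipped with the Apache Hive JDBC driver for HiveServer2.
HIVE_JDBC_DRIVER = "org.apache.hive.jdbc.HiveDriver"


def hive_jdbc_url(host, port=10000, database="default"):
    """Build a basic HiveServer2 JDBC URL (no Kerberos or SSL parameters)."""
    return "jdbc:hive2://%s:%d/%s" % (host, port, database)


if __name__ == "__main__":
    # Hypothetical host name, for illustration only.
    print(HIVE_JDBC_DRIVER)
    print(hive_jdbc_url("hive-node.example.com"))
```

On AWS, the HiveServer2 port also has to be reachable from the machine running the reporting tool, so a blocked port in the security group is worth ruling out first.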