Member since: 08-01-2017
Posts: 19
Kudos Received: 0
Solutions: 0
07-27-2018 07:11 PM
@HDave Have you looked at the following library? https://github.com/crealytics/spark-excel
08-01-2017 02:38 PM
Hi @Hardik Dave

Cluster sizing and planning require much more detail and an in-depth conversation about the use case, the data sizing, etc. A good guide that can help you with sizing your cluster can be found here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cluster-planning/bk_cluster-planning.pdf

I would suggest using Ambari to manage your cluster, since the memory allocation and settings/config for each technology/service in your cluster are much more visible in its UI.
08-04-2017 01:20 PM
Hi Hdave,

You should pick up one tool at a time. For example, take Hive. Below are the high-level steps.

1. Upload your CSV files from your local system to the HDFS file system. Hint: use the put command.
2. Launch Hive. Hint: use Beeline.
3. Create a Hive table that matches the CSV columns.
4. Load the CSV file into the table.
5. Query the table from the Hive CLI or Beeline.

Once this is done, please pick up another tool and try the same.

Regards,
Fahim
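The five steps above can be sketched as a small Python helper that only builds the shell command and the HiveQL statements as strings, so you can review them before running anything. This is a rough sketch, not a tested recipe for any particular cluster: the file path, HDFS directory, table name, column names, and JDBC URL are all made-up placeholders, and it assumes a plain comma-delimited CSV.

```python
import shlex


def hdfs_put_command(local_csv, hdfs_dir):
    """Step 1: shell command that copies a local CSV file into HDFS."""
    return f"hdfs dfs -put {shlex.quote(local_csv)} {shlex.quote(hdfs_dir)}"


def beeline_command(jdbc_url, sql):
    """Step 2: shell command that runs one statement through Beeline."""
    return f"beeline -u {shlex.quote(jdbc_url)} -e {shlex.quote(sql)}"


def create_table_sql(table, columns):
    """Step 3: HiveQL for a comma-delimited text table matching the CSV columns."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE"
    )


def load_data_sql(hdfs_path, table):
    """Step 4: HiveQL that moves the uploaded HDFS file into the table."""
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"


def select_sql(table, limit=10):
    """Step 5: HiveQL for a quick sanity-check query."""
    return f"SELECT * FROM {table} LIMIT {limit}"


if __name__ == "__main__":
    # All values below are hypothetical placeholders; replace them with your own.
    url = "jdbc:hive2://localhost:10000"
    columns = {"id": "INT", "name": "STRING", "amount": "DOUBLE"}

    print(hdfs_put_command("/tmp/sales.csv", "/user/hdave/sales"))
    print(beeline_command(url, create_table_sql("sales", columns)))
    print(beeline_command(url, load_data_sql("/user/hdave/sales/sales.csv", "sales")))
    print(beeline_command(url, select_sql("sales")))
```

If your CSV has a header row, Hive will read it as data unless the table is told to skip it, for example with the `skip.header.line.count` table property.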
06-30-2017 05:07 PM
Thanks, it helped a lot to clear up my confusion.