Member since: 08-01-2017
Posts: 19
Kudos Received: 0
Solutions: 0
07-30-2018
10:28 AM
Can somebody please help? I am completely stuck.
07-27-2018
04:35 PM
I want a Spark API that can write data into an Excel file rather than a CSV file. What is the best way to write data directly into an Excel file?
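For reference, Spark has no built-in Excel writer, but a third-party connector can fill the gap. A minimal sketch, assuming the com.crealytics:spark-excel package is on the classpath (option names differ between spark-excel releases; older versions use useHeader instead of header), with illustrative data and an illustrative output path:

```scala
import org.apache.spark.sql.SparkSession

object ExcelWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("excel-write").getOrCreate()
    import spark.implicits._

    val df = Seq(("T-1", "2018-07-26"), ("T-2", "2018-07-27"))
      .toDF("ticket_number", "as_of_date") // illustrative data

    // Requires the third-party com.crealytics:spark-excel package on the
    // classpath; Spark itself ships no Excel data source.
    df.write
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .mode("overwrite")
      .save("/tmp/report.xlsx")
  }
}
```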
Labels:
- Apache Spark
07-27-2018
04:33 PM
We have created a Spark application for client reporting. The client wants the report in CSV format; we have coded it that way and it generates the desired output in the requested format. When we look at the result data in the log, it shows the correct format and correct data (i.e. the requested date format is 2018-07-26 11:19:04.0, and that is exactly what the log shows), but when we look at the same data in the CSV file, the format has changed: it shows 6/7/2018 12:27. Why does the CSV file show a different format when the log shows correct results, even though we wrote the same data to the CSV through the file-write command? How can this be resolved? Sample code:

```scala
val selectedData = dataFrame3.select(
  concat(col("ticket_number"), lit("-"), date_format(col("as_of_date"), "yyMMdd")).as("transref"),
  col("newmCanc").as("newmCanc"),
  when(col("trade_action") === "CXL",
      concat(col("master_ticket_num"), lit("-"), date_format(col("as_of_date"), "yyMMdd")))
    .otherwise("").as("relTransref"),
  col("trader_name").as("portfolioIdAm"),
  col("portfolioIdKvg"),
  col("name").as("portfolioName"),
  when(col("buy_sell_desc") === "Buy", "BUY")
    .when(col("buy_sell_desc") === "Sell", "SELL")
    .otherwise("OTHER").as("buyisell"),
  col("trade_feed_trade_amount").as("quantity"),
  col("secIdType"),
  col("id_isin").as("secId"),
  when(col("instrument_name").isNotNull, col("instrument_name"))
    .otherwise(col("security_name")).as("secName"),
  format_number(col("trade_price").cast("Double"), 2).as("price"),
  col("currency").as("tradeCCY"),
  format_number(col("settlement_costs_in_settlement_currency").cast("Double"), 2).as("tradeComm"),
  format_number(col("Transaction_Cost_2_Amount").cast("Double"), 2).as("fees"),
  format_number(col("Transaction_Cost_3_Amount").cast("Double"), 2).as("tax"),
  format_number(col("Transaction_Cost_5_Amount").cast("Double"), 2).as("others"),
  format_number(col("Accrued_Interest"), 2).as("interest"),
  format_number(col("settlement_total_in_settlement_currency").cast("Double"), 2).as("settlAmount"),
  when(col("Number_of_days_accrued_interest").isNull, "0")
    .otherwise(col("Number_of_days_accrued_interest")).as("interestDays"),
  date_format(col("as_of_date"), "yyyy-MM-dd").cast("String").as("tradeDate"),
  date_format(col("receiveddate"), "yyyy-MM-dd HH:mm:ss").cast("String").as("executionTimestamp"),
  date_format(col("settlement_date"), "yyyy-MM-dd").cast("String").as("settlementDate")
)
```
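A common explanation for this symptom is that the raw CSV is actually correct but is being viewed in Excel, which re-renders date-like strings in its own locale format; checking the file in a plain text editor shows what was really written. Since tradeDate and executionTimestamp above are already cast to String, Spark's CSV writer emits them verbatim. A minimal sketch of the write step, using standard DataFrameWriter CSV options and an illustrative output path:

```scala
// Continues from selectedData above. The formatted String columns are
// written exactly as they appear in the log; timestampFormat controls
// rendering only for columns still of timestamp type.
selectedData.write
  .option("header", "true")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
  .mode("overwrite")
  .csv("/tmp/client_report_csv") // illustrative path
```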
Labels:
- Apache Spark
03-05-2018
12:07 PM
We have a project that currently uses shell scripts, Hive, and Tez as the execution engine. As a POC we tried replacing the shell scripts with Spark and executed the HQLs through Spark. One of the clients came back with the question of why we would need a Spark application at all, since we can set Spark as the execution engine and keep running our regular shell scripts and Oozie workflows. Which is the better option: simply setting
set hive.execution.engine=spark; OR building a Spark application and executing the HQLs with Spark APIs? If performance is the same for both, then why do we need to write code in Spark? What is the advantage of writing a Spark application using Spark SQL?
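For comparison, a minimal sketch of the "Spark application" side of the question: the same HQL a shell script would hand to the hive CLI can be issued through the Spark SQL API, and the result comes back as a DataFrame that stays in memory for further transformations, which is the main thing switching the execution engine alone does not give you. Table and column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object HqlViaSpark {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport makes Hive metastore tables visible to spark.sql.
    val spark = SparkSession.builder()
      .appName("hql-via-spark")
      .enableHiveSupport()
      .getOrCreate()

    // The same HQL a shell script would pass to hive/beeline runs here,
    // but the result can be cached, re-joined, or fed into non-SQL logic.
    val daily = spark.sql(
      "SELECT trade_date, COUNT(*) AS trades FROM trades GROUP BY trade_date")
    daily.cache().show()
  }
}
```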
Labels:
- Apache Hive
- Apache Spark
08-13-2017
12:01 PM
I need to process a CSV file through Spark and load it into Hive tables. However, the file itself contains commas in the data in several places, not as separators but as content. In this case there are two questions: 1) How will Spark identify that such a comma is not a separator and treat it as part of the data? 2) How can we process such data and load it into Hive, keeping the commas that are content and not separators? Please share some techniques to achieve the above.
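For reference, the standard approach is to quote fields that contain commas and let the CSV reader honor the quoting. A minimal sketch using Spark's built-in CSV reader (quote and escape are standard DataFrameReader options; the path and table name are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object QuotedCsvSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("quoted-csv")
      .enableHiveSupport() // needed to write Hive tables below
      .getOrCreate()

    // A comma inside a quoted field is treated as data, not a separator:
    //   id,comment
    //   1,"loaded, verified"
    val df = spark.read
      .option("header", "true")
      .option("quote", "\"")   // fields wrapped in double quotes
      .option("escape", "\"")  // doubled quote inside a field
      .csv("/data/input_with_commas.csv")

    // The embedded commas survive intact into the Hive table.
    df.write.mode("overwrite").saveAsTable("default.clean_input")
  }
}
```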
Labels:
- Apache Hive
- Apache Spark
08-02-2017
11:21 AM
I have a Spark job, and while submitting it I specify X executors and Y memory. However, somebody else is also using the same cluster and wants to run several jobs during that same window, also with X executors and Y memory, and neither of us knows about the other. In this case, how should the number of executors and the memory for our Spark job be calculated?
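One common approach when jobs from unaware teams share a cluster is to stop hard-coding executor counts and let YARN scale each job between a floor and a ceiling (YARN queues with the Capacity Scheduler address the same problem at the cluster level). A minimal sketch using Spark's standard dynamic-allocation settings; the numbers are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: rather than a fixed X executors / Y memory, let YARN grow and
// shrink this job between a floor and a ceiling so two jobs that do not
// know about each other can share the queue. Requires the external
// shuffle service to be running on the NodeManagers.
val spark = SparkSession.builder()
  .appName("shared-cluster-job")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()
```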
Labels:
- Apache Spark
08-02-2017
10:40 AM
I see people ask what optimization techniques you use for your Spark jobs. What optimization techniques can we use for Spark jobs, whether while writing the job code, while submitting it, or to run the job with optimal resources?
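For illustration, a minimal sketch of a few frequently cited code-level techniques: read a columnar format, filter early, broadcast the small side of a join, and cache only data that is reused. Paths, tables, and column names are assumptions for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

object OptimizationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("opt-sketch").getOrCreate()

    // Columnar input lets Spark prune unread columns, unlike raw CSV.
    val trades = spark.read.parquet("/data/trades")
    val codes  = spark.read.parquet("/data/codes") // small lookup table

    val enriched = trades
      .filter(col("amount") > 0)           // filter early: less data shuffled
      .join(broadcast(codes), Seq("code")) // broadcast the small side: avoids a shuffle join
      .cache()                             // reused twice below, so compute once

    enriched.groupBy("trade_date").count().show()
    enriched.write.mode("overwrite").parquet("/data/enriched")
  }
}
```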
Labels:
- Apache Spark
08-02-2017
09:29 AM
@jfrazee Thank you very much. It will help me a lot in exploring Spark & Scala. Thanks for the guidance.
08-01-2017
12:39 PM
Can somebody guide me to complete Spark + Scala examples? Are there any documents, online links, or books where I can see some complete examples of Spark + Scala?
Labels:
- Apache Spark
08-01-2017
06:35 AM
Can somebody tell me what a real-world cluster configuration looks like? I have set up Hortonworks on my home system, but it is standalone. In a real project, what would the cluster configuration be: how many nodes, cluster memory and RAM, per-node memory and RAM, cluster backup, and so on? And while submitting a Spark job to YARN, how can we decide on executors, memory, and all those properties?
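As a purely illustrative sizing exercise (a commonly cited rule of thumb, not a fixed answer): on a worker node with 16 cores and 64 GB of RAM, reserve about 1 core and 1 GB for the OS and Hadoop daemons, leaving 15 cores and 63 GB. At roughly 5 cores per executor, that gives 3 executors per node, each with 63 / 3 = 21 GB, of which around 10% goes to YARN memory overhead, so about 19 GB per executor. The actual numbers depend on the hardware, the workload, and what else shares the cluster.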
Labels:
- Apache YARN
07-26-2017
10:28 AM
I need to schedule jobs that import data from external tables into HDFS through Sqoop using incremental mode. However, the import should not load duplicate values, or, if a value is a duplicate, it should overwrite the existing data in the file.
Labels:
- Apache Sqoop
07-26-2017
10:11 AM
I have two relations:
Relation A: (1, w) (2, x) (3, y)
Relation B: (1, z) (2, x) (3, y) (4, K) (5, L)
I want to merge these relations into a third relation with no duplicates, using a Pig script:
Relation C: (1, w) (2, x) (3, y) (4, K) (5, L)
Labels:
- Apache Pig
07-10-2017
02:21 PM
First of all, thanks Geoffrey for your quick response; I hope I have addressed your name correctly. Suppose I have one CSV file that I want to process through Spark, I submit the job on YARN, and I need the data to be loaded into Hive tables. In this case, where would I write my Spark code (I will write the code in Eclipse, but on which machine?), how would I submit it on YARN, and how would I access my Hive tables? Would all the components be distributed, or would Spark and Hive be on the same node? If they are on the same node, then why do we need the other three data nodes if one edge node can do all the work?
07-10-2017
07:06 AM
Hello, I have trained myself on Hadoop. I know how to work with MR, Pig, Hive, Spark, Scala, Sqoop, and so on; however, I have worked with all these components on my personal system in a single-node architecture. Now I need to know how a real, live project works. How does a multi-node setup work? If I am trying to process one CSV file, how do I access Spark, Hive, and the other components that are installed on different nodes? I need detailed documents, if somebody has them, or any article anyone is aware of that shows the complete steps and process for accessing the different components. I feel helpless, as nobody in my group or among my connections works on a real Hadoop ecosystem.
06-30-2017
05:07 PM
Thanks, it helped a lot to clear up my confusion.
06-29-2017
01:39 PM
I have gone through the URL below to understand how to load data into Hive using Spark in ORC format. I understood how to create a table in Hive using Spark; however, I have one question: how would Spark identify which database the table should be created in? Or, if I have the same table name in two different Hive databases, which table is Spark going to insert values into? The URL I followed: https://hortonworks.com/tutorial/using-hive-with-orc-from-apache-spark/
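For reference, a minimal sketch of how the target database is resolved, using the modern SparkSession API (on older HDP releases the equivalent calls go through HiveContext); the database and table names are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object HiveDatabaseTarget {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-db-target")
      .enableHiveSupport() // connects spark.sql to the Hive metastore
      .getOrCreate()

    val df = spark.read.parquet("/data/staged") // illustrative source

    // An unqualified table name resolves against the session's current
    // database (initially "default"). Either switch the current database...
    spark.sql("USE reporting_db")
    // ...or, more explicitly, qualify the name so the target is
    // unambiguous even when two databases contain the same table name.
    df.write.format("orc").mode("overwrite").saveAsTable("reporting_db.trades_orc")
  }
}
```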
Labels:
- Apache Hive
- Apache Spark
04-04-2017
09:39 AM
I have installed HDP 2.5 and I am able to log into the sandbox on CentOS using the root username and the hadoop password, but how do I start working with Hadoop components like Pig, Hive, and the rest? Do I need to install and configure all the Hadoop components, or are they installed and configured by default? How do I get a prompt for Pig, Hive, Spark, Scala, and so on? Also, I am unable to log into Ambari using the admin/admin username and password.
Labels:
- Hortonworks Data Platform (HDP)