Member since: 07-04-2016
Posts: 40
Kudos Received: 5
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 1270 | 09-16-2016 05:31 AM |
09-22-2021
11:40 PM
How do I enable SSL for the Livy server in EMR? Can we use a KMS certificate for this, or is there any other option?
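For context, on a stock Apache Livy install the HTTPS side is usually switched on through keystore settings in livy.conf; a minimal sketch, assuming a JKS keystore you have already created (the path and passwords below are placeholders, and whether an EMR/KMS-issued certificate can be loaded into such a keystore is exactly what this question is asking):

```
# livy.conf -- placeholder values, adjust for your cluster
livy.keystore = /etc/livy/conf/livy-keystore.jks
livy.keystore.password = <keystore-password>
livy.key-password = <key-password>
```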
Labels:
- Apache Zeppelin
09-19-2016
02:07 PM
I have tested this: we can run jobs on a node where no DataNode daemon is running and which is configured as an edge node. Correct me if I am wrong.
09-19-2016
12:05 PM
If I configure a node as an edge node and not as a DataNode, I cannot store data on that node. But can I configure a NodeManager on the edge node, bring the data to the edge node, and run the task there if all other nodes are busy?
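For what it's worth, a minimal sketch of what running YARN work on such a node usually involves (assuming a Hadoop 2.x-era layout; the daemon script name and the point that tasks read HDFS blocks over the network, rather than the data being "moved" to the node, are the assumptions here):

```
# On the edge node, with yarn-site.xml pointing at the ResourceManager,
# start only the NodeManager daemon (no DataNode needed):
yarn-daemon.sh start nodemanager

# From any node, confirm the edge node has registered with the ResourceManager:
yarn node -list
```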
Labels:
- Apache Hadoop
- Apache YARN
09-19-2016
05:00 AM
@Rushikesh Deshmukh What is the purpose of merging the tables used in joins? Can you please explain?
09-16-2016
05:31 AM
1 Kudo
1) Why does the Secondary NameNode explicitly copy the fsimage from the primary NameNode when it already holds the same copy?

There is no guarantee that the fsimage on the Secondary NameNode is exactly the same as the one on the primary NameNode. During the checkpoint interval there can be data corruption, crashes, or data loss. It is safer to fetch the latest available fsimage from the primary NameNode and then merge the edit logs into it.

2) When a cluster is first set up, does the primary NameNode already have an fsimage, and if so, does it contain any data?

Yes. When a new NameNode is set up in a new cluster, it has an fsimage with no data in it, with a file name like fsimage_000000000 indicating that no transactions have been applied.

3) Both the primary NameNode and the Secondary NameNode appear to keep all the transaction logs. Is it required to maintain the same logs in both locations? If yes, how many old transactions should be kept, and is there a configuration for this?

By default HDFS retains edit-log transactions until the count reaches 1 million; edit-log files beyond that are removed.
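As a pointer for question 3, the knobs that usually govern this in stock HDFS are the checkpoint trigger and the edit-retention settings in hdfs-site.xml. A sketch with what I believe are the usual defaults (verify them for your distribution):

```
<!-- hdfs-site.xml: checkpoint and edit-log retention (values shown are stock defaults) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>    <!-- checkpoint at least once an hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- ...or once 1M uncheckpointed transactions accumulate -->
</property>
<property>
  <name>dfs.namenode.num.extra.edits.retained</name>
  <value>1000000</value> <!-- extra edit-log transactions kept beyond what is strictly needed -->
</property>
<property>
  <name>dfs.namenode.num.checkpoints.retained</name>
  <value>2</value>       <!-- fsimage checkpoints kept on disk -->
</property>
```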
08-04-2016
02:17 PM
I am not familiar with Spark, but it looks like it has functions to meet your requirement: http://stackoverflow.com/questions/36436020/converting-csv-to-orc-with-spark
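Roughly along the lines of the linked answer, a minimal PySpark sketch of reading a CSV and writing it back out as ORC (assuming Spark 2.x or later; the input and output paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-orc").getOrCreate()

# Read the CSV with a header row, letting Spark infer column types
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/input/sample.csv"))      # placeholder input path

# Write the same data back out in ORC format
df.write.mode("overwrite").orc("/data/output/sample_orc")  # placeholder output path

spark.stop()
```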
08-04-2016
09:21 AM
@Benjamin Leonhardi As per YARN, the ApplicationMaster is just application code. So I am unable to figure out how a new DAG can be submitted to an existing AppMaster that was written to handle some other DAG.
08-04-2016
09:18 AM
Thank you @Shiv kumar
08-04-2016
04:58 AM
So the handshake between the client and the AppMaster in YARN (which is normally torn down once the job is done) is kept alive here in a Tez session: the client submits new DAGs directly to the AppMaster, and the ResourceManager still sees the same application running, so the DAGs run under the same application ID. Correct me if I am wrong.
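One way to see this in practice, as a hedged sketch assuming Hive is configured to run on Tez in session mode: run a few queries back to back in one Hive/Beeline session and watch the YARN application list from another terminal.

```
# While the Hive/Beeline session is running queries on Tez, check YARN:
yarn application -list -appStates RUNNING

# The Tez session appears as a single TEZ application whose ID stays the same,
# while each query is submitted as a new DAG to that application's AppMaster.
```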
08-03-2016
01:00 PM
Hi @ARUN The main reason might be that the data blocks needed for the MapReduce job are located on those two nodes only. Can you please check the block locations of the file you are processing and verify that the data is actually distributed across all 3 nodes? Speculative execution (where a slow task is re-launched as a duplicate attempt on another, less busy node) may also not be kicking in.
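To check the block placement, something along these lines should work (the path below is a placeholder for the file you are processing):

```
# Show which DataNodes hold each block of the input file
hdfs fsck /path/to/input/file -files -blocks -locations
```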