Member since
07-04-2016
40 Posts
5 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 734 | 09-16-2016 05:31 AM
09-22-2021
11:40 PM
How can I enable SSL for the Livy server in EMR? Can we use a KMS certificate for this, or is there another option?
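In stock Apache Livy, HTTPS is driven by the keystore settings in livy.conf; a minimal sketch, assuming a JKS keystore (the key names come from livy.conf.template, the path and passwords are placeholders, and whether a KMS-issued certificate can be imported into such a keystore depends on the setup):

```
# livy.conf - setting livy.keystore switches the Livy server to HTTPS
# (paths and passwords below are placeholder values)
livy.keystore = /etc/livy/conf/livy-keystore.jks
livy.keystore.password = changeit
livy.key-password = changeit
```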
... View more
Labels:
- Apache Zeppelin
09-19-2016
02:07 PM
I have tested this: we can run jobs on a node that has no DataNode daemon running and is configured as an edge node. Correct me if I am wrong.
... View more
09-19-2016
12:05 PM
If I configure a node as an edge node and not as a DataNode, I cannot store data on it. But can I configure a NodeManager on the edge node, and can the data be brought to the edge node to run the task there if all the other nodes are busy?
... View more
Labels:
- Apache Hadoop
- Apache YARN
09-19-2016
05:00 AM
@Rushikesh Deshmukh What is the purpose of merging the tables used in joins? Can you please explain?
... View more
09-16-2016
05:31 AM
1 Kudo
1) Why does the Secondary NameNode explicitly copy the fsimage from the primary NameNode when it already has the same copy of the fsimage as the primary? There is no guarantee that the fsimage on the Secondary NameNode is exactly the same as the one on the primary NameNode. During the checkpoint period, data corruption, crashes, or data loss may occur. It is better to fetch the latest available fsimage from the primary NameNode and then merge the edit logs into it.

2) When a cluster is initially set up, will the primary node have an fsimage, and if yes, will it contain any data? Yes. When a new NameNode is set up in a new cluster, it has an fsimage with no data in it, with a file name like fsimage_000000000, representing zero transactions.

3) It looks like both the primary NameNode and the Secondary NameNode maintain all the transaction logs. Is it required to maintain the same logs in both locations? If yes, how many old transactions do we have to keep in the cluster, and is there a configuration for this? By default, HDFS retains edit logs until the transaction count reaches 1 million; edit files holding transactions beyond that 1 million are removed (see the config sketch below).
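For reference, a minimal hdfs-site.xml sketch of the properties behind this behaviour (the values shown are the stock defaults):

```xml
<!-- hdfs-site.xml: checkpoint and edit-log retention settings (defaults shown) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value> <!-- checkpoint at least once an hour -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- ...or after 1M uncheckpointed transactions -->
</property>
<property>
  <name>dfs.namenode.num.extra.edits.retained</name>
  <value>1000000</value> <!-- extra transactions kept before old edits are purged -->
</property>
```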
... View more
08-04-2016
02:17 PM
I am not familiar with Spark, but it looks like it has functions to meet your requirement: http://stackoverflow.com/questions/36436020/converting-csv-to-orc-with-spark
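Something along these lines, a minimal sketch using the Spark SQL Java API (the application name and the input/output paths are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvToOrc {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("csv-to-orc").getOrCreate();
        // Read the CSV (placeholder path), letting Spark infer the schema
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input.csv");
        // Write the same data back out in ORC format (placeholder path)
        df.write().orc("hdfs:///data/output_orc");
        spark.stop();
    }
}
```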
... View more
08-04-2016
09:21 AM
@Benjamin Leonhardi As per YARN, the ApplicationMaster is mere code, so I am unable to figure out how a new DAG can be submitted to an existing AppMaster that was written to handle some other DAG.
... View more
08-04-2016
09:18 AM
Thank you @Shiv kumar
... View more
08-04-2016
04:58 AM
So the handshake between the client and the AppMaster in YARN (which is decommissioned once the job is done) is kept alive here in a Tez session, and the client submits new DAGs directly to the AppMaster. The ResourceManager thinks it is still the same application running, so the DAGs run under the same application ID. Correct me if I am wrong.
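If that is right, it is what the TezClient session API exposes; a rough sketch (the session name and DAG builders are placeholders, and a real DAG would need vertices wired in before submission):

```java
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;

public class TezSessionSketch {
    public static void main(String[] args) throws Exception {
        TezConfiguration tezConf = new TezConfiguration();
        tezConf.setBoolean(TezConfiguration.TEZ_AM_SESSION_MODE, true); // session mode
        TezClient tezClient = TezClient.create("my-session", tezConf);
        tezClient.start();            // the RM launches one AppMaster for the session
        tezClient.submitDAG(buildFirstDag());  // first DAG
        tezClient.submitDAG(buildSecondDag()); // goes straight to the same AppMaster
        tezClient.stop();
    }
    // Placeholder builders; real DAGs would add vertices and edges here.
    private static DAG buildFirstDag()  { return DAG.create("dag1"); }
    private static DAG buildSecondDag() { return DAG.create("dag2"); }
}
```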
... View more
08-03-2016
01:00 PM
Hi @ARUN The main reason might be that the data blocks needed for the MapReduce job are located on those two nodes themselves. Can you please check the data blocks of the file you are processing and verify whether the data is distributed across all 3 nodes? Speculative execution (the case where your nodes are too busy running tasks, so the data can be moved temporarily to the third node to run the task there) may also not be happening.
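To check the block locations, something like this should show them (replace the path with your input file):

```
hdfs fsck /path/to/your/file -files -blocks -locations
```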
... View more
08-03-2016
12:37 PM
As per Tez sessions, DAGs submitted within a session are handled by the same AppMaster. I am unable to understand how the new application (DAG) is mapped to the already running AppMaster. Who does it, and how? As per YARN, the ResourceManager is responsible for launching AppMasters; how is this functionality eclipsed by Tez? Thanks in advance.
... View more
Tags:
- Hadoop Core
- tez
- YARN
Labels:
- Apache Tez
- Apache YARN
08-03-2016
12:33 PM
I don't feel good saying this, but I am not satisfied with your answer. It is fine that the ApplicationMaster does the job of calling the InputFormat, calculating the input splits, and so on. But I am asking what is meant by the sentence quoted in the Definitive Guide, that the client places the computed input splits in HDFS. I am sorry if I am unable to explain my doubt properly.
... View more
08-01-2016
03:30 PM
@Shiv kumar That is what I am saying. So "where does this happen?" is my question.
... View more
07-26-2016
12:58 PM
Very neatly explained!
... View more
07-26-2016
11:19 AM
So will it read all the data (1 GB) and then split the data into logical splits and assign a map task to each? Then what are the computed input splits placed in HDFS while the job is being submitted? At that point the AppMaster will not even have been launched. And how can a 1 GB file be divided into 10 splits if the block size is 256 MB? The division is based on the split size, which is configurable (as far as I know).
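For what it's worth, FileInputFormat derives the split size from the block size and two configurable bounds; a small sketch of the arithmetic (with the defaults, a 1 GB file with 256 MB blocks gives 4 splits, not 10):

```java
public class SplitSizeDemo {
    public static void main(String[] args) {
        // FileInputFormat's rule: splitSize = max(minSize, min(maxSize, blockSize))
        long blockSize = 256L * 1024 * 1024; // 256 MB HDFS block
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize (default)
        long maxSize = Long.MAX_VALUE;       // mapreduce.input.fileinputformat.split.maxsize (default)
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        long fileSize = 1024L * 1024 * 1024; // 1 GB file
        long numSplits = (fileSize + splitSize - 1) / splitSize; // ceiling division
        System.out.println("splits = " + numSplits); // prints 4
    }
}
```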
... View more
07-26-2016
07:16 AM
If it runs in the AppMaster, what exactly are "the computed input splits" that the job client stores in HDFS while submitting the job?
"Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID (step 3)."
The above is a line from Hadoop: The Definitive Guide. And how does a map task work if the split spans data blocks on two different DataNodes?
... View more
Labels:
- Apache Hadoop
- HDFS
07-19-2016
01:00 PM
Can maintenance mode be the answer? If yes, what happens when a node is kept in maintenance mode? How does replication work for the data on a node in maintenance mode? What happens when I decommission a DataNode? And what happens when I delete a DataNode?
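On the decommissioning part, the usual flow is roughly this (a sketch; the exclude file is whatever dfs.hosts.exclude points to in your cluster):

```
# 1. Add the DataNode's hostname to the exclude file referenced by dfs.hosts.exclude
# 2. Tell the NameNode to re-read it; HDFS re-replicates the node's blocks first
hdfs dfsadmin -refreshNodes
# 3. Watch the node go from "Decommission in progress" to "Decommissioned"
hdfs dfsadmin -report
```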
... View more
Labels:
- Apache Hadoop
07-11-2016
11:09 AM
YARN has many advantages over MapReduce (MRv1):
1) Scalability - by delegating the work of handling the tasks running on the slaves to a per-application ApplicationMaster, the ResourceManager (RM) can handle more requests than the JobTracker could, facilitating the addition of more nodes.
2) Unlike MRv1, which is tightly coupled to MapReduce, YARN supports many kinds of workloads running on it, such as MRv2, Tez, Storm, and Spark.
3) Optimized resource allocation - there is no fixed number of slots allocated separately to mappers and reducers in YARN, as is the case in MRv1, so the available capacity of a node can be used by any task that needs resources (see the sketch below).
4) When the ResourceManager fails, the jobs running on the cluster need not be restarted after the ResourceManager recovers.
5) The failover mechanism is implemented via ZooKeeper, which is already part of the ResourceManager setup, so we don't need to run another daemon.
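On point 3, a minimal yarn-site.xml sketch of how capacity is declared as a per-node resource pool rather than as fixed map/reduce slots (the values are illustrative):

```xml
<!-- yarn-site.xml: a NodeManager advertises one pool of resources; any container
     (map, reduce, Tez task, Spark executor...) can draw from it -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```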
... View more
07-05-2016
12:46 PM
@Benjamin Leonhardi Thank you for your quick and explanatory answers. Can you please clarify a few more doubts I have? 1) What is the reason behind storing the output of MapReduce in HDFS? Why can't we send it directly to the client or display it? What happens to the output files: are they stored permanently or flushed after some time, and if so, on what basis? 2) Will MapReduce run when we read data from HDFS?
... View more
07-05-2016
11:04 AM
@Benjamin Leonhardi I am satisfied with your answer. But for the second question, I am talking about each chunk the file is divided into, not about the replicas of a block.
... View more
07-05-2016
04:26 AM
Let's assume we have a file of 300 MB and the HDFS block size is 128 MB, so the file is divided into 128 MB (B1), 128 MB (B2), and 44 MB (B3).
1. Where does this splitting of the huge file take place? Many people say "the client splits the file" - what actually is the client? The HDFS client (if yes, can you give me the flow from an executed command like -put, through the HDFS client, to the NameNode and DataNodes), or some other external tool (if yes, an example)?
2. Does the client form 3 pipelines, one for each block, which run in parallel to replicate?
3. Will DN1, which received B1, start sending the data to DN2 before the full 128 MB of its block has arrived? And if my third point is true, doesn't that contradict the replication principle of "get the complete block of data and then start replicating", rather than replicating the chunks of data as soon as they arrive? Can you also provide the possible reasons why the flow is not the other way around?
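For context on question 1, a minimal sketch of the write path through the HDFS client API (the path is a placeholder; the -put command goes through this same FileSystem client, and the block-by-block chunking happens inside the client's output stream, not in an external tool):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // the HDFS client
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/file300mb"))) {
            byte[] chunk = new byte[64 * 1024];
            // The client-side DFSOutputStream buffers what we write into packets,
            // streams them down a pipeline of DataNodes, and asks the NameNode
            // for a new block (and a new pipeline) each time dfs.blocksize fills up.
            out.write(chunk);
        }
    }
}
```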
... View more
Labels:
- Apache Hadoop