Member since: 07-01-2016
Posts: 26
Kudos Received: 5
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 1675 | 02-22-2017 03:23 PM |
12-26-2017
06:22 AM
@Ashnee Sharma Can you please set hive.support.concurrency=false and then try to run select count(*) from rasdb.test? For example:
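A minimal sketch, run in the same Hive session:

```sql
SET hive.support.concurrency=false;
SELECT COUNT(*) FROM rasdb.test;
```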
12-22-2017
01:45 PM
If you don't mind, can you post the query you are using to get the count?
12-22-2017
01:40 PM
It's better to maintain all the deleted IDs in one staging table, and while loading the data into the Hive tables, check whether each ID already exists in the staging table, either with a join or with a MERGE clause (see the sketch below).
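A minimal HiveQL sketch of the MERGE variant; the table names (target_table, staged_deleted_ids) and the id column are hypothetical, and MERGE requires Hive 2.2+ with ACID enabled:

```sql
-- Hypothetical names: drop from the target any row whose id is in the staging table
MERGE INTO target_table AS t
USING staged_deleted_ids AS s
ON t.id = s.id
WHEN MATCHED THEN DELETE;
```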
08-31-2017
10:57 AM
I am getting "mismatched input 'as' expecting RIGHT_PAREN", and I also want to get the output as
(2017-06-04 08:01:08,Receive,asd,voda2,usrCellfind)
08-30-2017
02:06 PM
Hi Friends, I have data like the following in my log files: 2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]. Here the field delimiter is [ ], and inside the delimiters the fields have the form <<column name>>:<<value>>. Could you please let me know how to process these kinds of log files?
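One hedged sketch of a possible approach, assuming exactly the line layout above: Hive's built-in RegexSerDe with one capture group per column. The table name log_events, the column names, and the location path are assumptions to adapt:

```sql
-- Hypothetical table: one regex capture group per output column
CREATE EXTERNAL TABLE log_events (
  event_time STRING, action STRING, msg STRING, smsc STRING, svc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\S+) (\\S+) \\[SMSC:([^\\]]+)\\] \\[SVC:([^\\]]+)\\]"
)
STORED AS TEXTFILE
LOCATION '/data/logs';
```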
02-23-2017
01:45 PM
1 Kudo
@Sreeviswa Athikala The NameNode doesn't have the capability to merge the fsimage and edit logs itself. After a brand-new Hadoop cluster is set up, the NameNode has an empty fsimage and one edit log file. All changes are written to the edit log files, and if the NameNode keeps running like this the edit logs grow huge; then, if the NameNode ever needs to restart, it takes much longer, because all those changes have to be applied on top of the last saved state of the metadata. To avoid this, the checkpoint service was introduced, and it works based on the two configurations below:
- dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints.
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode that will force an urgent checkpoint, even if the checkpoint period has not been reached.
If either of these conditions is met, a checkpoint happens immediately. A checkpoint is nothing but merging the edit log files into a new fsimage file and uploading it to the NameNode. Alongside the checkpoint node we have the secondary NameNode; the only difference is that the checkpoint node uploads the fsimage directly to the NameNode, whereas the secondary NameNode doesn't have that fsimage upload feature.
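A quick way to check these two values on a cluster, as a sketch using the standard hdfs getconf command (defaults shown are 3600 seconds and 1,000,000 transactions):

```bash
# Print the effective checkpoint settings from the client's Hadoop configuration
hdfs getconf -confKey dfs.namenode.checkpoint.period   # e.g. 3600
hdfs getconf -confKey dfs.namenode.checkpoint.txns     # e.g. 1000000
```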
02-22-2017
03:23 PM
@satya gaurav Please find the details below:
1. How do multiple reducers write the output? Can multiple reducers write the output into a single output file, or do we have to write an intermediate reducer to do so? I just want to know how we can get a single output file from multiple reducers.
-> By default each reducer generates a separate output file, like part-00000, and this output is stored in HDFS. If we want to merge all the reducers' output into a single file, we have to do it explicitly, either in our own code using MultipleOutputs or with the hadoop fs -getmerge command (see the sketch below).
2. Can a map and a reduce task run in the same container, or can more than one map task or reduce task run in the same container? If yes, how? How is a container assigned for map and reduce tasks?
-> Yes, a map and a reduce task can run in the same container, but not in parallel. In MapReduce v1 we had fixed mapper and reducer slots: map tasks could only run in mapper slots and reduce tasks only in reducer slots. In MapReduce v2, a container can run either a map task or a reduce task.
-> We can't run more than one map/reduce task at a time in the same container.
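A minimal illustration of the getmerge approach (the HDFS and local paths are placeholders):

```bash
# Concatenate all part-* files from the job's output directory into one local file
hadoop fs -getmerge /user/me/job_output /tmp/merged_output.txt
```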
09-19-2016
03:56 PM
@Jasper Below are my configurations at the cluster level. It is still launching a map job when I run SELECT * FROM tablename;
09-19-2016
03:53 PM
@Jasper, split size is not equivalent to block size. Split size is configurable, and it's advisable that the split size be greater than the block size; splitting is done to reduce the number of mapper tasks (see the sketch below).
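As a hedged illustration, the split size is usually tuned through the standard MapReduce input-split properties, e.g. from a Hive session (the 256 MB value is only an example; values are in bytes):

```sql
-- Example values only: make input splits at least / at most 256 MB
SET mapreduce.input.fileinputformat.split.minsize=268435456;
SET mapreduce.input.fileinputformat.split.maxsize=268435456;
```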
09-19-2016
02:39 PM
Yes, if all the nodes are busy or in a down state, the NodeManager on your client node will launch the container there; the map tasks will read the data remotely from whichever DataNodes are available and then process it, and finally the output will go back to an available DataNode. But it's not advisable to configure a NodeManager on a client node.
09-19-2016
11:05 AM
1 Kudo
I have loaded a 1 GB file into HDFS and then created a Hive table on top of it. Details: block size = 2 MB (we configured the block size as 2 MB just to test these kinds of scenarios), split size = 128 MB. When I fire SELECT * FROM tablename, I see 9 mapper jobs launched. I have read in many places that there will not be any map jobs for SELECT * FROM table. Could someone explain why map jobs are launched in this case?
- Tags:
- Data Processing
- Hive
09-14-2016
02:54 PM
1. Why does the secondary NameNode explicitly copy the fsimage from the primary NameNode when the secondary NameNode already has the same copy of the fsimage as the primary?
2. When a cluster is initially set up, will the primary node have an fsimage? If yes, will it contain any data?
3. It looks like both the primary NameNode and the secondary NameNode maintain all the transaction logs. Is it required to maintain the same logs in both locations? If yes, how many old transactions do we have to keep in the cluster? Is there any configuration for this?
08-16-2016
01:02 PM
1 Kudo
How can we load tables from a schema other than dbo into Hive by using the Sqoop import-all-tables command?
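A hedged sketch of one way this is commonly done with Sqoop's SQL Server connector, passing the schema as an extra argument after the -- separator; the connection details and the schema name sales are placeholders:

```bash
# Placeholders throughout; "-- --schema" hands the schema to the SQL Server connector
sqoop import-all-tables \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username user -P \
  --hive-import \
  -- --schema sales
```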
08-10-2016
04:08 PM
Hi All, I am trying to load data as a sequence file in Hive; for this I first loaded the data as text files and from there loaded it as a sequence file. But when I check the file size of the sequence file, it is the same as the sum of all the text file sizes. My questions:
1. Do we need to manually delete all those text files?
2. Or will some job take care of cleaning these files?
One more thing: how can we do UPDATE/DELETE operations in Hive? I have set all the required properties as below:
- hive.support.concurrency = true
- hive.enforce.bucketing = true
- hive.exec.dynamic.partition.mode = nonstrict
- hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on = true
- hive.compactor.worker.threads = 1
Then I created a table with buckets and TBLPROPERTIES('transactional'='true'), but when I do an UPDATE on this table I get the error below. Could you please help me resolve this issue?
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
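For reference, a minimal sketch of that setup as session statements; the table acid_demo is hypothetical, and (in Hive 1.x ACID) the table must be bucketed and stored as ORC:

```sql
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

-- Hypothetical table: ACID needs buckets, ORC storage, and the transactional property
CREATE TABLE acid_demo (id INT, val STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE acid_demo SET val = 'updated' WHERE id = 1;
```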
- Tags:
- Data Processing
- Hive
08-04-2016
03:35 PM
1 Kudo
@Josh Persinger, if we specify -m [1 or n], Sqoop always launches the number of map tasks we specified with the -m option. If we don't specify anything like -m 1, it launches 4 mapper tasks by default. For example:
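A sketch with placeholder connection details:

```bash
# -m 1 forces a single mapper (one output file); omit -m for the default of 4
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username user -P \
  --table orders \
  -m 1
```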
08-01-2016
02:25 PM
1. I have read in many places that "HiveServer cannot handle concurrent requests from more than one client," and hence HiveServer2 was released. So what exact problems are there in HiveServer1, and how were they resolved in HiveServer2? 2. What are the Hive Thrift client and the Hive Thrift server?
07-22-2016
12:58 PM
1. To my knowledge, each mapper's output is initially written into a buffer. If the buffer reaches 80% of its memory, it is spilled to disk, and during the spill, partitioning and sorting happen. My question: if we are partitioning, what is the reason for also sorting? 2. In the "Shuffle and Sort" phase sorting is also done. Any reason for sorting here? 3. In the reducer phase sorting is done as well. Is there any reason we are giving sorted output to the end user? I have seen that we do sorting nearly 3 times, and sorting is a very costly operation. Can someone help me understand the reason for sorting this many times?
07-22-2016
05:22 AM
Hi Sujitha, I am not satisfied with the answers you provided. 1. Without executing the user code, how will the ApplicationMaster come to know how much resource it requires? 2. On what basis does the ResourceManager take care of resource allocation?
07-21-2016
01:10 PM
1 Kudo
1. On what basis does the ApplicationMaster decide that it needs more containers? 2. Will each mapper have a separate container? 3. Let's say one mapper is launched in a container and has completed 20% of its work; if it requires more resources to complete the remaining 80% of the task, how will the resources be allocated, and who will allocate them? If distribution happens between containers, how does it happen?
07-19-2016
03:33 PM
How can I view the data in the block files below? blk_1073742080 blk_1073742080_1256.meta
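As a hedged aside, one standard way to trace a block ID back to its file and locations is hdfs fsck (scanning / here is just an example; narrow the path on a large cluster):

```bash
# List files with their block IDs and locations, then search for the block in question
hdfs fsck / -files -blocks -locations | grep blk_1073742080
```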
- Tags:
- block
- Data Processing
07-18-2016
12:49 PM
1. List out the metadata attributes in Hadoop. 2. Can we see the block-level metadata file? If yes, how can we see that file?
07-14-2016
02:36 PM
1. How do reducers know where the mapper results are stored?
07-04-2016
05:29 AM
Will replication start after loading the first block of data?
07-01-2016
06:31 PM
Ravi, thanks for your response. I agree that files are divided into blocks based on the block size; however, I wanted to know who actually splits the file into blocks while loading the data into HDFS.
07-01-2016
02:38 PM
In many places I have read that the client splits the data into blocks while storing the data into HDFS. Could you please let me know how files are actually divided into blocks?
- Tags:
- Hadoop Core
- spliting