Member since: 07-01-2016
Posts: 26
Kudos Received: 5
Solutions: 1

My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 1675 | 02-22-2017 03:23 PM |
12-26-2017
06:22 AM
@Ashnee Sharma Can you please set hive.support.concurrency=false and then try to run select count(*) from rasdb.test? For example:
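A minimal sketch, run in the same Hive session:

```sql
SET hive.support.concurrency=false;
SELECT COUNT(*) FROM rasdb.test;
```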
12-22-2017
01:45 PM
If you don't mind, can you post the query you are using to get the count?
12-22-2017
01:40 PM
It's better to maintain all the deleted IDs in one staging table, and while loading the data into the Hive tables, check whether each ID already exists in the staging table, either with a join or with a MERGE clause (see the sketch below).
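A minimal HiveQL sketch of the MERGE variant; the table names (target_table, staged_deleted_ids) and the id column are hypothetical, and MERGE requires Hive 2.2+ with ACID enabled:

```sql
-- Hypothetical names: drop from the target any row whose id is in the staging table
MERGE INTO target_table AS t
USING staged_deleted_ids AS s
ON t.id = s.id
WHEN MATCHED THEN DELETE;
```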
08-31-2017
10:57 AM
I am getting "mismatched input 'as' expecting RIGHT_PAREN", and I also want to get the output as
(2017-06-04 08:01:08,Receive,asd,voda2,usrCellfind)
08-30-2017
02:06 PM
Hi Friends, I have data like the following in my log files: 2017-06-04 08:01:08 Receive asd [SMSC:voda2] [SVC:usrCellfind]. Here the field delimiter is [ ], and inside the delimiters the fields have the form <<column name>>:<<value>>. Could you please let me know how to process these kinds of log files?
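One hedged sketch of a possible approach, assuming exactly the line layout above: Hive's built-in RegexSerDe with one capture group per column. The table name log_events, the column names, and the location path are assumptions to adapt:

```sql
-- Hypothetical table: one regex capture group per output column
CREATE EXTERNAL TABLE log_events (
  event_time STRING, action STRING, msg STRING, smsc STRING, svc STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\S+) (\\S+) \\[SMSC:([^\\]]+)\\] \\[SVC:([^\\]]+)\\]"
)
STORED AS TEXTFILE
LOCATION '/data/logs';
```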
02-23-2017
01:45 PM
1 Kudo
@Sreeviswa Athikala The NameNode doesn't have the capability to merge the fsimage and edit logs itself. After a brand-new Hadoop cluster is set up, the NameNode has an empty fsimage and one edit log file. All changes are written to the edit log files, and if the NameNode keeps running like this the edit logs grow huge; then, if the NameNode ever needs to restart, it takes much longer, because all those changes have to be applied on top of the last saved state of the metadata. To avoid this, the checkpoint service was introduced, and it works based on the two configurations below:
- dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints.
- dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode that will force an urgent checkpoint, even if the checkpoint period has not been reached.
If either of these conditions is met, a checkpoint happens immediately. A checkpoint is nothing but merging the edit log files into a new fsimage file and uploading it to the NameNode. Alongside the checkpoint node we have the secondary NameNode; the only difference is that the checkpoint node uploads the fsimage directly to the NameNode, whereas the secondary NameNode doesn't have that fsimage upload feature.
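A quick way to check these two values on a cluster, as a sketch using the standard hdfs getconf command (defaults shown are 3600 seconds and 1,000,000 transactions):

```bash
# Print the effective checkpoint settings from the client's Hadoop configuration
hdfs getconf -confKey dfs.namenode.checkpoint.period   # e.g. 3600
hdfs getconf -confKey dfs.namenode.checkpoint.txns     # e.g. 1000000
```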
02-22-2017
03:23 PM
@satya gaurav Please find the details below:
1. How do multiple reducers write the output? Can multiple reducers write the output into a single output file, or do we have to write an intermediate reducer to do so? I just want to know how we can get a single output file from multiple reducers.
-> By default each reducer generates a separate output file, like part-00000, and this output is stored in HDFS. If we want to merge all the reducers' output into a single file, we have to do it explicitly, either in our own code using MultipleOutputs or with the hadoop fs -getmerge command (see the sketch below).
2. Can a map and a reduce task run in the same container, or can more than one map task or reduce task run in the same container? If yes, how? How is a container assigned for map and reduce tasks?
-> Yes, a map and a reduce task can run in the same container, but not in parallel. In MapReduce v1 we had fixed mapper and reducer slots: map tasks could only run in mapper slots and reduce tasks only in reducer slots. In MapReduce v2, a container can run either a map task or a reduce task.
-> We can't run more than one map/reduce task at a time in the same container.
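A minimal illustration of the getmerge approach (the HDFS and local paths are placeholders):

```bash
# Concatenate all part-* files from the job's output directory into one local file
hadoop fs -getmerge /user/me/job_output /tmp/merged_output.txt
```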
09-19-2016
03:56 PM
@Jasper Below are my configurations at the cluster level. It is still launching a map job when I run SELECT * FROM tablename;
09-19-2016
03:53 PM
@Jasper, split size is not equivalent to block size. Split size is configurable, and it's advisable that the split size be greater than the block size; splitting is done to reduce the number of mapper tasks (see the sketch below).
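As a hedged illustration, the split size is usually tuned through the standard MapReduce input-split properties, e.g. from a Hive session (the 256 MB value is only an example; values are in bytes):

```sql
-- Example values only: make input splits at least / at most 256 MB
SET mapreduce.input.fileinputformat.split.minsize=268435456;
SET mapreduce.input.fileinputformat.split.maxsize=268435456;
```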
09-19-2016
02:39 PM
Yes, if all the nodes are busy or in a down state, the NodeManager on your client node will launch the container there; the map tasks will read the data remotely from whichever DataNodes are available and then process it, and finally the output will go back to an available DataNode. But it's not advisable to configure a NodeManager on a client node.
09-19-2016
11:05 AM
1 Kudo
I have loaded a 1 GB file into HDFS and then created a Hive table on top of it. Details: block size = 2 MB (we configured the block size as 2 MB just to test these kinds of scenarios), split size = 128 MB. When I fire SELECT * FROM tablename, I see 9 mapper jobs launched. I have read in many places that there will not be any map jobs for SELECT * FROM table. Could someone explain why map jobs are launched in this case?
- Tags:
- Data Processing
- Hive
09-14-2016
02:54 PM
1. Why does the secondary NameNode explicitly copy the fsimage from the primary NameNode when the secondary NameNode already has the same copy of the fsimage as the primary?
2. When a cluster is initially set up, will the primary node have an fsimage? If yes, will it contain any data?
3. It looks like both the primary NameNode and the secondary NameNode maintain all the transaction logs. Is it required to maintain the same logs in both locations? If yes, how many old transactions do we have to keep in the cluster? Is there any configuration for this?
08-16-2016
01:02 PM
1 Kudo
How can we load tables from a schema other than dbo into Hive by using the Sqoop import-all-tables command?
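A hedged sketch of one way this is commonly done with Sqoop's SQL Server connector, passing the schema as an extra argument after the -- separator; the connection details and the schema name sales are placeholders:

```bash
# Placeholders throughout; "-- --schema" hands the schema to the SQL Server connector
sqoop import-all-tables \
  --connect "jdbc:sqlserver://dbhost:1433;database=mydb" \
  --username user -P \
  --hive-import \
  -- --schema sales
```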
08-10-2016
04:08 PM
Hi All, I am trying to load data as a sequence file in Hive; for this I first loaded the data as text files and from there loaded it as a sequence file. But when I check the file size of the sequence file, it is the same as the sum of all the text file sizes. My questions:
1. Do we need to manually delete all those text files?
2. Or will some job take care of cleaning these files?
One more thing: how can we do UPDATE/DELETE operations in Hive? I have set all the required properties as below:
- hive.support.concurrency = true
- hive.enforce.bucketing = true
- hive.exec.dynamic.partition.mode = nonstrict
- hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on = true
- hive.compactor.worker.threads = 1
Then I created a table with buckets and TBLPROPERTIES('transactional'='true'), but when I do an UPDATE on this table I get the error below. Could you please help me resolve this issue?
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
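For reference, a minimal sketch of that setup as session statements; the table acid_demo is hypothetical, and (in Hive 1.x ACID) the table must be bucketed and stored as ORC:

```sql
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

-- Hypothetical table: ACID needs buckets, ORC storage, and the transactional property
CREATE TABLE acid_demo (id INT, val STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

UPDATE acid_demo SET val = 'updated' WHERE id = 1;
```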
- Tags:
- Data Processing
- Hive
08-04-2016
03:35 PM
1 Kudo
@Josh Persinger, if we specify -m [1 or n], Sqoop always launches the number of map tasks we specified with the -m option. If we don't specify anything like -m 1, it launches 4 mapper tasks by default. For example:
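A sketch with placeholder connection details:

```bash
# -m 1 forces a single mapper (one output file); omit -m for the default of 4
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username user -P \
  --table orders \
  -m 1
```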
08-01-2016
02:25 PM
1. I have read in many places that "HiveServer cannot handle concurrent requests from more than one client," and hence HiveServer2 was released. So what exact problems are there in HiveServer1, and how were they resolved in HiveServer2? 2. What are the Hive Thrift client and the Hive Thrift server?
07-22-2016
12:58 PM
1. To my knowledge, each mapper's output is initially written into a buffer. If the buffer reaches 80% of its memory, it is spilled to disk, and during the spill, partitioning and sorting happen. My question: if we are partitioning, what is the reason for also sorting? 2. In the "Shuffle and Sort" phase sorting is also done. Any reason for sorting here? 3. In the reducer phase sorting is done as well. Is there any reason we are giving sorted output to the end user? I have seen that we do sorting nearly 3 times, and sorting is a very costly operation. Can someone help me understand the reason for sorting this many times?
07-22-2016
05:22 AM
Hi Sujitha, I am not satisfied with the answers you provided. 1. Without executing the user code, how will the ApplicationMaster come to know how much resource it requires? 2. On what basis does the ResourceManager take care of resource allocation?
07-21-2016
01:10 PM
1 Kudo
1. On what basis does the ApplicationMaster decide that it needs more containers? 2. Will each mapper have a separate container? 3. Let's say one mapper is launched in a container and has completed 20% of its work; if it requires more resources to complete the remaining 80% of the task, how will the resources be allocated, and who will allocate them? If distribution happens between containers, how does it happen?
07-19-2016
03:33 PM
How can I view the data in the block files below? blk_1073742080 blk_1073742080_1256.meta
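As a hedged aside, one standard way to trace a block ID back to its file and locations is hdfs fsck (scanning / here is just an example; narrow the path on a large cluster):

```bash
# List files with their block IDs and locations, then search for the block in question
hdfs fsck / -files -blocks -locations | grep blk_1073742080
```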
- Tags:
- block
- Data Processing
07-18-2016
12:49 PM
1. List out the metadata attributes in Hadoop. 2. Can we see the block-level metadata file? If yes, how can we see that file?
07-14-2016
02:36 PM
1. How do reducers know where the mapper results are stored?
07-04-2016
05:29 AM
Will replication start after loading the first block of data?
07-01-2016
06:31 PM
Ravi, thanks for your response. I agree that files are divided into blocks based on the block size; however, I wanted to know who actually splits the file into blocks while loading the data into HDFS.
07-01-2016
02:38 PM
In many places I have read that the client splits the data into blocks while storing the data into HDFS. Could you please let me know how files are actually divided into blocks?
- Tags:
- Hadoop Core
- spliting