Member since: 09-23-2015
Posts: 800
Kudos Received: 897
Solutions: 185

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1991 | 08-12-2016 01:02 PM
 | 1194 | 08-08-2016 10:00 AM
 | 1177 | 08-03-2016 04:44 PM
 | 2455 | 08-03-2016 02:53 PM
 | 684 | 08-01-2016 02:38 PM
08-15-2016
08:39 AM
LinkedIn? There is only one Benjamin Leonhardi there.
08-12-2016
10:10 PM
@jovan karamacoski I think you might want to contact us for a services engagement. I strongly suspect that what you want to achieve and what you are asking about are not compatible. On Hadoop it is normally files, not specific blocks, that are hot, and files are by definition widely distributed across nodes, so moving specific "hot" drives will not make you happy. Also, especially for writes, having some nodes with more network bandwidth than others doesn't sound like a winning combination, since slow nodes become a bottleneck and it's all linked together. That's how HDFS works. If you want some files to be faster, you might want to look at HDFS storage tiering; using that you could put "hot" data on fast storage like SSDs. You could also look at node labels to put specific applications on fast nodes with lots of CPU, etc. But moving single drives? That will not make you happy. By definition HDFS will not care: one balancer run later and all your careful planning is gone. And lastly, there is no online move of data node storage. You always need to stop a data node, change the storage layout, and start it again; it will then send the updated block report to the Namenode.
08-12-2016
07:05 PM
1 Kudo
Just use doAs=true, make sure only the hive user can read the warehouse folder, and you are done. The Hive CLI can still start, but it cannot access anything.
08-12-2016
01:02 PM
1 Kudo
"If it runs in the Appmaster, what exactly are "the computed input splits" that jobclient stores into HDFS while submitting the Job ??" InputSplits are simply the work assignments of a mapper. I.e. you have the inputfolder /in/file1
/in/file2 And assume file1 has 200MB and file2 100MB ( default block size 128MB ) So the InputFormat per default will generate 3 input splits ( on the appmaster its a function of InputFormat) InputSplit1: /in/file1:0:128000000
InputSplit2: /in/file1:128000001:200000000
InputSplit3:/in/file2:0:100000000 ( per default one split = 1 block but he COULD do whatever he wants. He does this for example for small files where he uses MultiFileInputSplits which span multiple files ) "And how map works if the split spans over data blocks in two different data nodes??" So the mapper comes up ( normally locally to the block ) and starts reading the file with the offset provided. HDFS by definition is global and if you read non local parts of a file he will read it over the network but local is obviously more efficient. But he COULD read anything. The HDFS API makes it transparent. So NORMALLY the InputSplit generation will be done in a way that this does not happen. So data can be read locally but its not a necessary precondition. Often maps are non local ( you can see that in the resource manager ) and then he can simply read the data over the network. The API call is identical. Reading an HDFS file in Java is the same as reading a local file. Its just an extension to the Java FileSystem API.
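To illustrate that last point, here is a minimal sketch (the path, offset, and length are the hypothetical values from the example above) of reading a byte range of an HDFS file through the Java FileSystem API; whether the underlying blocks are local or remote is transparent to the caller:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);      // HDFS if fs.defaultFS points to it
        long start = 128000000L;                   // split offset from the example above
        long length = 72000000L;                   // split length (200MB file - 128MB)
        byte[] buffer = new byte[64 * 1024];
        try (FSDataInputStream in = fs.open(new Path("/in/file1"))) {
            in.seek(start);                        // jump to the split's start offset
            long remaining = length;
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read == -1) break;
                remaining -= read;                 // process buffer[0..read) here
            }
        }
    }
}
```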
08-12-2016
09:24 AM
2 Kudos
Hello Jovan, yes you can simply move a folder; data nodes are beautifully simple that way. We just did it on our cluster: stop HDFS, copy the folder to a new location, and change the location in the Ambari configuration. Just try it with a single drive on a single node (using Ambari config groups); you can run hadoop fsck / to check for under-replicated blocks after the test. A single drive will not lead to inconsistencies in any case. In general, data nodes do not care where the blocks are as long as they still find the files with the right block id in the data folders. You can theoretically do it on a running cluster, but you need to use Ambari config groups, do it one server at a time, and make sure you do it quickly so the Namenode doesn't start to schedule a large number of replica additions because of the missing data node (HDFS waits a bit before it fixes under-replication in case a data node just reboots).
08-11-2016
06:58 PM
Not sure what you mean. Do you want to know WHY blocks get under-replicated? There are different possibilities for a block replica to vanish, but by and large it's simple: a) The block replica doesn't get written in the first place. This happens during a network or node failure during a write. HDFS will still report the write of a block as successful as long as one of the block replica writes was successful. So if, for example, the third designated datanode dies during the write process, the write is still successful but the block will be under-replicated. The write process doesn't care; it depends on the Namenode to schedule a copy later on. b) The block replicas get deleted later. That can have lots of different reasons: a node dies, a drive dies, you delete a block file on the drive. Blocks after all are simple, bog-standard Linux files named blk_xxxx, where xxxx is the block id. They can also get corrupted (HDFS does CRC checks regularly, and corrupted blocks will be replaced with a healthy copy). And many more... So perhaps you should be a bit more specific with your question?
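Since the Namenode only learns about replicas through block reports, it can help to look at what it actually knows about a file. A minimal Java sketch (the path is a placeholder) that lists which hosts hold each block of a file; a block reporting fewer hosts than the target replication factor is under-replicated:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReplicaReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/in/file1");                 // placeholder path
        FileStatus status = fs.getFileStatus(file);
        short targetReplication = status.getReplication(); // desired replica count
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // Fewer hosts than targetReplication means this block is under-replicated.
            System.out.printf("offset=%d length=%d replicas=%d hosts=%s%n",
                block.getOffset(), block.getLength(),
                block.getHosts().length, String.join(",", block.getHosts()));
        }
        System.out.println("target replication = " + targetReplication);
    }
}
```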
08-11-2016
12:53 PM
1 Kudo
The Namenode has a list of all the files, blocks, and block replicas in memory; a gigantic hashtable. Datanodes send block reports to it to give it an overview of all the blocks in the system. Periodically the Namenode checks whether all blocks have the desired replication level. If not, it schedules either block deletions (if the replication level is too high, which can happen when a crashed node is re-added to the cluster) or block copies.
08-09-2016
10:39 AM
No, the expunge should happen immediately, although it may take a bit until the datanodes actually get around to deleting the files; it shouldn't take long. So expunge doesn't help? Weird 🙂
08-09-2016
09:26 AM
2 Kudos
You see this line:

16/08/09 09:16:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 360 minutes, Emptier interval = 0 minutes.

Per default HDFS uses a trash. You can bypass it with hadoop fs -rm -skipTrash, or just empty the trash with hadoop fs -expunge.
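For what it's worth, the trash is a client-side convenience. A minimal Java sketch (the path is a placeholder) of the difference between moving a file into the trash the way the shell normally does and deleting it directly, which is roughly what -skipTrash amounts to:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TrashDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up fs.trash.interval from core-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/some/file");        // placeholder path

        // Roughly what a plain "hadoop fs -rm" does: move the file into the user's .Trash
        boolean movedToTrash = Trash.moveToAppropriateTrash(fs, p, conf);
        System.out.println("moved to trash: " + movedToTrash);

        // Roughly what "-skipTrash" does: delete() bypasses the trash entirely
        // boolean deleted = fs.delete(p, false);
    }
}
```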
08-08-2016
10:00 AM
1 Kudo
Gopal and I gave a couple of tips in here to increase the parallelism (since Hive is normally not tuned for cartesian joins and creates too few mappers): https://community.hortonworks.com/questions/44749/hive-query-running-on-tez-contains-a-mapper-that-h.html#comment-45388 Apart from that, my second point still holds: you should do some pre-filtering to reduce the number of points you need to compare. There are a ton of different ways to do this: https://en.wikipedia.org/wiki/Spatial_database#Spatial_index For example, you can put points into grid cells and exploit the fact that a point in one cell cannot be within your max distance of any point in a cell that is not adjacent.
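A minimal sketch of that grid idea (the cell size and names are made up for illustration, not from the linked thread): bucket each point by a grid key whose cell size equals the maximum match distance, then only compare points whose cells are the same or adjacent.

```java
// Bucket points into square grid cells sized by the maximum match distance.
// Two points closer than maxDistance must fall into the same or adjacent cells,
// so the cross join only has to compare neighbouring buckets.
public class GridKey {
    public static String cell(double x, double y, double maxDistance) {
        long cx = (long) Math.floor(x / maxDistance);
        long cy = (long) Math.floor(y / maxDistance);
        return cx + ":" + cy;
    }

    public static void main(String[] args) {
        double maxDistance = 0.5;                         // assumed threshold
        System.out.println(cell(12.7, 3.1, maxDistance)); // prints "25:6"
    }
}
```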
08-04-2016
10:32 AM
Why don't you want a Hive external table? It is just an entry in the metastore without any significant overhead. You can also use OrcStorage in Pig to write ORC files directly: http://pig.apache.org/docs/r0.15.0/func.html#OrcStorage Similar functions are available for Spark, or you might be able to write a custom MapReduce job using an ORC OutputFormat.
08-04-2016
10:22 AM
Puh, I think you need to make that decision yourself. A single big application will always be more efficient, i.e. faster. You can also modularize a Spark project so that working on one task doesn't change the code of the others. However, it becomes complex and, as said, you need to stop/start the application whenever you change any part of it. Also, if you use something CPU-heavy like Tika, the overhead of a separate topic in the middle doesn't sound too big anymore. So I think I would also go for something like: input sources -> parsing, Tika, normalization -> Kafka with a normalized document format (JSON?) -> analytical applications. But it's hard to give an intelligent answer from a mile away 🙂.
08-04-2016
09:29 AM
Yeah, see above. I think you just need a client like Hive that opens a TezClient, creates an ApplicationMaster, and then submits more DAGs to it. Specifically, Hive has per default one Tez session per JDBC connection. So if you run multiple queries over the same JDBC connection, they use the same TezClient, the same Tez session and, as long as the timeout is not reached, the same ApplicationMaster. Yes, it sounds a bit more magical than it is; the reuse is just session mode, where the client can send multiple DAGs to the same Tez AM. As said, in LLAP you will have shared long-running processes that can be discovered, so that is a bit different. http://hortonworks.com/blog/introducing-tez-sessions/
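A rough sketch of what that session reuse looks like at the API level (the session name and the buildDag helper are placeholders, not Hive's actual code, and a real DAG needs vertices before it can be submitted):

```java
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.TezConfiguration;

public class TezSessionSketch {
    public static void main(String[] args) throws Exception {
        TezConfiguration tezConf = new TezConfiguration();
        // Session mode: one ApplicationMaster accepts multiple DAGs.
        tezConf.setBoolean(TezConfiguration.TEZ_AM_SESSION_MODE, true);

        TezClient client = TezClient.create("my-session", tezConf); // placeholder session name
        client.start();                       // launches the ApplicationMaster
        try {
            client.waitTillReady();
            DAG first = buildDag("query-1");  // placeholder helper; see note below
            DAG second = buildDag("query-2");
            client.submitDAG(first);          // both DAGs run in the same AM
            client.submitDAG(second);
        } finally {
            client.stop();                    // tears down the session and its AM
        }
    }

    private static DAG buildDag(String name) {
        // A real DAG must have at least one vertex (and usually edges) added here.
        return DAG.create(name);
    }
}
```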
08-04-2016
09:27 AM
Yeah, I have to say I didn't look into the Hive code, so I am not sure if you can actually "find" running Tez applications and attach to them. I think it's just the TezClient being kept open in HiveServer/Pig/whatever, which then submits more DAGs to the existing AM. But there might be ways for discovery. Basically, Tez doesn't take over much of what YARN does. This will be a bit different with LLAP, which is like a big YARN container running multiple Tez tasks; that one will have some workload management, scheduling, etc. https://tez.apache.org/releases/0.7.1/tez-api-javadocs/org/apache/tez/client/TezClient.html
08-03-2016
04:44 PM
2 Kudos
So he is definitely correct if you have a single application and you look purely at performance: you essentially get rid of the overhead of the second Kafka hop. However, there are situations where a second set of queues can be helpful, because you decouple parsing and cleansing from analysis, which has big advantages:
- You can have one parser app and then multiple analytical applications that you can start and stop as you please without impacting the parser or the other analytical apps.
- You can write simple analytical applications that consume a parsed, cleansed subset of the data they are interested in, so people get the data they actually want and don't have to worry about the initial parsing/cleansing.
08-03-2016
02:56 PM
Yeah, but without the error we cannot really help. I suppose you mean a ClassNotFoundException? So your UDF uses a lot of exotic imports?
08-03-2016
02:53 PM
3 Kudos
Hi Shiva, it's a Tez client API call you would need to make to find already existing ApplicationMasters of your user in the cluster; you can then hook up with them. The main user at the moment is Hive, which uses this to reduce the startup cost of a query. Essentially, each JDBC connection of a Hive session (if enabled) maps to one ApplicationMaster in YARN. So when you run a query, Hive checks whether an ApplicationMaster already exists (using the Tez client API calls) and uses that AM, or creates a new one otherwise.
08-01-2016
02:38 PM
1 Kudo
Hive was essentially a Java library that kicks off MapReduce jobs. The Hive CLI, for example, runs a full "server" inside the client; if you have multiple clients, all of them do their own SQL parsing/optimization etc. In Hive 1 there was a Thrift server which acted as a proxy for this, so a Thrift (data serialization framework) client could connect to it instead of doing all the computation locally. All of that is not relevant anymore: HiveServer2 has been the default for many years in all distributions and is a proper database server with concurrency/security/logging/workload management... You still have the Hive CLI available, but it will be deprecated soon in favor of beeline, a command-line client that connects to HiveServer2 through JDBC. This is desirable since the Hive CLI punches a lot of holes into central Hive administration. So forget about the HiveServer1 Thrift server and Thrift client.
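To illustrate the HiveServer2 path that beeline takes, a minimal JDBC sketch (host, port, database, and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // The same HiveServer2 JDBC driver that beeline uses.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hs2-host:10000/default";   // placeholder host/port/database
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```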
08-01-2016
01:31 PM
@Steven Hirsch I think you can try it for one application. One possibility is to simply switch off ATS for a bit; that helped me once but not a second time (Tez still tries to log to it). If you really want to switch it off completely, you can add the following settings:

tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService

and, to still see the logs:

tez.simple.history.logging.dir=/mylogdir

Also remove the ATS hooks:

hive.exec.pre.hooks=
hive.exec.post.hooks=

and potentially reduce log levels:

hive.tez.log.level=ERROR

Then see if it makes things faster. Again, if you don't see a difference you may have other issues, but it's worth ruling out. ATS 1.5 has been enabled in HDP 2.4, and ATS 1.0 has some tuning options; if this is really your bottleneck, Hortonworks support may be able to help.
08-01-2016
11:46 AM
Again, see below: the block size doesn't really change anything apart from the total number of tasks. You still have the same number of tasks running at the same time, which is defined by executor-cores. Can you run top on the machines while the job is running? If your CPU utilization is low, you want to increase executor-cores; if it is close to max, reducing it a bit might help (to reduce task switching). If that doesn't change anything, you might just have to live with the performance, or buy more nodes.
08-01-2016
10:49 AM
1 Kudo
Running UDFs in Pig is what Pig is for; you should fix that problem. Have you registered your jars? http://pig.apache.org/docs/r0.16.0/udf.html#udf-java There are other possibilities as well: Spark comes to mind, and especially with Python it can be relatively easy to set up (although it also has its problems, like Python versions), and there are some ETL tools that can utilize Hadoop. But by and large, Pig with Java UDFs is a very straightforward way to do custom data cleaning on data in Hadoop. There is no reason you shouldn't get it to work.
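For reference, a Java UDF doesn't need much. A minimal sketch along the lines of the linked docs (the package, class, and jar names are made up), with the REGISTER/usage lines shown as comments:

```java
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Example UDF that upper-cases its first argument.
// In the Pig script you would register and call it roughly like this:
//   REGISTER myudfs.jar;                          -- hypothetical jar name
//   B = FOREACH A GENERATE myudfs.Upper(name);
public class Upper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                            // Pig convention: null in, null out
        }
        return input.get(0).toString().toUpperCase();
    }
}
```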
08-01-2016
10:44 AM
Just want to support that answer: a repartition is not bad in any case, since I have seen some interesting characteristics with Kafka producers. If they use round robin, they send data to a random partition and switch it only every 10 minutes or so. So it is possible that a single partition in Kafka randomly gets ALL the data and blows up your Spark application (the Flume Kafka connector was my example). A repartition after the KafkaStream fixed that. You can parametrize this based on the number of executors, etc.
08-01-2016
09:42 AM
How about you just try it? I am pretty sure it will be the same, but just make two CTAS tables and test it quickly.
07-31-2016
07:57 PM
The number of cores is the number of task "slots" in the executor. This is the reference to "you want 2-3x the physical cores": you want to make sure that at any given time Spark runs more tasks than the CPU has cores (there is hyper-threading and some overhead). So assuming you have 15000 tasks and 100 executor cores in total, Spark will run them in 150 "waves". Think of it as a YARN within YARN. Now, you also have vcores in YARN, and executor cores are translated into vcore requests, but normally vcores are not switched on and are purely ornamental, i.e. YARN only uses memory for assignment.
07-30-2016
11:21 PM
Sounds like a bug to me. Support ticket? There was an issue with ATS 1.5, but it should definitely be fixed in 2.4.2.
07-30-2016
08:38 PM
2 Kudos
You do not need to reduce the number of tasks to 2x the cores; you need to reduce the number of tasks that run AT THE SAME TIME to 2-3 per core (so 5 * 16 * 2 = 160). You also don't need to change the block size or anything. Executors work best with 10-50GB of RAM, so 24GB executors seem fine. You can then set the number of concurrently running tasks with the --executor-cores y flag, which means an executor can run y tasks at the same time; in your case 16 * 2-3 might be a good value. The 15000 tasks will then be executed wave after wave, and you can tune these parameters accordingly. (You can also try two smaller executors per node, since garbage collection for long-running tasks is an actual concern, but as said, 24GB is not too much.) http://enterprisesecurityinjava.blogspot.co.uk/2016/04/slides-from-running-spark-in-production.html
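As a hedged illustration of those numbers (5 nodes, 16 physical cores, 24GB executors; the exact instance count is an assumption, not a recommendation for your cluster), the same sizing expressed as Spark configuration properties:

```java
import org.apache.spark.SparkConf;

public class ExecutorSizingSketch {
    public static void main(String[] args) {
        // 5 nodes, one executor per node, 16 physical cores * 2 task slots each:
        // --executor-cores 32 gives 5 * 32 = 160 tasks running at the same time.
        SparkConf conf = new SparkConf()
                .setAppName("executor-sizing-sketch")
                .set("spark.executor.instances", "5")
                .set("spark.executor.cores", "32")
                .set("spark.executor.memory", "24g");
        // The master URL (e.g. yarn) would normally be supplied by spark-submit.
        for (scala.Tuple2<String, String> entry : conf.getAll()) {
            System.out.println(entry._1() + " = " + entry._2());
        }
    }
}
```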
07-29-2016
09:24 PM
1 Kudo
I am pretty sure that Hive Strings are not limited to 32k; I think the limit is something like 2GB, and if a lower limit exists it will be specific to a client or something. But I will verify that when I come back. That link seems to corroborate it, and hive.apache.org also doesn't give a maximum: http://stackoverflow.com/questions/35030936/is-there-maximum-size-of-string-data-type-in-hive Also, since both Varchar and String are string values and use dictionaries, I am pretty sure the serialization will be pretty much identical, so why would Varchar be better? As I said, I don't know, but I would be surprised if there was a real difference. I assume VARCHAR is simply a layer on top of String that checks values during insert. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-StringColumnSerialization
07-29-2016
10:45 AM
@Steven Hirsch the big issue is that ATS 1.0 often couldn't keep up with tens of queries per second on large clusters, and in some situations this limited the number of queries running in the cluster. Like really bad; like the cluster sitting empty because everything was waiting for ATS. There were some tuning options to make it better, but by and large the single ATS server and single LevelDB backend had limitations. So it's less about aesthetics and more about performance. ATS 1.5 improved things, and ATS 2.0 hopefully fixes the problem once and for all.
07-28-2016
04:20 PM
Yeah, if you set it to 7 days it should just start cleaning older values after a restart (potentially only after hitting the cleaning period, the interval_ms setting).
07-28-2016
01:09 PM
3 Kudos
1. Is there a way to restrict the max size that users can use for Spark executors and drivers when submitting jobs on a YARN cluster? You can set an upper limit for all containers (yarn.scheduler.maximum-allocation-mb, or similar, in yarn-site.xml), but I am not aware of a way to specifically restrict Spark applications or applications in one queue.

2. What is the best practice for determining the number of executors required for a job? Good question; there was an interesting presentation about that. The conclusion for executor size is: "It depends, but usually 10-40GB and 3-6 cores per executor is a good limit." A max number of executors is not that easy; it depends on the amount of data you want to analyze and the speed you need. So let's assume you have 4 cores per executor which can run 8 tasks each, you want to analyze 100GB of data, and you want around one 128MB block per task; you would then need roughly a thousand tasks in total. To run them all at the same time you could go up to about 100 executors for maximum performance, but you can also make it smaller; it would then be slower. Bottom line, it's not unlike a MapReduce job. If you want a rule of thumb, the upper limit is data amount / HDFS block size / (number of cores per executor x 2); more will not help you much. http://www.slideshare.net/HadoopSummit/running-spark-in-production-61337353 Is there a max limit that users can be restricted to? You can use YARN to create a queue for your Spark users. There is a YARN parameter, user limit, which keeps a single user from taking more than a specific share of a queue; user-limit = 0.25, for example, would restrict a user to at most 25% of the queue. Or you could give every user a queue.

3. How does the RM handle resource allocation if most of the resources are consumed by Spark jobs in a queue? How is preemption handled? Like with any other task in YARN; Spark is not special. Preemption will kill executors, and that is not great for Spark (although it can survive it for a while). I would avoid preemption if I could.
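To make the rule of thumb in point 2 concrete, a tiny worked sketch with the numbers from the example above (100GB of data, 128MB blocks, 4 cores per executor running 2 concurrent tasks each; all of these are the assumptions from the example, not universal values):

```java
public class ExecutorEstimate {
    public static void main(String[] args) {
        long dataBytes = 100L * 1024 * 1024 * 1024;   // 100 GB to scan (assumption)
        long blockBytes = 128L * 1024 * 1024;         // HDFS block size
        int coresPerExecutor = 4;                     // assumption
        int tasksPerCore = 2;                         // 2-3 concurrent tasks per core

        long tasks = dataBytes / blockBytes;          // ~800 tasks, one per block
        long maxUsefulExecutors = tasks / (coresPerExecutor * tasksPerCore); // ~100 executors

        System.out.println("tasks = " + tasks);
        System.out.println("max useful executors = " + maxUsefulExecutors);
    }
}
```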