Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Views | Posted |
|---|---|
| 7357 | 08-12-2016 01:02 PM |
| 2708 | 08-08-2016 10:00 AM |
| 3669 | 08-03-2016 04:44 PM |
| 7210 | 08-03-2016 02:53 PM |
| 1863 | 08-01-2016 02:38 PM |
04-18-2016
06:46 PM
Perhaps one thing in advance: are you sure there are reducers? Normally I would assume your job just runs with a LOT of mappers and no reducers at all, especially since it goes from a partitioned table to a partitioned table. (Going from a randomly ordered input data set into a partitioned table would be terrible, but that's a different story.) An explain of your query would be nice, i.e. run `EXPLAIN INSERT INTO ...` and see why it uses a reducer at all. How old is your Hive? Initial loads into partitioned tables are always tricky; here is my shameless plug of a presentation: http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data

If you set the number of reducers with `SET mapred.reduce.tasks = x;` you should ALSO set the distribution, otherwise a single reducer may try to write to all 1173 partitions, which will kill it: `INSERT ... SELECT * FROM xxx DISTRIBUTE BY dt;`

Regarding the number of reducers: check how much memory you need for each task (for ORC I like 8GB) and divide your data volume by it, or choose 1173 if that fails. Hive might run them one after another, but each would only write to one partition. Another way to fix it is `SET hive.optimize.sort.dynamic.partition=true;`, which should theoretically do everything for you but is not as flexible. Also have a look into the presentation for correct sorting, and please enable Tez :-).
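Putting those pieces together, a load could look roughly like the sketch below. Table and column names are made up for illustration; `dt` stands in for whatever your partition column is called.

```sql
-- Sketch only: target, source and the dt partition column are assumed names.
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition=true;

-- DISTRIBUTE BY the partition column so each reducer writes to one partition.
INSERT INTO TABLE target PARTITION (dt)
SELECT *
FROM source
DISTRIBUTE BY dt;
```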
04-17-2016
02:07 PM
How long are the blocks around? They will be cleaned up eventually, and then changing the fsimage doesn't help you anymore, right? But this article explains a lot about the way the namenode works. Very cool.
04-17-2016
12:40 PM
Cool, thanks, that should speed things up.
04-16-2016
10:44 AM
What would you want to speed up with tmpfs? Most components in the Hadoop environment only use the discs for persistence (or require A LOT of space on the datanodes), so a tmpfs store defeats the purpose for something like an fsimage. The two components that aggressively use memory-backed storage are Spark and Kafka, but both rely on the OS filesystem buffers instead (and tell the OS to only write through to disc if needed).
04-15-2016
11:19 PM
1 Kudo
All nodes: enough space for logs in /var/log (100GB?), and also enough space for /var and /usr. At least the logs should have their own logical partition, since it is annoying when they run over.

Namenode: discs should be raided. It is good practice to keep a separate partition for the Hadoop files (/hadoop?); a couple hundred GB should be sufficient, the disc requirements are not huge.

Datanodes: two OS discs can be raided for additional resiliency, or just one drive can be reserved for the OS to leave more data drives. All other drives are data drives and should be non-raided, simply added as simple volumes (/grid/x). ext4 is good for the data drives, and the discs should be mounted with noatime. Some more details here: https://community.hortonworks.com/articles/14508/best-practices-linux-file-systems-for-hdfs.html

Swap: in general, disable swap on datanodes, since swapping should REALLY not happen there and would most likely kill cluster performance; better to have tasks fail and someone look at it. On master nodes it is a bit more complex: depending on cluster size many tasks can be running, and OOM errors can lead to unpredictable results, so swap may be safer here. However, make sure you normally have enough memory available. You may find other recommendations as well.
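As a sketch, the data-drive layout above could translate into /etc/fstab entries like these. The device names and mount points are assumptions; adapt them to your own drive layout.

```
# /etc/fstab -- one non-raided data drive per line, ext4, mounted with noatime
/dev/sdb1   /grid/0   ext4   defaults,noatime   0   0
/dev/sdc1   /grid/1   ext4   defaults,noatime   0   0
```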
04-15-2016
03:25 PM
@Daniel Perry The below is a regex to get the date out of there: `[0-9]+(.*)[0-9]+` Now how to do that? Below is some pseudo code for how I once did something similar. It's a lot of trickery with the different quotation marks (notice how the backticks are used to execute a command and the "" to denote a string), and you also have to escape characters a lot. So it's not finished; you may have to play around with it a bit, but it should work.

myvar=+----..........2016 ....
datevar="`echo "$myvar" | sed 'myregex'`"

The second possibility is to do that in Oozie. Oozie supports the JSTL expression language functions, and you could just use substring, since the length of the before and after string always seems to be the same? http://beginnersbook.com/2013/12/jstl-substring-substringafter-substringbefore-functions/
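To make the pseudo code concrete, here is a small runnable sketch. The input string and the regex are made-up stand-ins (the real format in the question differs), so adapt both to the actual string format.

```shell
# Made-up example input and regex -- adapt both to the real string format.
myvar="data_backup.2016-04-15.csv"
# Capture the first YYYY-MM-DD date and replace the whole line with it.
datevar="$(echo "$myvar" | sed -E 's/^[^0-9]*([0-9]{4}-[0-9]{2}-[0-9]{2}).*$/\1/')"
echo "$datevar"   # -> 2016-04-15
```

Note the `$( ... )` form of command substitution is easier to nest and quote than the backtick form used in the pseudo code above.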
04-15-2016
12:39 PM
Basically, the JMS standard never delivers an acknowledged message twice. So yes, each message goes to one Flume agent. There is no replication in it, and you need to make sure that agent doesn't have outages (raided discs, file channel, ...). MQ systems provide different ways to add reliability, for example publish/subscribe, but I don't think Flume supports that. http://www.ibm.com/support/knowledgecenter/#!/SSFKSJ_7.0.1/com.ibm.mq.amqnar.doc/ps20010_.htm There is also the possibility to duplicate each message to two topics; however, in this case you need to do a deduplication somewhere in your ingest logic. (Flume would not work here; you would need to do that downstream when processing the messages.)
04-14-2016
06:28 PM
One thing to keep in mind is that your queries will fail if too many tasks fail. This can happen if one or some of your local dirs are on a small partition. Not sure about your cluster setup, but sometimes Ambari configures the local dirs simply by taking all available non-root partitions, and this can lead to these problems. I find it hard to believe that your query fails because YARN runs out of disc space, given the small amount of data you have in the system; it is likelier that one of the local dirs has been set on a small partition. Check yarn.nodemanager.local-dirs and see if one of the folders in there is on a small partition. You can simply change that and re-run your jobs.
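A quick way to eyeball this is to print the free space of the partition behind each configured local dir. The paths below are placeholders; substitute the real value of yarn.nodemanager.local-dirs.

```shell
# Print the partition stats behind each YARN local dir.
# Replace the arguments with the actual yarn.nodemanager.local-dirs entries.
check_local_dirs() {
  for d in "$@"; do
    echo "== $d =="
    df -hP "$d" | tail -n 1   # last line: device, size, used, avail, use%, mount
  done
}

check_local_dirs /tmp
```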
04-14-2016
05:43 PM
7 Kudos
"1. I read that HDFS has a default replication factor 3, meaning whenever a file is written, each block of the file is stored 3 times. Writing the same block 3 times needs more I/O compared to writing once. How does Hadoop address this? Won't it be a problem when writing large datasets?"

In addition to what Kuldeep said: it is true that Hadoop has 3 times the IO in total compared to writing a file to a single file system. However, a typical small cluster (10 datanodes, each with 12 discs) has 120 discs to write to, and therefore roughly 40x the writing capacity and 120x the reading capacity of a single disc. So even on a small cluster the IO throughput of HDFS is gigantic.

"2. As HDFS is a virtual filesystem, the data it stores will ultimately be stored on the underlying operating system (in most cases, Linux). Assuming Linux has an ext3 file system (whose block size is 4KB), how does having a 64MB/128MB block size in HDFS help? Will the 64MB block in HDFS be split into x * 4KB blocks of the underlying operating system?"

Again a small additional comment to what Kuldeep said: a block is just a Linux file, so yes, all the underlying details of ext3 or whatever still apply. The reason blocks are so big is not the storage on the local node but the central FS image:

- To have a distributed filesystem you need a central FS image that keeps track of ALL the blocks on ALL the servers in the system. In HDFS this is the namenode. It keeps track of all the files in HDFS and all the blocks on all the datanodes. This needs to be in memory to support the high number of concurrent tasks and operations happening in a Hadoop cluster, so the traditional ways of setting this up (b-trees) don't really work. It also needs to cover thousands of nodes with tens of thousands of drives. Doing that with 4KB blocks wouldn't scale, so the block size is around 128MB on most systems. (You can count roughly 1GB of namenode memory for 100TB of data.)

- If you want to process a huge file in HDFS you need to run a parallel task on it (MapReduce, Tez, Spark, ...). In this case each task gets one block of data and reads it; it might be local or not. Reading a big 128MB block or sending it over the network is efficient. Doing the same with 30,000 4KB files would be very inefficient.

That is the reason for the block size.
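The back-of-the-envelope numbers above are easy to sanity-check. The cluster size comes from the example in the text; the 500TB figure at the end is just a made-up illustration of the namenode rule of thumb.

```shell
# 10 datanodes x 12 discs, replication factor 3 (numbers from the example above)
discs=$((10 * 12))
write_x=$((discs / 3))        # every block is written 3 times
read_x=$discs                 # a read only needs to hit one replica
echo "aggregate: ~${write_x}x write and ~${read_x}x read throughput of one disc"

# Namenode heap rule of thumb from the text: ~1GB per 100TB of data
data_tb=500                   # made-up example cluster size
nn_gb=$((data_tb / 100))
echo "~${nn_gb}GB namenode heap for ${data_tb}TB"
```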
04-13-2016
12:58 PM
You can normally select the nodes you want to install the clients on in the Add Servers section, or when adding a service, so he shouldn't have to choose two nodes to install the clients on; he might have just picked those ones. You can always add clients later on the Hosts page of Ambari with the +Add button.