Member since
09-17-2014
88
Posts
3
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2666 | 07-15-2015 08:57 PM
 | 9420 | 07-15-2015 06:32 PM
01-11-2016
07:19 PM
Thank you for your reply! It seems promising, but as far as I understand it requires rebuilding your Hadoop distribution package. What if I just have a CDH package and want to plug this in as an extension (for example, the way LZO does through parcels)? Thanks!
01-09-2016
05:09 PM
Hi experts! It seems that the LZMA algorithm could be pretty suitable for some Hadoop cases (like storing historical immutable data). Does anyone know whether it is possible to implement it somehow, or to reuse an existing library? Any ideas are very welcome! Thanks!
Labels:
- Apache Hadoop
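For reference, the usual way to plug in an extra codec without rebuilding the distribution is to put the codec jar on the Hadoop classpath and register the codec class in core-site.xml. A hedged sketch follows; the class name `io.sensesecure.hadoop.xz.XZCodec` is an assumption based on the community hadoop-xz project and is not part of stock CDH, so substitute whatever codec implementation you actually deploy:

```xml
<!-- core-site.xml: register an additional compression codec.
     The third class below is a third-party/hypothetical codec;
     replace it with the class from the jar you deploy. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,
         org.apache.hadoop.io.compress.GzipCodec,
         io.sensesecure.hadoop.xz.XZCodec</value>
</property>
```

The jar itself would then need to be visible to every node (for example via a parcel or by dropping it into the cluster-wide classpath), similar to how LZO is distributed.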
11-11-2015
02:45 PM
Hi dear experts! I'm wondering whether there is any way to force block redistribution for a particular file/directory. My case is:
1) Load a file from a node that runs a DataNode process, with replication factor 1.
2) Increase the replication factor by executing: hdfs dfs -setrep 3 /tmp/path/to/my/file
3) Check the distribution with a specific Java tool: hadoop jar FileDistribution.jar /tmp/path/to/my/file and get:
-----------------------------------
Files distribution in directory across cluster is : {scaj31bda05.us.oracle.com=400, scaj31bda03.us.oracle.com=183, scaj31bda04.us.oracle.com=156, scaj31bda01.us.oracle.com=151, scaj31bda02.us.oracle.com=154, scaj31bda06.us.oracle.com=156}
It's obvious that the first node contains 400 blocks, while the other 400*2=800 block replicas are evenly distributed across the remaining nodes. Is there any way to force block redistribution to make it even? Thanks!
Labels:
- HDFS
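The skew in the output above can be quantified directly. A minimal sketch (pure Python; the dict literal is the tool's output re-typed by hand, not parsed from it):

```python
# Block-replica counts per DataNode, copied from the FileDistribution.jar output.
blocks = {
    "scaj31bda05.us.oracle.com": 400,
    "scaj31bda03.us.oracle.com": 183,
    "scaj31bda04.us.oracle.com": 156,
    "scaj31bda01.us.oracle.com": 151,
    "scaj31bda02.us.oracle.com": 154,
    "scaj31bda06.us.oracle.com": 156,
}

total = sum(blocks.values())        # 1200 block replicas in total
ideal = total / len(blocks)         # 200 per node if spread evenly
# Relative deviation of each node from the even-spread ideal.
skew = {node: (count - ideal) / ideal for node, count in blocks.items()}
worst = max(skew, key=lambda n: abs(skew[n]))
print(worst, round(skew[worst], 2))  # the loading node sits 100% over the ideal
```

This makes the imbalance concrete: the node that originally wrote the file holds twice its fair share, which is expected, since HDFS places the first replica on the writing node and `-setrep` only adds replicas elsewhere rather than moving the originals.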
11-08-2015
08:05 PM
Thanks! hdfs fsck will work, but its output is hard to analyze for a big file. Is there another way to get aggregate values?
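One way to get aggregates is to post-process the fsck output itself. A hedged sketch (the exact line format varies by Hadoop version; the regex below assumes the CDH5-era `DatanodeInfoWithStorage[ip:port,...]` form printed by `hdfs fsck /path -files -blocks -locations`, and the sample text is made up for illustration):

```python
import re
from collections import Counter

def count_blocks_per_node(fsck_output: str) -> Counter:
    """Count how many block replicas each DataNode holds, by scanning
    the location annotations in fsck's -blocks -locations output."""
    counts = Counter()
    for ip_port in re.findall(r"DatanodeInfoWithStorage\[([\d.]+:\d+)", fsck_output):
        counts[ip_port] += 1
    return counts

# Hand-made sample mimicking two block lines of fsck output.
sample = (
    "0. blk_1073741825_1001 len=134217728 repl=2 "
    "[DatanodeInfoWithStorage[10.0.0.1:50010,DS-1,DISK], "
    "DatanodeInfoWithStorage[10.0.0.2:50010,DS-2,DISK]]\n"
    "1. blk_1073741826_1002 len=134217728 repl=2 "
    "[DatanodeInfoWithStorage[10.0.0.1:50010,DS-1,DISK], "
    "DatanodeInfoWithStorage[10.0.0.3:50010,DS-3,DISK]]\n"
)
print(count_blocks_per_node(sample))
```

In practice you would pipe the real fsck output into this instead of the sample string.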
11-08-2015
07:40 PM
Hi dear experts! I'm wondering whether there is any way to check file distribution among nodes in HDFS, i.e. some way to see on which nodes the blocks of a particular file or directory are placed. Thanks!
Labels:
- HDFS
11-06-2015
11:06 AM
Hi dear experts! I'm struggling with configuring Sqoop2 + Hue (3.7) + Oracle DB. I'm trying to create a connection in Hue, but I get an error. I have ojdbc6.jar in the /var/lib/sqoop2/ directory (as some forums hinted):
[root@sqoop2server ~]# ll /var/lib/sqoop2/
total 7684
-rw-r--r-- 1 sqoop2 sqoop 2677451 Nov 6 12:46 derby-10.8.2.2.jar
-rw-r--r-- 1 sqoop2 sqoop2 960396 Nov 6 13:39 mysql-connector-java.jar
-rw-r--r-- 1 root root 3670975 Nov 6 13:52 ojdbc6.jar
-rw-r--r-- 1 sqoop2 sqoop2 539705 Nov 6 13:39 postgresql-9.0-801.jdbc4.jar
drwxr-xr-x 3 sqoop2 sqoop2 4096 Nov 5 21:17 repository
drwxr-xr-x 3 sqoop2 sqoop2 4096 Nov 2 19:06 repositoy
drwxr-xr-x 5 sqoop2 sqoop 4096 Nov 6 13:39 tomcat-deployment
One more question: is there any way to configure OraOop (Sqoop) with Hue? Thanks!
Labels:
- Apache Sqoop
- Cloudera Hue
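For anyone filling in the Hue/Sqoop2 connection form with the Oracle thin driver, the JDBC URL usually takes one of these two standard forms (the host, port, SID, and service name below are placeholders, not values from this setup):

```
jdbc:oracle:thin:@dbhost:1521:ORCLSID        # SID form
jdbc:oracle:thin:@//dbhost:1521/ORCLSERVICE  # service-name form
```

with `oracle.jdbc.OracleDriver` as the JDBC driver class. A malformed URL or a driver class name typo produces the same generic connection error Hue shows when the jar is missing, so it is worth double-checking both.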
09-09-2015
06:03 PM
Thank you for your reply! Could you point me at the source class where I can read about this in more detail? Thanks!
09-09-2015
10:01 AM
Thank you for your reply! Just to clarify: regarding "stream the data via a buffered read", is the size of this buffer defined by the io.file.buffer.size parameter? Thanks!
09-08-2015
06:20 PM
Hi dear experts! I'm curious how it is possible to control the read IO size in my MR jobs. For example, I have a file in HDFS; under the hood it is stored as files in the Linux filesystem, /disk1/hadoop/.../.../blkXXX. In the ideal case each such file's size should equal the block size (128-256 MB). My question is: how is it possible to set the IO size for the read operation? Thank you!
Labels:
- Apache Hadoop
- HDFS
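On the buffer question discussed above: the client-side stream buffer is a standard Hadoop setting in core-site.xml (io.file.buffer.size, default 4096 bytes). A sketch, with an illustrative value rather than a recommendation:

```xml
<!-- core-site.xml: size of the buffer used for buffered reads/writes
     of HDFS streams. 131072 is an example value, not a recommendation. -->
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
```

Note that this tunes the client-side stream buffer only; the size of the actual read requests hitting /disk1/hadoop/... is determined by the DataNode and the OS (local filesystem block size, readahead), not by this property.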
07-27-2015
02:53 AM
But in the second case I read the whole dataset just as in the first case (without any map operation), so in both cases I read the entire dataset... Regarding shuffle: I use coalesce instead of repartition, so it is supposed to avoid shuffle operations...
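A toy illustration (plain Python, not Spark; the partition lists are made up) of why coalesce can skip the shuffle: it merges whole parent partitions, whereas repartition scatters every element individually, which is exactly what a shuffle does:

```python
# Toy model: a "partitioned dataset" is just a list of lists.
parts = [[1, 2], [3, 4], [5, 6], [7, 8]]

def coalesce(partitions, n):
    """Merge whole parent partitions into n groups: no per-element
    movement, which is why Spark's coalesce(n) can avoid a shuffle."""
    out = [[] for _ in range(n)]
    for i, p in enumerate(partitions):
        out[i % n].extend(p)
    return out

def repartition(partitions, n):
    """Redistribute every element individually (hash partitioning):
    this per-element scatter is what a shuffle has to do."""
    out = [[] for _ in range(n)]
    for p in partitions:
        for x in p:
            out[hash(x) % n].append(x)
    return out

print(coalesce(parts, 2))  # parent partitions stay intact inside each group
```

This is only a model of the data movement, not of Spark's actual partitioner; the point is that coalesce never has to look inside a partition, so the read pattern of the two jobs can still differ even though both scan the full dataset.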