Member since: 09-17-2014
Posts: 88
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1997 | 07-15-2015 08:57 PM
 | 7688 | 07-15-2015 06:32 PM
10-10-2018
10:51 AM
hi experts! there are a few storage levels that can be used for Spark persist and cache operations (https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/which_storage_level_to_choose.html). By default MEMORY_ONLY is used. From my observations, MEMORY_AND_DISK_SER may be more efficient in most of my cases, so I'd like to change the default StorageLevel. Does anyone have an idea how to do this? thanks!
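As far as I know there is no single configuration property that changes the default level used by rdd.cache()/persist(), so a minimal sketch of the usual workaround is to name the level explicitly on every persist call; the spark-shell invocation and the table path below are placeholders:
# explicit-persist workaround, run through spark-shell (path is a placeholder)
spark-shell <<'EOF'
import org.apache.spark.storage.StorageLevel
val rdd = sc.textFile("/user/hive/warehouse/some_table")  // placeholder path
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)             // instead of the MEMORY_ONLY default
println(rdd.count())
EOF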
... View more
Labels:
- Apache Spark
10-10-2018
10:34 AM
hi Borg! i think it may be too late for this response 🙂 but let me try 🙂
> Are you asking how to change an active topic size or set the default for newly created topics?
An active topic's size.
> Also in terms of "size" of the topic are you referring to partitions or messages?
GBytes 🙂
> Lastly, are you referring to topics being created via the command line or newly created topics from a client?
Any 🙂
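In case it is still useful, a hedged sketch of changing the size-based retention of an existing topic with the stock tooling (the ZooKeeper host, topic name and byte value are placeholders; the script may be called kafka-configs.sh depending on the distribution):
# cap an existing topic at roughly 10 GB per partition via size-based retention
kafka-configs --zookeeper zkhost:2181 --alter --entity-type topics \
  --entity-name my_topic --add-config retention.bytes=10737418240
# the default for newly created topics comes from the broker setting log.retention.bytes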
... View more
10-08-2018
09:28 AM
hi experts! the HDFS service has a tool called the balancer, whose purpose is to ensure an even distribution of blocks across the cluster. My question is: how frequently does it kick in to check whether the cluster is imbalanced or not? Is there any way to change this frequency? thanks!
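For context, a hedged note: the balancer has no built-in schedule of its own, it only evaluates the cluster when it is invoked (from the command line or via the Rebalance action in Cloudera Manager), so the frequency is whatever schedule you give it; a minimal sketch:
# run one balancing pass; -threshold is the allowed per-DataNode utilization deviation in percent
hdfs balancer -threshold 10
# to make it periodic, wrap the command in cron or trigger it on a schedule from Cloudera Manager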
... View more
Labels:
- HDFS
10-05-2018
07:47 AM
awesome! thank you!
... View more
10-05-2018
06:08 AM
@Fawze thanks for the script... unfortunately it throws an error for me: ./users_resource_cons.sh: line 14: syntax error: unexpected end of file
... View more
10-05-2018
06:02 AM
Thank you Thomas! Do you mean some concrete charts? 🙂 I checked Cloudera Manager -> YARN -> Resource Pools; there are indeed lots of useful charts, but they show pool consumption. For example, there could be a pool root.marketing, but within this pool there could be multiple users. So I want to understand which users consume which resources.
... View more
10-05-2018
04:13 AM
So, as I can see, the solution here may be to disallow undeclared pools and not create pools that match the undesirable groups, right? thank you!
... View more
10-05-2018
12:48 AM
Thanks for the idea. So if Bob belongs to the marketing group and to the low group, and only the low pool exists, he will be mapped to that pool, right? Another question: what if I have the pools root.low and root.marketing, which one will pick up Bob, given that he belongs to both (secondary) groups?
... View more
10-04-2018
06:50 AM
Hi dear experts! I have a challenge. I have a dynamic resource pool, let's say root.marketing. Many users who belong to this pool submit jobs to it (Bob, Alice, Tom). I want to know the resource consumption of each of these users, e.g. over the last day Bob used on average 33 cores, Alice 12, Tom 118... or something like this. In other words, I want to know who consumes what within the same pool. thanks!
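One hedged way to get per-user numbers, assuming the ResourceManager REST API is reachable on its default port 8088 (the host name, user and timestamp below are placeholders): each application report carries vcoreSeconds and memorySeconds, which can be summed per user.
# sum the vcore-seconds of one user's applications finished since a given epoch-millis timestamp
curl -s 'http://rm-host:8088/ws/v1/cluster/apps?user=bob&finishedTimeBegin=1538611200000' \
  | python -c 'import sys, json; apps = json.load(sys.stdin)["apps"]["app"]; print(sum(a["vcoreSeconds"] for a in apps))'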
... View more
Labels:
- Apache YARN
10-04-2018
06:36 AM
Well, I'd like to create only 3 pools with different priorities and put different groups, of which there could be many (up to 50 different groups), into those 3 pools... I was thinking about some workaround, but nothing came to mind.
... View more
10-04-2018
05:14 AM
Thank you so much for sharing this! But I didn't find how to map the name of the primary/secondary group to a pool with a different name. In other words, I want to put a user who belongs to the Marketing group into the root.low pool, rather than into root.marketing. thanks!
... View more
10-04-2018
01:59 AM
Hi dear experts! I'm looking for a way to map a certain user group to a YARN resource pool. For example, I have a user Bob who belongs to the group QA and a user Alice who belongs to the group Dev, and in YARN I have the pools root.low, root.medium and root.high. I want to configure placement rules in such a way that all users who belong to the QA group (Bob) are mapped to the root.low pool, and everyone who belongs to Dev (Alice) is mapped to root.high. Does anyone know whether this is possible or not? thanks!
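For reference, a hedged sketch of the Fair Scheduler placement policy that sits underneath the Dynamic Resource Pools screen (in CDH these rules are normally edited in the Cloudera Manager UI rather than in fair-scheduler.xml by hand); note that the stock rules map a group to a pool with the same name, so a direct QA -> root.low mapping is not among them and needs the kind of workaround discussed later in this thread:
<queuePlacementPolicy>
  <rule name="specified" create="false"/>
  <rule name="secondaryGroupExistingQueue" create="false"/>
  <rule name="default" queue="root.low"/>
</queuePlacementPolicy>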
... View more
Labels:
- Apache YARN
07-19-2017
05:21 PM
Hi gurus! Is there any way to do the subject? thanks!
... View more
Labels:
- Apache Kafka
04-19-2016
03:29 PM
Hi dear experts! I'm trying to load data with the ImportTsv tool, like this:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dmapreduce.job.reduces=1000 -Dimporttsv.columns="data:SS_SOLD_DATE_SK, HBASE_ROW_KEY" -Dimporttsv.separator="|" -Dimporttsv.bulk.output=/tmp/store_sales_hbase store_sales /user/root/benchmarks/bigbench/data/store_sales/*
but I get only one reducer (despite the -Dmapreduce.job.reduces=1000 setting). I even set mapreduce.job.reduces=1000 cluster-wide, but I still get only one reducer. Could anybody hint how to resolve this? Thank you in advance for any input!
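A possibly relevant detail, hedged: when -Dimporttsv.bulk.output is used, the job goes through HFileOutputFormat and the reducer count generally follows the number of regions of the target table rather than mapreduce.job.reduces, so pre-splitting the table is one way to get more reducers; a minimal sketch for a freshly created table (the split points are placeholders):
# pre-split the target table so the bulk-load job gets one reducer per region
hbase shell <<'EOF'
create 'store_sales', 'data', SPLITS => ['0250000', '0500000', '0750000']
EOF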
... View more
Labels:
- Apache HBase
04-19-2016
11:17 AM
Hi dear experts! I'm trying to load data in CSV format from HDFS into HBase with ImportTsv (importtsv). It works perfectly fine when HBASE_ROW_KEY is a single CSV column, but I don't know how to create a composite HBASE_ROW_KEY (from two columns). For example, I have a CSV with 3 columns:
row1, 1, abc
row1, 2, dd
row2, 1, iop
row3, 1, kk
and a row can be uniquely identified by the first two columns. Any input will be highly appreciated!
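In case a sketch helps: ImportTsv expects the row key to be a single field, so one hedged workaround is to pre-concatenate the two key columns before the import (the paths, table name, and column family/qualifier below are placeholders):
# build a pipe-separated file whose first field is the composite key "col1_col2"
hdfs dfs -cat /user/root/input/*.csv \
  | awk -F', ' '{ print $1"_"$2"|"$3 }' \
  | hdfs dfs -put -f - /tmp/with_composite_key
# then import with the composite field as HBASE_ROW_KEY
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator='|' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c3 \
  my_table /tmp/with_composite_key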
... View more
Labels:
- Apache HBase
01-22-2016
05:01 PM
Hi dear expert! I'm trying to export data with the sqoop.export.records.per.statement parameter, but for some reason sqoop doesn't recognize it:
sqoop export --direct --connect jdbc:oracle:thin:@scaj43bda01:1521:orcl --username bds --password bds --table orcl_dpi --export-dir /tmp/dpi --input-fields-terminated-by ',' --lines-terminated-by '\n' -m 70 --batch -Dsqoop.export.records.per.statement=10000 -Dsqoop.export.statements.per.transaction=100
Warning: /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p1168.923/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/22 19:59:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.1
16/01/22 19:59:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/01/22 19:59:38 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
16/01/22 19:59:38 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dsqoop.export.records.per.statement=10000
16/01/22 19:59:38 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dsqoop.export.statements.per.transaction=100
I've tried removing the --direct key (the target DB is Oracle), but it also doesn't help:
sqoop export --connect jdbc:oracle:thin:@host:1521:orcl --username user --password pass --table orcl_dpi --export-dir /tmp/dpi --input-fields-terminated-by ',' --lines-terminated-by '\n' -m 70 --batch -Dsqoop.export.records.per.statement=10000 -Dsqoop.export.statements.per.transaction=100
Warning: /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p1168.923/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/01/22 20:00:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.5.1
16/01/22 20:00:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/01/22 20:00:29 ERROR tool.BaseSqoopTool: Error parsing arguments for export:
16/01/22 20:00:29 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dsqoop.export.records.per.statement=10000
16/01/22 20:00:29 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dsqoop.export.statements.per.transaction=100
thank you!
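If it is still relevant, a hedged note: the -D options are generic Hadoop arguments, and Sqoop only accepts them immediately after the tool name, before the tool-specific options, so an ordering like the sketch below should get past the "Unrecognized argument" errors (connection details are the same placeholders as above):
sqoop export \
  -Dsqoop.export.records.per.statement=10000 \
  -Dsqoop.export.statements.per.transaction=100 \
  --connect jdbc:oracle:thin:@host:1521:orcl \
  --username user --password pass \
  --table orcl_dpi --export-dir /tmp/dpi \
  --input-fields-terminated-by ',' --lines-terminated-by '\n' \
  -m 70 --batch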
... View more
Labels:
01-13-2016
02:10 PM
So, I've started to play with this and ran into an interesting thing. When I process data compressed with lzma, I read twice as much data as I actually have on HDFS. For example, the hadoop client (hadoop fs -du) shows something like 100GB; then I run an MR job (like select count(1)) over this data, check the MR counters, and find that "HDFS bytes read" is twice as large (like 200GB). With the gzip and bzip2 codecs, the hadoop client file size and the MR counters are similar.
... View more
01-11-2016
07:19 PM
Thank you for your reply! It seems promising, but as far as I understand it requires rebuilding your Hadoop distribution package. What if I just have the CDH package and want to plug this in as an extension (for example, like lzo does through the parcels)... thanks!
... View more
01-09-2016
05:09 PM
Hi experts! It seems that the LZMA algorithm could be pretty suitable for some Hadoop cases (like storing historical immutable data). Does someone know whether it is possible to implement it somehow or to reuse some existing library? Any ideas are very welcome! thanks!
... View more
Tags:
- compression
- HDFS
- lzma
Labels:
- Apache Hadoop
11-11-2015
02:45 PM
Hi dear experts! I'm wondering whether there is any way to force block redistribution for a particular file/directory. My case is:
1) load a file from a node that runs a DataNode process, with replication factor 1
2) increase the replication factor by executing: hdfs dfs -setrep 3 /tmp/path/to/my/file
3) check the distribution with a specific Java tool: hadoop jar FileDistribution.jar /tmp/path/to/my/file
and get:
Files distribution in directory across cluster is : {scaj31bda05.us.oracle.com=400, scaj31bda03.us.oracle.com=183, scaj31bda04.us.oracle.com=156, scaj31bda01.us.oracle.com=151, scaj31bda02.us.oracle.com=154, scaj31bda06.us.oracle.com=156}
It's obvious that the first node contains 400 blocks, while the other 400*2=800 blocks are evenly distributed across the remaining nodes. Is there any way to force block redistribution to make it even? thanks!
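A hedged sketch of the usual workaround: -setrep only adds replicas, it does not move the copies that were written locally, so rewriting the file (ideally from a host that is not a DataNode, or with distcp) lets the normal placement policy spread the blocks again; running the HDFS balancer afterwards may also help.
# rewrite the file so block placement is decided again, then swap it into place
hdfs dfs -cp /tmp/path/to/my/file /tmp/path/to/my/file.rebalanced
hdfs dfs -rm /tmp/path/to/my/file
hdfs dfs -mv /tmp/path/to/my/file.rebalanced /tmp/path/to/my/file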
... View more
Labels:
- HDFS
11-08-2015
08:05 PM
Thanks! hdfs fsck will work, but its output is hard to analyze for a big file. Maybe there is another way to get aggregate values?
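A hedged aggregation sketch, assuming a Hadoop 2.x fsck output where each replica location is printed as DatanodeInfoWithStorage[...] (the path is a placeholder):
# rough per-DataNode block count for one file
hdfs fsck /tmp/path/to/my/file -files -blocks -locations \
  | grep -o 'DatanodeInfoWithStorage\[[^,]*' \
  | sort | uniq -c | sort -rn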
... View more
11-08-2015
07:40 PM
Hi dear expert! I'm wondering whether there is any way to check file distribution among nodes in HDFS, i.e. some way to check on which nodes the blocks of a particular file or directory are placed? thanks!
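A minimal sketch with the stock tooling (the path is a placeholder): fsck can print which DataNodes hold each block of a file or directory.
hdfs fsck /tmp/path/to/my/file -files -blocks -locations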
... View more
Labels:
- HDFS
11-06-2015
11:06 AM
Hi dear experts! I'm struggling with configuring sqoop2 + Hue (3.7) + Oracle DB. I'm trying to create a connection in Hue, but I'm getting an error. I have ojdbc6.jar in the /var/lib/sqoop2/ directory (as some forums hinted):
[root@sqoop2server ~]# ll /var/lib/sqoop2/
total 7684
-rw-r--r-- 1 sqoop2 sqoop 2677451 Nov 6 12:46 derby-10.8.2.2.jar
-rw-r--r-- 1 sqoop2 sqoop2 960396 Nov 6 13:39 mysql-connector-java.jar
-rw-r--r-- 1 root root 3670975 Nov 6 13:52 ojdbc6.jar
-rw-r--r-- 1 sqoop2 sqoop2 539705 Nov 6 13:39 postgresql-9.0-801.jdbc4.jar
drwxr-xr-x 3 sqoop2 sqoop2 4096 Nov 5 21:17 repository
drwxr-xr-x 3 sqoop2 sqoop2 4096 Nov 2 19:06 repositoy
drwxr-xr-x 5 sqoop2 sqoop 4096 Nov 6 13:39 tomcat-deployment
Plus one more question: is there any way to configure Oraoop sqoop with Hue? thanks!
... View more
Labels:
- Apache Sqoop
- Cloudera Hue
09-09-2015
06:03 PM
Thank you for your reply! Could you point me at the source class where I could read about this in more detail? thanks!
... View more
09-09-2015
10:01 AM
Thank you for your reply! Just to clarify:
> stream the data via a buffered read
Is the size of this buffer defined by the io.file.buffer.size parameter? thanks!
... View more
09-08-2015
06:20 PM
Hi dear experts! I'm curious how it is possible to control the read IO size in my MR jobs. For example, I have some file in HDFS; under the hood it is stored as files in the Linux filesystem, /disk1/hadoop/.../.../blkXXX. In the ideal case each such file should be equal to the block size (128-256MB). My question is: how is it possible to set the IO size for the read operation? thank you!
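A hedged sketch, tied to the parameter mentioned later in this thread: the buffer used when streaming a block is sized by io.file.buffer.size (in bytes), and assuming the driver goes through ToolRunner/GenericOptionsParser it can be passed per job as a generic option (the jar, class and paths below are placeholders):
# run an MR job with a 128 KB stream read buffer
hadoop jar my-job.jar com.example.MyDriver -Dio.file.buffer.size=131072 /input /output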
... View more
Labels:
- Apache Hadoop
- HDFS
07-27-2015
02:53 AM
But in the second case I read the whole dataset just as in the first case (without any map operation), so in both cases I read the whole dataset... Regarding shuffle: I use coalesce instead of repartition, so it is supposed to avoid shuffle operations...
... View more
07-26-2015
04:47 PM
Hi dear experts! I'm exploring Spark's persist capabilities and noticed interesting behaviour of DISK_ONLY persistence. As far as I understand, the main goal is to store reusable and intermediate RDDs that were produced from permanent data (which lives on HDFS).
import org.apache.spark.storage.StorageLevel
val input = sc.textFile("/user/hive/warehouse/big_table");
val result = input.coalesce(600).persist(StorageLevel.DISK_ONLY)
scala> result.count()
……
// and repeat the command
……..
scala> result.count()
So I was surprised to see that the second iteration was significantly faster... Could anybody explain why? thanks!
... View more
Labels:
- Apache Hive
- Apache Spark
- HDFS
07-18-2015
10:35 AM
Hi everyone! I'm trying to understand sort shuffle in Spark and would very much appreciate it if someone could answer a simple question. Let's imagine:
1) I have 600 partitions (HDFS blocks, for simplicity)
2) they are placed on a 6-node cluster
3) I run Spark with the following parameters: --executor-memory 13G --executor-cores 6 --num-executors 12 --driver-memory 1G --properties-file my-config.conf
That means that on each server I will have 2 executors with 6 cores each.
4) according to my config, the reduce phase has only 3 reducers.
So, my question is how many files will be on each server after the sort shuffle:
- 12, like the number of active map tasks
- 2, like the number of executors on each server
- 100, like the number of partitions placed on this server (for simplicity I just divide 600 by 6)
And the second question is: what is the name of the buffer that stores intermediate data before it is spilled to disk during the map stage? thanks!
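A hedged way to check this empirically, assuming the NodeManager local dirs live under /yarn/nm (that path and the application id pattern are placeholders, adjust to yarn.nodemanager.local-dirs): the sort shuffle writes its per-map output as shuffle_*.data files, so counting them on one worker answers the first question for that host.
# count the sort-shuffle data files produced on one worker
find /yarn/nm/usercache/*/appcache/application_*/ -name 'shuffle_*.data' 2>/dev/null | wc -l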
... View more
Labels:
- Apache Spark
- HDFS