Member since: 09-17-2014
Posts: 88
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 2667 | 07-15-2015 08:57 PM |
| 9420 | 07-15-2015 06:32 PM |
10-08-2018
07:19 PM
1 Kudo
The command in CM -> HDFS -> Actions to run the Balancer is ad hoc; there is no schedule it runs on, so you'll need to invoke it manually to trigger the HDFS Balancer work. If you'd like to set up a recurring run, you can use the CM API to trigger it from crontab or similar.
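A minimal sketch of a cron-driven trigger through the CM REST API. The host, credentials, API version, cluster/service names, and the balancer command name below are all assumptions; confirm the exact endpoint against the API docs for your CM version:

```
# Sketch only - API version, cluster/service names, and the command
# name are assumptions to verify against your CM version's API docs.
curl -u admin:admin -X POST \
  "http://cm-host:7180/api/v19/clusters/Cluster1/services/hdfs/commands/hdfsRebalance"
```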
10-05-2018
07:47 AM
Awesome, thank you!
11-02-2017
04:39 AM
1 Kudo
You are right that it's all just byte sequences to HBase, and that it sorts everything lexicographically. You do not need a separator character when composing your key for HBase to understand boundaries (it would not serve as one), unless you want the extra bytes for readability, or for recovering the individual data elements from variable-length keys if that's a use case. HBase 'sharding' (splitting) can be specified manually at table-creation time if you know your key pattern and ranges; this is strongly recommended so you scale from the beginning. Otherwise, HBase computes a key midpoint by analysing the keys in byte form and splits a region there whenever its size crosses the split threshold.
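For example, a sketch of manual pre-splitting in the HBase shell, assuming a hypothetical table and family name and fixed-width keys that start with a two-digit prefix:

```
# Hypothetical names: pre-split 'events' into four regions on the
# two-digit key prefix, so the table scales from the start.
hbase(main):001:0> create 'events', {NAME => 'd'}, {SPLITS => ['25', '50', '75']}
```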
11-20-2016
10:46 PM
Index pages are declared in the Parquet format definition, but are not actually implemented. See: https://github.com/Parquet/parquet-format/blob/f7ab552f569df63bdb59f751d0dd36e826682739/src/thrift/parquet.thrift#L338
04-21-2016
12:01 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you get one reducer per region. (Of course, pre-splitting needs a good idea of how your row keys are designed and is a broad topic in itself; see HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.)

Example, with the table pre-split into 6 regions:

```
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv \
    -Dimporttsv.bulk.output=/user/hac/output/2-4 \
    -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 \
    hly_temp2 /user/hac/input/2-1
...
Job Counters
    Launched map tasks=1
    Launched reduce tasks=6   <<<
```
01-25-2016
10:06 AM
According to the Sqoop documentation, it uses the generic Hadoop argument-passing scheme, which expects a space after the -D flag. You can check this in the Sqoop user guide here:
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_using_options_files_to_pass_arguments
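For example, a sketch with placeholder connection details; note the space after -D, and that generic options must come before the tool-specific arguments:

```
# Placeholder connect string/table; the space after -D is required,
# and generic -D options must precede tool-specific arguments.
sqoop import -D mapreduce.job.name=my_import \
  --connect jdbc:mysql://db.example.com/sales --table orders \
  --username sqoop_user -P
```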
01-13-2016
02:10 PM
So, I've started to play with this and hit something interesting. When I process lzma data, the job reads twice as much data as I actually have on HDFS. For example, the hadoop client (hadoop fs -du) shows a number like 100GB; then I run MR over this data (e.g. a select count(1)), check the MR counters, and find "HDFS bytes read" is twice that (like 200GB). With the gzip and bzip2 codecs, the hadoop client file size and the MR counters are similar.
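A sketch of the comparison, with placeholder path and table names:

```
# Placeholder path/table: compare the on-disk size against the
# "HDFS bytes read" counter of a job scanning the same data.
hadoop fs -du -s -h /data/lzma_dataset        # shows ~100GB
hive -e 'select count(1) from lzma_table'     # counters report ~200GB read
```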
11-25-2015
11:18 AM
When you ingest data from an edge node that is also running a DataNode role, the first replica is always written to that DataNode, so it uses up space much faster than any other DataNode. To redistribute space usage among all DataNodes, you must run the HDFS balancer.
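For example (the threshold is the allowed deviation of each DataNode's utilization from the cluster average, in percent; 10 is the default):

```
# Rebalance until every DataNode is within 10% of the average utilization.
hdfs balancer -threshold 10
```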
11-10-2015
03:59 PM
1 Kudo
When Sqoop2 starts, it copies /usr/share/java/oracle-connector-java.jar to /var/lib/sqoop2/. Please rename ojdbc6.jar to oracle-connector-java.jar, change its owner and group to sqoop2, and restart Sqoop2.
Also, please use oracle.jdbc.OracleDriver as the JDBC Driver Class.
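The steps above as commands, assuming the driver jar sits in the current directory:

```
# Put the Oracle driver where Sqoop2 copies it from at startup (paths as above).
cp ojdbc6.jar /usr/share/java/oracle-connector-java.jar
chown sqoop2:sqoop2 /usr/share/java/oracle-connector-java.jar
# then restart the Sqoop2 service (e.g. from Cloudera Manager)
```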
11-08-2015
11:51 PM
If by 'hard to analyse' you mean parsing/processing it, you can consider using the Java API to fetch block location info too: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)