Member since: 09-17-2014
Posts: 88
Kudos Received: 3
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 2667 | 07-15-2015 08:57 PM |
| 9420 | 07-15-2015 06:32 PM |
10-08-2018
07:19 PM
1 Kudo
The command in CM -> HDFS -> Actions to run the Balancer is ad hoc; there is no schedule it runs on, so you'll need to invoke it manually to trigger the HDFS Balancer work. If you'd like to set up a recurring run, you can use the CM API to trigger it from crontab or similar.
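A minimal sketch of a cron-driven trigger through the CM REST API. The host, credentials, API version, cluster/service names, and the balancer command name below are all assumptions; confirm the exact endpoint against the API docs for your CM version:

```
# Sketch only - API version, cluster/service names, and the command
# name are assumptions to verify against your CM version's API docs.
curl -u admin:admin -X POST \
  "http://cm-host:7180/api/v19/clusters/Cluster1/services/hdfs/commands/hdfsRebalance"
```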
10-05-2018
07:47 AM
Awesome, thank you!
11-02-2017
04:39 AM
1 Kudo
You are right that it's all just byte sequences to HBase, and that it sorts everything lexicographically. You do not need a separator character when composing your key for HBase to understand boundaries (it would not serve as one), unless you want the extra bytes for readability, or for recovering the individual data elements from variable-length keys if that's a use case. HBase 'sharding' (splitting) can be specified manually at table-creation time if you know your key pattern and ranges; this is strongly recommended so you scale from the beginning. Otherwise, HBase computes a key midpoint by analysing the keys in byte form and splits a region there whenever its size crosses the split threshold.
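For example, a sketch of manual pre-splitting in the HBase shell, assuming a hypothetical table and family name and fixed-width keys that start with a two-digit prefix:

```
# Hypothetical names: pre-split 'events' into four regions on the
# two-digit key prefix, so the table scales from the start.
hbase(main):001:0> create 'events', {NAME => 'd'}, {SPLITS => ['25', '50', '75']}
```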
11-20-2016
10:46 PM
Index pages are declared in the Parquet format definition, but are not actually implemented. See: https://github.com/Parquet/parquet-format/blob/f7ab552f569df63bdb59f751d0dd36e826682739/src/thrift/parquet.thrift#L338
04-21-2016
12:01 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you get one reducer per region. (Of course, pre-splitting needs a good idea of how your row keys are designed and is a broad topic in itself; see HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.)

Example, with the table pre-split into 6 regions:

```
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}

# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv \
    -Dimporttsv.bulk.output=/user/hac/output/2-4 \
    -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 \
    hly_temp2 /user/hac/input/2-1
...
Job Counters
    Launched map tasks=1
    Launched reduce tasks=6   <<<
```
01-25-2016
10:06 AM
According to the Sqoop documentation, it uses the generic Hadoop argument-passing scheme, which expects a space after the -D flag. You can check this in the Sqoop user guide here:
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_using_options_files_to_pass_arguments
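For example, a sketch with placeholder connection details; note the space after -D, and that generic options must come before the tool-specific arguments:

```
# Placeholder connect string/table; the space after -D is required,
# and generic -D options must precede tool-specific arguments.
sqoop import -D mapreduce.job.name=my_import \
  --connect jdbc:mysql://db.example.com/sales --table orders \
  --username sqoop_user -P
```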
01-13-2016
02:10 PM
So, I've started to play with this and hit something interesting. When I process lzma data, the job reads twice as much data as I actually have on HDFS. For example, the hadoop client (hadoop fs -du) shows a number like 100GB; then I run MR over this data (e.g. a select count(1)), check the MR counters, and find "HDFS bytes read" is twice that (like 200GB). With the gzip and bzip2 codecs, the hadoop client file size and the MR counters are similar.
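A sketch of the comparison, with placeholder path and table names:

```
# Placeholder path/table: compare the on-disk size against the
# "HDFS bytes read" counter of a job scanning the same data.
hadoop fs -du -s -h /data/lzma_dataset        # shows ~100GB
hive -e 'select count(1) from lzma_table'     # counters report ~200GB read
```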
11-25-2015
11:18 AM
When you ingest data from an edge node that is also running a DataNode role, the first replica is always written to that DataNode, so it uses up space much faster than any other DataNode. To redistribute space usage among all DataNodes, you must run the HDFS balancer.
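For example (the threshold is the allowed deviation of each DataNode's utilization from the cluster average, in percent; 10 is the default):

```
# Rebalance until every DataNode is within 10% of the average utilization.
hdfs balancer -threshold 10
```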
11-10-2015
03:59 PM
1 Kudo
When Sqoop2 starts, it copies /usr/share/java/oracle-connector-java.jar to /var/lib/sqoop2/. Please rename ojdbc6.jar to oracle-connector-java.jar, change its owner and group to sqoop2, and restart Sqoop2.
Also, please use oracle.jdbc.OracleDriver as the JDBC Driver Class.
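The steps above as commands, assuming the driver jar sits in the current directory:

```
# Put the Oracle driver where Sqoop2 copies it from at startup (paths as above).
cp ojdbc6.jar /usr/share/java/oracle-connector-java.jar
chown sqoop2:sqoop2 /usr/share/java/oracle-connector-java.jar
# then restart the Sqoop2 service (e.g. from Cloudera Manager)
```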
11-08-2015
11:51 PM
If by 'hard to analyse' you mean parsing/processing it, you can consider using the Java API to fetch block location info too: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)