Member since: 06-02-2016
Posts: 15
Kudos Received: 12
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3087 | 01-10-2017 09:10 PM
 | 3300 | 10-04-2016 10:48 PM
 | 8177 | 07-13-2016 04:41 AM
09-12-2017
01:32 AM
@Vijay Parmar, I'd suggest running a few tests comparing concatenation, the temporary-table solution suggested by your DBAs, and whatever else you come up with. Once you get a feel for how the processing works, you'll arrive at the solution that works best for you.
09-10-2017
11:26 PM
@Vijay Parmar, you can concatenate Hive tables to merge small files together. This can happen while the table is active. The syntax is:
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;
See the Hive documentation for details.
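For example, assuming a hypothetical table named web_logs partitioned by dt, a concatenation might look like this:
-- Merge the small files within a single partition (table and partition names are hypothetical)
ALTER TABLE web_logs PARTITION (dt = '2017-09-01') CONCATENATE;
-- Or concatenate an unpartitioned table
ALTER TABLE web_logs_staging CONCATENATE;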
01-10-2017
09:26 PM
1 Kudo
@Neeraj Sabharwal, your ALTER TABLE statement should work. One question: do you place the data for each partition in a subdirectory named after the partition? Hive partitions exist as subdirectories. For example, your user table should have a structure similar to this:
/external_table_path/date=2010-02-22
/external_table_path/date=2010-02-23
/external_table_path/date=2010-02-24
And so on. The ALTER TABLE statement will create the directories as well as add the partition details to the Hive metastore. Once the partitions are created you can simply drop the right file(s) in the right directory.
Cheers,
Steven.
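As a sketch (the table name is a placeholder and the paths follow the layout above), registering one of those directories as a partition would look something like this:
-- Add the partition and point it at the existing directory
ALTER TABLE my_external_table
ADD IF NOT EXISTS PARTITION (`date` = '2010-02-22')
LOCATION '/external_table_path/date=2010-02-22';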
01-10-2017
09:10 PM
1 Kudo
@sagar pavan, instead of using:
--target-dir '/user/tsldp/patelco/'
try:
--warehouse-dir '/user/tsldp/patelco/'
Each table will then be placed in its own subdirectory under '/user/tsldp/patelco/'.
Cheers,
Steven.
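A rough sketch of the full command with --warehouse-dir (the connection details and table name here are placeholders, not from the original question):
# Each imported table gets its own subdirectory under the warehouse dir
sqoop import \
  --connect 'jdbc:mysql://dbhost/sourcedb' \
  --username someuser -P \
  --table accounts \
  --warehouse-dir '/user/tsldp/patelco/'
# The data for this table lands in /user/tsldp/patelco/accounts/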
10-04-2016
10:48 PM
3 Kudos
@Mahesh Mallikarjunappa The Hive HBase storage handler can be used to load data into HBase via Hive. A simple example follows:
-- Create a Hive-managed HBase table
CREATE TABLE MyHBaseTable(MyKey string, Col1 string, Col2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,colfam:col1,colfam:col2")
TBLPROPERTIES("hbase.table.name" = "MyNamespace:MyTable")
;
-- Insert data into it
INSERT INTO TABLE MyHBaseTable
SELECT SourceKey, SourceCol1, SourceCol2
FROM SourceHiveTable
;
And from Pig, you can read from that same source and write to that same target using HCatLoader and HBaseStorage as follows:
pig -useHCatalog -f script.pig
Where script.pig is as below:
RawData = LOAD 'SourceHiveTable'
  USING org.apache.hive.hcatalog.pig.HCatLoader();
KeepColumns = FOREACH RawData
  GENERATE SourceKey, SourceCol1, SourceCol2;
STORE KeepColumns
  INTO 'hbase://MyNamespace:MyTable'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam:col1,colfam:col2');
Note: You don't specify the key in the STORE statement - the first column is always the key. Hope this helps!
09-14-2016
11:04 PM
2 Kudos
@Arkaprova Saha I'm not sure about the --connection-manager option, but I have successfully performed a Sqoop import from Teradata to Avro using Teradata's JDBC driver as follows:
sqoop import --driver com.teradata.jdbc.TeraDriver \
  --connect 'jdbc:teradata://****/DATABASE=****' \
  --username **** --password **** \
  --table MyTable \
  --target-dir /****/****/**** \
  --as-avrodatafile \
  --num-mappers 1
Just ensure that the JDBC driver, terajdbc4.jar, is in your $SQOOP_LIB folder. For me, on HDP 2.4 that is /usr/hdp/current/sqoop-client/lib
07-13-2016
04:41 AM
1 Kudo
@Emily Sharpe I believe your issue relates to the way Pig is processing the NULL Avro data. Rather than ignoring those NULL values, Pig passes the key and an empty value to HBase, which dutifully stores it. To avoid storing these values, filter them out. The following Pig code shows how to do this for a single key/value Avro source:
ImageAvro = LOAD '/path/to/RawAvroData'
  USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 'schema_file', '/path/to/AvroSchemaFile.avsc');
filteredImage = FOREACH (FILTER ImageAvro BY SIZE(ImageColumn) > 0) GENERATE KeyColumn, ImageColumn;
STORE filteredImage INTO 'hbase://namespace:table'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colFamily:col');
Similarly, you can identify the empty cells with the same FILTER operation. Here's how to save a list of keys that have an empty colFamily:col cell:
ImageHBase = LOAD 'hbase://namespace:table'
  USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colFamily:col', '-loadKey true')
  AS (KeyColumn:chararray, ImageColumn:bytearray);
NullImage = FOREACH (FILTER ImageHBase BY SIZE(ImageColumn) == 0) GENERATE KeyColumn;
STORE NullImage INTO '/path/to/flat/file' USING PigStorage();
06-09-2016
10:03 PM
I believe so, yes. The -Dimport.bulk.output step can be performed on the target. This will prep the HBase files according to the target's version, number of region servers, etc.
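As a rough sketch (the paths and table name here are placeholders), generating the bulk-load files on the target cluster might look like this:
# Run on the target cluster: write HFiles to HDFS instead of inserting through the region servers
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dimport.bulk.output=/tmp/MyTable_bulk \
  MyNamespace:MyTable /path/to/exported/data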
06-03-2016
12:43 AM
1 Kudo
Totally agree re bulk import. One additional point: you need to ensure the hbase user has access to read/write the files created by the -Dimport.bulk.output step. If it doesn't, the completebulkload step will appear to hang. The simplest way to achieve this is to run:
hdfs dfs -chmod -R 777 <dir containing export files>
as the owner of those files. completebulkload, running as hbase, simply moves these files to the relevant HBase directories. With the permissions correctly set, this takes fractions of a second.
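For reference, a sketch of the full sequence (the output directory and table name are placeholders), using the LoadIncrementalHFiles tool behind completebulkload:
# Grant access as the owner of the generated HFiles
hdfs dfs -chmod -R 777 /tmp/MyTable_bulk
# Complete the bulk load as the hbase user; the HFiles are moved, not copied
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/MyTable_bulk MyNamespace:MyTable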