Created 03-20-2017 02:45 PM
I found this article from Hortonworks about importing TSV files. How can I apply the same for CSV files?
Created 03-20-2017 03:04 PM
Note how there is a tab ("\t") character provided in step #5. Change this to a comma (",") character and you can read CSV files. You provide the column delimiter to match the data you want to ingest.
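To illustrate, if step #5 of the article uses a Pig LOAD statement, the delimiter is just the argument to PigStorage; the path and column names below are placeholders from the article, not your data:

```pig
-- TSV, as in the article:
A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage('\t') AS (id:chararray, c1:chararray, c2:chararray);

-- CSV: the same statement, only the delimiter changes
A = LOAD 'hdfs:///tmp/data.csv' USING PigStorage(',') AS (id:chararray, c1:chararray, c2:chararray);
```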
Created 03-20-2017 03:16 PM
I tried the following but I am getting errors:
The fields in the file below represent the following data: id lowtemp hightemp vibration lowpressure highpressure

[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$ more load_csv.ddl
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = 'temp:lowtemp,temp:hightemp,vibration,pressure:lowpressure,pressure:highpressure'
)
TBLPROPERTIES ('hbase.table.name' = 'load_csv');
[hbase@hadoop1 ~]$ hcat -f load_csv.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, column qualifier specification.)
Created 03-20-2017 04:08 PM
You did not specify a column family for vibration. You also omitted the key field.
Try something like this:
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure'
)
TBLPROPERTIES ('hbase.table.name' = 'load_csv');
Created 03-20-2017 06:12 PM
I want to make the column families "TEMP", "VIBRATION", and "PRESSURE". TEMP and PRESSURE each have two columns, "high" and "low". What will the command syntax be then?
Created 03-20-2017 06:17 PM
Also, if I try your command I get this error:
[hbase@hadoop1 ~]$ hcat -f a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
Created 03-20-2017 06:58 PM
But I am stuck even before this step... please see the error below.
Created 03-20-2017 09:22 PM
OK, I created the table in Hive and HBase as follows.
How do I run those "A =" and "STORE A INTO" commands... in hive? hbase? pig?
create table test2 (id int, lowT string, highT string, vib int, lowP string, highP string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,temperature:lowT,temperature:highT,vibration:vib,pressure:lowP,pressure:highP')
TBLPROPERTIES ('hbase.table.name' = 'test1');
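Those "A =" and "STORE A INTO" lines are Pig Latin, so they are run with the pig client rather than from the hive or hbase shells. A minimal sketch (the script file name load_csv.pig is just an example):

```shell
# Save the LOAD/STORE statements into a script file,
# then run Pig with HCatalog support enabled so HCatStorer can be found:
pig -useHCatalog load_csv.pig
```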
Created 03-20-2017 03:05 PM
You can specify the delimiter in the LOAD script. Let's say that in your source file the delimiter is a comma (',') instead of TAB. Then you would run this LOAD script:
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure'
)
TBLPROPERTIES ('hbase.table.name' = 'load_csv');

A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage(',') AS (id:chararray, lowtemp:chararray, hightemp:chararray, vibration:chararray, lowpressure:chararray, highpressure:chararray);
-- DUMP A;
STORE A INTO 'load_csv' USING org.apache.hive.hcatalog.pig.HCatStorer();
More specifically, the delimiter is the argument to PigStorage.
Created 03-20-2017 03:46 PM
If I try the example given by Hortonworks directly, I get this error:
[hbase@hadoop1 ~]$ more a.ddl
CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = 'd:c1,d:c2')
TBLPROPERTIES ('hbase.table.name' = 'simple_hcat_load_table');
[hbase@hadoop1 ~]$ hcat -f a.ddl
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
Created 03-20-2017 03:49 PM
Please include the full exception. I would guess that your classpath is wrong, causing this to not find your HBase instance.
Created 03-20-2017 07:07 PM
Hi pbarna, can you try your CREATE TABLE command and see if it works for you? It's not working for me.
Thanks
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
[hbase@hadoop1 ~]$
Created 03-20-2017 07:23 PM
Swap ":id" with ":key" in the hbase.columns.mapping. Just a simple typo. See https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration for documentation on configuring the HBaseStorageHandler.
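Applied to the earlier DDL, the corrected statement would look like this; only ":id" changes to ":key", everything else is unchanged:

```sql
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- ':key' maps the first Hive column to the HBase row key
  'hbase.columns.mapping' = ':key,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure'
)
TBLPROPERTIES ('hbase.table.name' = 'load_csv');
```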
Created 03-21-2017 02:32 PM
I solved this problem using the following method, but I do want to know: why would one use the SerDe method and not this one?
[hbase@hadoop1 ~]$ more a.csv
5842,50,30,4,240,340
5843,52,32,5,250,360
5844,56,31,2,248,333
[hbase@hadoop1 ~]$

create table test3 (Id int, lowT string, highT string, vib int, lowP string, highP string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

load data inpath '/user/hbase/a.csv' OVERWRITE INTO TABLE test3;
Loading data to table default.test3
Table default.test3 stats: [numFiles=1, numRows=0, totalSize=63, rawDataSize=0]
OK
Time taken: 0.668 seconds
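For what it's worth, one reason to keep the HBaseStorageHandler/SerDe route is that it lets Hive read and write data that lives in HBase, whereas test3 here is an ordinary text-file table that HBase never sees. A common pattern combines the two: stage the CSV into a plain table, then copy it into the HBase-backed table with one INSERT. A sketch, reusing the test2/test3 table names from this thread:

```sql
-- test3: plain text table loaded from a.csv (above)
-- test2: HBase-backed table created with the HBaseStorageHandler
-- This writes the CSV rows into the HBase table 'test1' via Hive:
INSERT OVERWRITE TABLE test2 SELECT * FROM test3;
```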