Created 03-20-2017 02:45 PM
I found this article from Hortonworks about importing tsv files , how can I apply the same for csv files?
Created 03-20-2017 03:04 PM
Note how there is a tab ("\t") character provided in step #5. Change this to a comma (",") character and you can read CSV files. You provide the column delimiter to match the data you want to ingest.
Created 03-20-2017 03:16 PM
I tried the following but I am getting errors :
the fields in the file below represent the following data id lowtemp hightemp vibration lowpressure highpressure [hbase@hadoop1 ~]$ more a.csv 5842,50,30,4,240,340 5843,52,32,5,250,360 5844,56,31,2,248,333 [hbase@hadoop1 ~]$ [hbase@hadoop1 ~]$ [hbase@hadoop1 ~]$ [hbase@hadoop1 ~]$ more load_csv.ddl CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'temp:lowtemp,temp:hightemp,vibration,pressure:lowpressure,pressure:highpress ure' ) TBLPROPERTIES ( 'hbase.table.name' = 'load_csv'); [hbase@hadoop1 ~]$ [hbase@hadoop1 ~]$ hcat -f load_csv.ddl FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Error: the HBase columns mapping contains a badly formed column family, column qualifier specification.)
Created 03-20-2017 04:08 PM
You did not specify column family for vibration. You also omitted the key field.
Try something like this:
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' ) TBLPROPERTIES ( 'hbase.table.name' = 'load_csv');
Created 03-20-2017 06:12 PM
I want to make columns as "TEMP" "VIBRATION" "PRESSSURE" , The TEMP and PRESSURE have two columns "high" and "low" , what will be the command syntax then ?
Created 03-20-2017 06:17 PM
also if I try your command I get this error :
[hbase@hadoop1 ~]$ hcat -f a.ddl FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 6 elements while hbase.columns.mapping has 7 elements (counting the key if implicit))
Created 03-20-2017 06:58 PM
but I am stuck even before this step. .. please see the error below
Created 03-20-2017 09:22 PM
ok I created the table in hive n hbase as follows ,
how do I run those "A = " and "STORE A INTO" commands. . as hive ? hbase? pig?
create table test2(id int, lowT string, highT string, vib int, lowP string,highP string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,temperature:lowT,temperature:highT,vibration:vib, pressure:lowP, pressure:highP') TBLPROPERTIES('hbase.table.name'='test1') ;
Created 03-20-2017 03:05 PM
You can specify the delimiter in the LOAD script; Let's assume, that in your source file, the delimiter is semicolon (';') instead of TAB. Then you will want to run this LOAD script:
CREATE TABLE load_csv (id STRING, lowtemp STRING, hightemp STRING, vibration STRING, lowpressure STRING, highpressure STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = ':id,cf:lowtemp,cf:hightemp,cf:vibration,cf:lowpressure,cf:highpressure' ) TBLPROPERTIES ( 'hbase.table.name' = 'load_csv'); A = LOAD 'hdfs:///tmp/data.tsv' USING PigStorage(',') AS (id:chararray, lowtemp:chararray, hightemp:chararray, vibration:chararray, lowpressure:chararray, highpressure:chararray); -- DUMP A; STORE A INTO 'load_csv' USING org.apache.hive.hcatalog.pig.HCatStorer();
More specifically, in the delimiter is the argument for the PigStorage
Created 03-20-2017 03:46 PM
if I try the example directly given by Hortonworks I get the error
[hbase@hadoop1 ~]$ more a.ddl CREATE TABLE simple_hcat_load_table (id STRING, c1 STRING, c2 STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping' = 'd:c1,d:c2' ) TBLPROPERTIES ( 'hbase.table.name' = 'simple_hcat_load_table' ); [hbase@hadoop1 ~]$ hcat -f a.ddl FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)