
Using MultiDelimitSerDe

In hbase I have a table called 'Data'. A snippet of the data looks like:

 

ROW     COLUMN+CELL
001        column=R2:data, timestamp=1484695502280, value=key1$^valueA^^key2$^valueB^^key3$^valueC
002        column=R2:data, timestamp=1484695503481, value=key1$^valueX^^key2$^valueY^^key3$^valueZ


We use unusual character sequences as the field and map-key delimiters because the actual data contains spaces, commas, and tabs.
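To make the intended parsing concrete, here is a quick standalone sketch (plain Java, not Hive's parser) that splits one of the sample rows using "^^" as the field delimiter and "$" as the map-key delimiter, as declared in the DDL below. Note that because the sample data actually uses "$^" between key and value, splitting on "$" alone leaves a stray "^" at the front of each value:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Splits a row into a map using field.delim="^^" and mapkey.delim="$".
    static Map<String, String> parse(String row) {
        Map<String, String> m = new LinkedHashMap<>();
        // Pattern.quote() is required: both "^" and "$" are regex metacharacters.
        for (String entry : row.split(Pattern.quote("^^"))) {
            String[] kv = entry.split(Pattern.quote("$"), 2);
            m.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return m;
    }

    public static void main(String[] args) {
        // Prints {key1=^valueA, key2=^valueB, key3=^valueC} — note the leading "^"
        // in each value, because the data separates key and value with "$^".
        System.out.println(parse("key1$^valueA^^key2$^valueB^^key3$^valueC"));
    }
}
```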

 

In Hive I would like to create an external table against the hbase 'Data' table. And then create a view on the external table to display the key/value information.

 

I've been doing some research and believe MultiDelimitSerDe may be what I need. However, I am running into issues.

 

My external table is defined as:

 

CREATE EXTERNAL TABLE Test(key String, sessionInfo map<String, String>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,R2:data",
"field.delim"="^^",
"mapkey.delim"="$")
TBLPROPERTIES ('hbase.table.name' = 'Data');

 

However, when I perform the following query:
select * from Test;

 

I get the following error:

 

Error: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe: expects either BytesWritable or Text object! (state=,code=0)

 

I am a little confused to what the error is referring to, so any help would be appreciated.

 

Thank you.

Cloudera Employee

Re: Using MultiDelimitSerDe

Hello Noel,

 

According to https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe, your table creation syntax appears valid and field.delim="^^" seems acceptable.

 

A quick search on the Exception thrown brings us to https://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/M..., which contains the following code block:

 

@Override
  public Object deserialize(Writable blob) throws SerDeException {
    if (byteArrayRef == null) {
      byteArrayRef = new ByteArrayRef();
    }

    // we use the default field delimiter('\1') to replace the multiple-char field delimiter
    // but we cannot use it to parse the row since column data can contain '\1' as well
    String rowStr;
    if (blob instanceof BytesWritable) {
      BytesWritable b = (BytesWritable) blob;
      rowStr = new String(b.getBytes());
    } else if (blob instanceof Text) {
      Text rowText = (Text) blob;
      rowStr = rowText.toString();
    } else {
      throw new SerDeException(getClass() + ": expects either BytesWritable or Text object!");
    }
    byteArrayRef.setData(rowStr.replaceAll(Pattern.quote(fieldDelimited), "\1").getBytes());
    cachedLazyStruct.init(byteArrayRef, 0, byteArrayRef.getData().length);
    // use the multi-char delimiter to parse the lazy struct
    cachedLazyStruct.parseMultiDelimit(rowStr.getBytes(), fieldDelimited.getBytes());
    return cachedLazyStruct;
  }

It appears that the blob being passed into this function is not detected as either BytesWritable or Text, so it falls through to the final else clause and throws the SerDeException. A closer inspection of the blob's actual runtime type may help isolate the root cause.
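As a self-contained illustration (no Hadoop classes on the classpath; the Writable stand-ins below are hypothetical, including HBaseRow as a placeholder for whatever the HBase storage handler actually passes in), this sketch mimics the instanceof dispatch in deserialize() to show why any unexpected Writable subtype lands in the exception branch:

```java
public class TypeDispatchSketch {
    // Minimal stand-ins for the Hadoop types involved; not the real classes.
    interface Writable {}
    static class BytesWritable implements Writable {}
    static class Text implements Writable {}
    static class HBaseRow implements Writable {} // hypothetical HBase-handler type

    // Mirrors the if/else-if/else chain in MultiDelimitSerDe.deserialize().
    static String describe(Writable blob) {
        if (blob instanceof BytesWritable) return "decode raw bytes";
        if (blob instanceof Text) return "use toString()";
        // Any other subtype falls through, exactly like the final else clause.
        return "SerDeException: got " + blob.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Text()));      // use toString()
        System.out.println(describe(new HBaseRow()));  // SerDeException: got HBaseRow
    }
}
```

Logging the runtime class name in the same way (e.g. blob.getClass().getName()) inside the real else clause, or in a debugger, would show exactly which type the storage handler is handing the SerDe.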

 

You may be able to narrow down which data file was being parsed when that exception was hit by looking at the task logs. Assuming there's more than one data file, you could try moving the one in question out of the table directory.

 

Regards,

Nickolaus

 
