Champion Alumni
Posts: 161
Registered: ‎02-11-2014
Accepted Solution

extractHBaseCells command does not retain XML tags

Hi,

 

I am inserting an XML document into an HBase column family and indexing it to Solr. One of the Solr fields holds the complete XML, and the other fields hold values extracted from the XML. However, the XML tags are missing from the indexed value.

I am taking the value out as a string. When writing into HBase I set the character encoding to UTF-8, and I do the same in my Java code. I need to display the actualMessage field in the Solr results (it is one of the fields). It is displayed, but without the XML tags or attribute values. Can you help?

 

{
  extractHBaseCells {
    mappings : [
      {
        inputColumn : "messages:*"
        outputField : "actualMessage"
        type : string
        source : value
      }
    ]
  }
}

 

java {
  imports : "import java.io.*;import javax.xml.parsers.*;import org.w3c.dom.*;"
  code: """
    String s = null;
    byte[] b = null;
    DocumentBuilderFactory docFactory = null;
    DocumentBuilder docBuilder = null;
    Document document = null;
    InputStream is = null;
    try {
      s = (String) record.get("actualMessage").get(0);
      b = s.getBytes("UTF-8");

 

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Extracthbase cell command does not retain xml tags

If indeed the data in HBase contains the XML tags, then it sounds like your tokenizer/analyzer chain in Solr schema.xml is stripping info away, i.e. schema.xml isn't configured to do what you want it to do.

You could confirm that the morphline is doing what it's supposed to do by adding some debug log message like this to your morphline:

logInfo { format : "my record: {}", args : ["@{}"] }
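Alongside the log message, a quick programmatic check can tell whether the value that reaches the morphline still has its tags. The following is only a minimal sketch under assumptions: XmlTagCheck and isWellFormedXml are hypothetical names, not part of Morphlines or Solr, and the check simply tries to parse the string with the standard JDK XML parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not a Morphlines or Solr API.
class XmlTagCheck {

    // Returns true if the value parses as a well-formed XML document,
    // which is only possible if its tags are still present.
    static boolean isWellFormedXml(String value) {
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Install a plain handler so parse errors are not printed to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(
                new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement() != null;
        } catch (Exception e) {
            // Text with the tags stripped (or any malformed input) ends up here.
            return false;
        }
    }
}
```

Inside a morphline java command, something like XmlTagCheck.isWellFormedXml((String) record.get("actualMessage").get(0)) could be logged per record. If it is true there but the tags are missing in the Solr result, the loss is happening after the morphline.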

Also see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

Wolfgang.

Champion Alumni
Posts: 161
Registered: ‎02-11-2014

Re: Extracthbase cell command does not retain xml tags

Thanks mate. It worked. Thanks a lot for all your help with this.
