extractHBaseCells command does not retain XML tags
- Labels: Apache HBase, Apache Solr
Created on 05-01-2014 11:02 AM - edited 09-16-2022 01:58 AM
Hi,
I am inserting XML into an HBase column family and indexing it into Solr. One of the Solr fields holds the complete XML, and the other fields hold values extracted from the XML. However, the XML tags are missing from the indexed value.
I read the value as a string. When writing to HBase I set the character encoding to UTF-8, and I do the same in my Java code. I need to display the actualMessage field (one of the fields) in the Solr results. It is displayed, but without the XML tags or attribute values. Can you help?
{
  extractHBaseCells {
    mappings : [
      {
        inputColumn : "messages:*"
        outputField : "actualMessage"
        type : string
        source : value
      }
    ]
  }
}
java {
  imports : "import java.io.*;import javax.xml.parsers.*;import org.w3c.dom.*;"
  code: """
    String s = null;
    byte[] b = null;
    DocumentBuilderFactory docFactory = null;
    DocumentBuilder docBuilder = null;
    Document document = null;
    InputStream is = null;
    try {
      s = (String) record.get("actualMessage").get(0);
      b = s.getBytes("UTF-8");
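For comparison, here is a minimal standalone sketch of the parsing step that the java command above starts. It assumes a made-up payload and element names (`message`, `body`, attribute `id`) because the real messages aren't shown, and the class and method names are hypothetical. It shows that the string read from the record still contains its tags, and that tags and attribute values only vanish if what gets indexed is `getTextContent()` of the root element, which concatenates the text nodes without any markup. Whether that is what happened here is an assumption, since the rest of the java code isn't shown.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlFieldSketch {
    // Hypothetical payload; the real messages from the thread are not shown.
    static final String SAMPLE = "<message id=\"42\"><body>hello</body></message>";

    // Parses a copy of the string's UTF-8 bytes; the original String is never modified.
    static Document parse(String xml) {
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(is);
        } catch (Exception e) {
            throw new IllegalArgumentException("value is not well-formed XML", e);
        }
    }

    // Text of the first element with the given tag name, e.g. a value for a separate Solr field.
    static String childText(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).item(0).getTextContent();
    }

    // getTextContent() on the root concatenates text nodes only: tags and attributes are dropped.
    static String rootText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        // Stand-in for (String) record.get("actualMessage").get(0) in the morphline.
        String s = SAMPLE;
        System.out.println("full XML:  " + s);
        System.out.println("id attr:   " + parse(s).getDocumentElement().getAttribute("id"));
        System.out.println("body text: " + childText(s, "body"));
        System.out.println("root text: " + rootText(s));
    }
}
```

Running it prints the full XML unchanged on the first line, while the last line is just `hello`. So if the field meant to hold the whole message should keep its markup, it needs the original string `s`, not a DOM text value.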
Created 05-01-2014 01:59 PM
You could confirm that the morphline is doing what it's supposed to do by adding a debug log message like this to your morphline:
logInfo { format : "my record: {}", args : ["@{}"] }
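If you also want to check a value outside the morphline, a minimal Java sketch (the class and method names are hypothetical, not part of the morphlines API) can test whether a field value still carries its markup: it must start with a tag and parse as well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class MarkupCheck {
    // True if the value still starts with a tag and parses as well-formed XML.
    static boolean keepsMarkup(String value) {
        if (value == null || !value.trim().startsWith("<")) {
            return false; // empty, or the tags were already stripped
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(value)));
            return true;
        } catch (Exception e) {
            return false; // starts with '<' but is not well-formed XML
        }
    }

    public static void main(String[] args) {
        System.out.println(keepsMarkup("<message id=\"42\"><body>hello</body></message>")); // true
        System.out.println(keepsMarkup("hello"));     // false: no tags left
        System.out.println(keepsMarkup("<message>")); // false: not well-formed
    }
}
```

Running the check on the value from `record.get("actualMessage").get(0)`, and again on what Solr returns, would show whether the tags are lost before indexing or afterwards.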
Also check which Solr field type and analyzers the actualMessage field uses; see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr
Wolfgang.
Created 05-05-2014 10:05 AM
Thanks, mate. It worked. Thanks a lot for all your help with this.
