Error when searching for non-Latin characters

Explorer

I have built an index over a dataset containing non-Latin characters.

Fields from schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="description" type="text_ru" indexed="true" stored="true" required="true" multiValued="false" />
<field name="date" type="date" indexed="true" stored="true" />
<field name="pc" type="string" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true" />
<field name="file_path" type="string" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

<field name="_version_" type="long" indexed="true" stored="true"/>

Morphline config:

morphlines : [
  {
    id : index-svmkd

    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readCSV {
          separator : ";"
          columns : [description, date, pc, user]
          ignoreFirstLine : true
          trim : true
          charset : UTF-8
        }
      }

      {
        logError {
          format : "output record: {}, file {}"
          args : ["@{}", "@{_attachment_body}"]
        }
      }

      {
        convertTimestamp {
          field : date
          inputFormats : ["dd.MM.yyyy HH:mm:ss.SSS"]
          inputTimezone : Europe/Moscow
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : Europe/Moscow
        }
      }

      {
        addValues {
          text : "@{description} @{date} @{pc} @{user}"
        }
      }

      {
        generateUUID {
          field : id
          type : nonSecure
        }
      }

      {
        sanitizeUnknownSolrFields {
          solrLocator : {
            collection : svmkd
            zkHost : "MYHOST:2181/solr"
          }
        }
      }

      {
        loadSolr {
          solrLocator : {
            collection : svmkd # Name of solr collection
            zkHost : "MYHOST:2181/solr" # ZooKeeper ensemble
          }
        }
      }
    ]
  }
]
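To check how a sample CSV row comes out of this morphline without running the indexer, the record-building commands can be approximated in plain Python 3. This is a sketch, not Kite's implementation: logError, sanitizeUnknownSolrFields and loadSolr are left out, to_records is a hypothetical name, and Europe/Moscow is assumed to be a fixed UTC+3 (correct from 26 October 2014 on; from March 2011 until then Moscow was UTC+4). Solr reads a trailing 'Z' as UTC, so the sketch converts the timestamp to UTC before appending it; with outputTimezone set to Europe/Moscow, the config above would label Moscow local time as UTC.

```python
import csv
import io
import uuid
from datetime import datetime, timedelta, timezone

# Assumption: Europe/Moscow as a fixed UTC+3 offset (valid from 26 Oct 2014).
MOSCOW = timezone(timedelta(hours=3))
COLUMNS = ["description", "date", "pc", "user"]  # readCSV columns


def to_records(csv_text):
    """Approximate readCSV -> convertTimestamp -> addValues -> generateUUID."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")  # separator : ";"
    next(reader, None)  # ignoreFirstLine : true
    for row in reader:
        if not row:
            continue  # skip blank lines
        # trim : true
        record = dict(zip(COLUMNS, (cell.strip() for cell in row)))
        # convertTimestamp: parse "dd.MM.yyyy HH:mm:ss.SSS" as Moscow time ...
        local = datetime.strptime(record["date"], "%d.%m.%Y %H:%M:%S.%f")
        utc = local.replace(tzinfo=MOSCOW).astimezone(timezone.utc)
        # ... and write it as UTC with millisecond precision and a 'Z' suffix.
        record["date"] = utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % (utc.microsecond // 1000)
        # addValues: 'text' is built after the date has been converted.
        record["text"] = "{description} {date} {pc} {user}".format(**record)
        # Stand-in for generateUUID: a random (version 4) UUID as the document id.
        record["id"] = str(uuid.uuid4())
        yield record


if __name__ == "__main__":
    sample = "description;date;pc;user\nОтчёт ; 01.12.2014 10:15:30.123 ;PC-01;ivanov\n"
    for record in to_records(sample):
        print(record)
```

For the sample row, the date becomes 2014-12-01T07:15:30.123Z and the text field reads "Отчёт 2014-12-01T07:15:30.123Z PC-01 ivanov".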



The index was built successfully, and I can search for any Latin keywords in my dataset through Cloudera Search.

However, when I try to search for non-Latin keywords in Cloudera Search, Hue shows this error:

{
  "message": "'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)",
  "traceback": [
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py", 134, "search", "response = SolrApi(SOLR_URL.get(), request.user).query(collection, query)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/libs/libsolr/src/libsolr/api.py", 146, "query", "response = self._root.get('%(collection)s/select' % solr_query, params)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 90, "get", "return self.invoke(\"GET\", relpath, params, headers=headers, allow_redirects=True)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 73, "invoke", "urlencode=self._urlencode)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 131, "execute", "url = self._make_url(path, params)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 161, "_make_url", "param_str = urllib.urlencode(params)"],
  ["/usr/lib64/python2.6/urllib.py", 1281, "urlencode", "v = quote_plus(str(v))"]
],
  "detail": "None",
  "title": "Error while accessing Solr"
}

The same queries run from the Solr Admin UI (MYHOST:8983) work fine.

Please help me solve this problem.
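The last traceback frame is Python 2.6's urllib.urlencode calling str(v) on each request parameter. For a unicode search term, that str() call implicitly encodes with the interpreter's default 'ascii' codec, which cannot represent any non-ASCII character; the Solr Admin UI builds its request in the browser, which percent-encodes the query as UTF-8, so it never goes through this step. The Python 3 sketch below only reproduces the failing encoding step (it does not call Hue or Solr); implicit_ascii_error is a hypothetical helper name.

```python
def implicit_ascii_error(value):
    """Return the error Python 2's str(value) would raise under the default
    'ascii' encoding, or None if value is plain ASCII.

    Python 3 has no implicit encoding in str(), so this calls
    value.encode("ascii") explicitly to mimic that step.
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Prints: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
    print(implicit_ascii_error("поиск"))
    # Prints: None
    print(implicit_ascii_error("search"))
```

The message has the same shape as the one Hue reported; "position 0-7" there simply means the first eight characters of the query were non-ASCII.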

Accepted solution

Rising Star

Hi Pavel,

I had the same problem.

You can try this:

1) Open /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py

2) Add the following lines before LOG = logging.getLogger(__name__):

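import sys
reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding('UTF8')  # Python 2 only: implicit str()/unicode conversions now use UTF-8

3) Restart the Hue service.

Hope this helps.

Update: here is a better solution: https://issues.cloudera.org/browse/HUE-2279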
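Changing the default encoding affects every module in the Hue process. A narrower approach is to encode text parameters as UTF-8 before they reach urlencode, so the implicit ASCII conversion never happens. The sketch below is not taken from HUE-2279 and is not Hue's code; utf8_params and solr_query_string are hypothetical names, written to run on both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
# Runs on Python 2.6+ and Python 3.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def utf8_params(params):
    """Return (key, value) pairs with every text value encoded as UTF-8 bytes.

    Bytes, numbers and other values pass through unchanged, so urlencode
    never has to convert text with the default 'ascii' codec.
    """
    encoded = []
    for key, value in params:
        if isinstance(value, TEXT_TYPE):
            value = value.encode("utf-8")
        encoded.append((key, value))
    return encoded


def solr_query_string(params):
    """Build the query string for a Solr /select request (hypothetical helper)."""
    return urlencode(utf8_params(params))


if __name__ == "__main__":
    # Prints: q=%D0%BF%D0%BE%D0%B8%D1%81%D0%BA&wt=json
    print(solr_query_string([("q", u"поиск"), ("wt", "json")]))
```

Passing a list of pairs rather than a dict keeps parameter order stable on Python 2.6, where dicts are unordered.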

P.S. Judging by your avatar, I guess you're from Russia. Is that right?

Regards,

Andrey

Explorer

It worked!

Thanks a lot! :)

P.S. Yes, I'm from Russia. Nice to meet a fellow countryman :)

Explorer

Hey,
I have a similar problem.

{
  "message": "'http://www.bgk.com.pl/storage/10035/og\\xc5\\x82oszenie%20o%20zam\\xc3\\xb3wieniu%20BZP%2045%20DLA%202015%20przes\\xc5\\x82ane%20do%20publikacji.pdf'",
  "traceback": [
  ["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/views.py", 137, "search", "response = augment_solr_response(response, collection, query)"],
  ["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/models.py", 548, "augment_solr_response", "highlighting = response['highlighting'][str(doc[id_field])]"]
],
  "detail": "None",
  "title": "Error while accessing Solr"
}

The workaround described above doesn't work for me.
Thanks for the help.
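This traceback fails at a different place: response['highlighting'][str(doc[id_field])] in Hue's models.py. Solr's highlighting section is keyed by document id, and here the id is a URL containing ł and ó; the error message is the key that was not found, printed as UTF-8 bytes (\xc5\x82 is ł, \xc3\xb3 is ó). One plausible reading, not confirmed in this thread, is that str() turned the unicode id into a UTF-8 byte string that no longer matches the text keys of the highlighting dictionary. The Python 3 sketch below (hypothetical helper names, not Hue's code) shows that mismatch and a lookup that keeps the id as text.

```python
def as_text(value):
    """Return value as text, decoding UTF-8 bytes instead of re-encoding them."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def highlighting_for(response, doc, id_field="id"):
    """Look up a document's highlighting entry using its id as text.

    Solr's JSON response keys 'highlighting' by the unique key value, so the
    lookup key must be that same text, not its UTF-8 byte encoding.
    Returns {} when the document has no highlighting entry.
    """
    return response.get("highlighting", {}).get(as_text(doc[id_field]), {})


if __name__ == "__main__":
    doc_id = "http://example.com/ogłoszenie.pdf"  # hypothetical id containing 'ł'
    response = {"highlighting": {doc_id: {"text": ["<em>ogłoszenie</em>"]}}}

    # The UTF-8 bytes of the id are a different dict key, so indexing with
    # them would raise KeyError, the same kind of failure as in the traceback.
    print(doc_id.encode("utf-8") in response["highlighting"])  # False
    print(highlighting_for(response, {"id": doc_id}))
    print(highlighting_for(response, {"id": doc_id.encode("utf-8")}))
```

Returning an empty dict for a missing entry is a design choice for the sketch; whether Hue should skip or report such documents is a separate question.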