Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Error when searching for non-Latin characters

Explorer

I have built an index over a dataset that contains non-Latin characters.

 

fields from schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="description" type="text_ru" indexed="true" stored="true" required="true" multiValued="false" />
<field name="date" type="date" indexed="true" stored="true" />
<field name="pc" type="string" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true" />
<field name="file_path" type="string" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

<field name="_version_" type="long" indexed="true" stored="true"/>

 

morphline config:

morphlines : [
  {
    id : index-svmkd

    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      {
        readCSV {
          separator : ";"
          columns : [description, date, pc, user]
          ignoreFirstLine : true
          trim : true
          charset : UTF-8
        }
      }

      {
        logError {
          format : "output record: {}, file {}"
          args : ["@{}", "@{_attachment_body}"]
        }
      }

      {
        convertTimestamp {
          field : date
          inputFormats : ["dd.MM.yyyy HH:mm:ss.SSS"]
          inputTimezone : Europe/Moscow
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : Europe/Moscow
        }
      }

      {
        addValues {
          text : "@{description} @{date} @{pc} @{user}"
        }
      }

      {
        generateUUID {
          field : id
          type : nonSecure
        }
      }

      {
        sanitizeUnknownSolrFields {
          solrLocator : {
            collection : svmkd
            zkHost : "MYHOST:2181/solr"
          }
        }
      }

      {
        loadSolr {
          solrLocator : {
            collection : svmkd # Name of solr collection
            zkHost : "MYHOST:2181/solr" # ZooKeeper ensemble
          }
        }
      }
    ]
  }
]



The index was built successfully, and I can search for any Latin keyword in my dataset through Cloudera Search.

However, when I try to search for non-Latin keywords in Cloudera Search, Hue shows an error:

 

{
  "message": "'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)",
  "traceback": [
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py", 134, "search", "response = SolrApi(SOLR_URL.get(), request.user).query(collection, query)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/libs/libsolr/src/libsolr/api.py", 146, "query", "response = self._root.get('%(collection)s/select' % solr_query, params)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 90, "get", "return self.invoke(\"GET\", relpath, params, headers=headers, allow_redirects=True)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 73, "invoke", "urlencode=self._urlencode)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 131, "execute", "url = self._make_url(path, params)"],
  ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 161, "_make_url", "param_str = urllib.urlencode(params)"],
  ["/usr/lib64/python2.6/urllib.py", 1281, "urlencode", "v = quote_plus(str(v))"]
],
  "detail": "None",
  "title": "Error while accessing Solr"
}

 

 

Similar queries from the Solr Admin UI (MYHOST:8983) work fine.

Please help me solve this problem.

1 ACCEPTED SOLUTION

Rising Star

Hi Pavel,

 

I had the same problem. 

 

You can try this:

 

1) Open /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py

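2) Add this before the line LOG = logging.getLogger(__name__):

import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('UTF8')  # implicit str()/unicode() conversions now use UTF-8 instead of ASCII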

 

3) Restart the Hue service.
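
Some background on why this works: the last frame of the traceback, quote_plus(str(v)), is where the failure starts. On Python 2, str() on a unicode value encodes it with the interpreter's default codec, which is ASCII unless it has been changed, and sys.setdefaultencoding('UTF8') changes that default for the whole process. The sketch below reproduces the same class of error with an explicit encode so that it also runs on Python 3. The helper name to_query_bytes and the sample keyword are illustrative and are not part of Hue.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_query_bytes(value, codec):
    """Encode a text value the way an implicit str() would on Python 2.

    `codec` stands in for the interpreter's default encoding
    ('ascii' out of the box, 'utf-8' after the workaround).
    """
    return value.encode(codec)


keyword = u"\u043f\u043e\u0438\u0441\u043a"  # a Cyrillic keyword, 5 characters

# With ASCII as the codec the encode fails, as in the Hue traceback.
try:
    to_query_bytes(keyword, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# With UTF-8 as the codec the same value encodes cleanly.
utf8_bytes = to_query_bytes(keyword, "utf-8")

print(ascii_error)
print(len(utf8_bytes))
```

Note that changing the default encoding affects every implicit conversion in the process, not only the search app, so it is best treated as a stopgap.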

 

Hope this helps.

 

Update:

Here is a better solution: https://issues.cloudera.org/browse/HUE-2279
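
Independently of what the linked issue changes (its contents are not reproduced here), a generally less invasive pattern is to encode text parameters to UTF-8 explicitly before they reach urlencode, instead of changing the process-wide default. This is a minimal sketch under that assumption. The function name encode_params and the sample parameters are hypothetical and are not Hue's actual code.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def encode_params(params):
    """Build a query string, encoding text values as UTF-8 bytes first."""
    encoded = []
    for key, value in sorted(params.items()):
        if isinstance(value, TEXT_TYPE):
            # Explicit encode: no reliance on the default codec.
            value = value.encode("utf-8")
        encoded.append((key, value))
    return urlencode(encoded)


query_string = encode_params({"q": u"\u043f\u043e\u0438\u0441\u043a", "rows": 10})
print(query_string)
```

The keys are sorted only to make the output order predictable in this example.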

 

 

P.S. Judging by your avatar, I would guess you are from Russia. Is that right?

 

Regards,

Andrey


3 REPLIES


Explorer

It worked!

Thanks a lot. ))

 

P.S. Yes, I am from Russia. Nice to meet a fellow countryman )

Explorer

Hey,
I have a similar problem.

{"message": "'http://www.bgk.com.pl/storage/10035/og\\xc5\\x82oszenie%20o%20zam\\xc3\\xb3wieniu%20BZP%2045%20DLA%202015%20przes\\xc5\\x82ane%20do%20publikacji.pdf'", "traceback": [["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/views.py", 137, "search", "response = augment_solr_response(response, collection, query)"], ["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/models.py", 548, "augment_solr_response", "highlighting = response['highlighting'][str(doc[id_field])]"]], "detail": "None", "title": "Error while accessing Solr"}

The workaround described above does not work in this case.
Thanks for the help.
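
A note on this second error: the traceback ends at response['highlighting'][str(doc[id_field])], and the message is just the quoted key, which is the usual shape of a KeyError. One plausible reading, not confirmed in this thread, is that the document ID contains non-ASCII characters and str() turns it into a byte string that no longer matches the text key in the highlighting map. The sketch below illustrates that mismatch with made-up data. The function name highlighting_for, the example.com URL, and the response layout are assumptions for illustration, not Hue's code.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def highlighting_for(response, doc, id_field):
    """Look up highlighting for a document by its ID, kept as text.

    Returns an empty dict when the ID has no highlighting entry.
    """
    return response.get("highlighting", {}).get(doc[id_field], {})


# Hypothetical response; the ID contains the Polish letter U+0142.
doc_id = u"http://example.com/og\u0142oszenie.pdf"
response = {"highlighting": {doc_id: {"text": [u"<em>og\u0142oszenie</em>"]}}}
doc = {"id": doc_id}

# A UTF-8 byte-string version of the ID is a different key...
bytes_key_found = doc_id.encode("utf-8") in response["highlighting"]

# ...while the original text ID finds the entry.
snippet = highlighting_for(response, doc, "id")
print(bytes_key_found, snippet)
```

If this is the cause, the fix belongs in the lookup itself (keeping the ID as text), which would explain why changing the default encoding does not help here.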