Created on 08-11-2014 09:13 AM - edited 09-16-2022 02:04 AM
I have built an index over a dataset that contains non-Latin characters.
fields from schema.xml:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="description" type="text_ru" indexed="true" stored="true" required="true" multiValued="false" />
<field name="date" type="date" indexed="true" stored="true" />
<field name="pc" type="string" indexed="true" stored="true"/>
<field name="user" type="string" indexed="true" stored="true" />
<field name="file_path" type="string" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
morphline config:
morphlines : [
  {
    id : index-svmkd
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      {
        readCSV {
          separator : ";"
          columns : [description, date, pc, user]
          ignoreFirstLine : true
          trim : true
          charset : UTF-8
        }
      }
      {
        logError {
          format : "output record: {}, file {}"
          args : ["@{}", "@{_attachment_body}"]
        }
      }
      {
        convertTimestamp {
          field : date
          inputFormats : ["dd.MM.yyyy HH:mm:ss.SSS"]
          inputTimezone : Europe/Moscow
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : Europe/Moscow
        }
      }
      {
        addValues {
          text : "@{description} @{date} @{pc} @{user}"
        }
      }
      {
        generateUUID {
          field : id
          type : nonSecure
        }
      }
      {
        sanitizeUnknownSolrFields {
          solrLocator : {
            collection : svmkd
            zkHost : "MYHOST:2181/solr"
          }
        }
      }
      {
        loadSolr {
          solrLocator : {
            collection : svmkd # Name of solr collection
            zkHost : "MYHOST:2181/solr" # ZooKeeper ensemble
          }
        }
      }
    ]
  }
]
The index was built successfully, and I can search for any Latin keywords in my dataset through Cloudera Search.
However, when I try to search for non-Latin keywords in Cloudera Search, Hue shows me an error:
{
"message": "'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)",
"traceback": [
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py", 134, "search", "response = SolrApi(SOLR_URL.get(), request.user).query(collection, query)"],
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/libs/libsolr/src/libsolr/api.py", 146, "query", "response = self._root.get('%(collection)s/select' % solr_query, params)"],
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 90, "get", "return self.invoke(\"GET\", relpath, params, headers=headers, allow_redirects=True)"],
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 73, "invoke", "urlencode=self._urlencode)"],
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 131, "execute", "url = self._make_url(path, params)"],
["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 161, "_make_url", "param_str = urllib.urlencode(params)"],
["/usr/lib64/python2.6/urllib.py", 1281, "urlencode", "v = quote_plus(str(v))"]
],
"detail": "None",
"title": "Error while accessing Solr"
}
Similar queries from the Solr Admin UI (MYHOST:8983) are processed correctly.
Please help me solve this problem.
Created on 08-13-2014 03:46 AM - edited 08-20-2014 03:22 AM
Hi Pavel,
I had the same problem.
You can try this:
1) Open /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py
2) Add this before LOG = logging.getLogger(__name__) :
import sys
reload(sys) # Reload does the trick!
sys.setdefaultencoding('UTF8')
3) Restart Hue service.
Hope this helps.
Update:
Here is a better solution: https://issues.cloudera.org/browse/HUE-2279
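For context, the traceback in the question ends in urllib.urlencode calling str() on a unicode value, which is where Python 2 falls back to the ASCII codec. A minimal sketch of the general idea, encoding text values to UTF-8 before URL-encoding them, is below. It is only an illustration and not the actual HUE-2279 patch; encode_params is a hypothetical helper name and the query value is made up.

```python
# Works on both Python 2 and Python 3.
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def encode_params(params):
    """Return a copy of params with text values encoded as UTF-8 bytes.

    Hypothetical helper for illustration; not part of Hue.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    encoded = {}
    for key, value in params.items():
        if isinstance(value, text_type):
            # Byte strings pass through urlencode without an ASCII round trip.
            value = value.encode("utf-8")
        encoded[key] = value
    return encoded


# Example value is an assumption: any non-Latin search keyword.
query_string = urlencode(encode_params({"q": u"привет"}))
print(query_string)
```

With this approach the default encoding of the interpreter is left untouched, which avoids the side effects that sys.setdefaultencoding can have on other code in the same process.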
P.S. Judging by your avatar, I'd guess you are from Russia. Is that right?
Regards,
Andrey
Created 08-13-2014 10:58 AM
It worked!
Thanks a lot. ))
P.S. Yes, I'm from Russia. Nice to meet a fellow countryman )
Created 07-16-2015 11:03 PM
Hey,
I have a similar problem.
{"message": "'http://www.bgk.com.pl/storage/10035/og\\xc5\\x82oszenie%20o%20zam\\xc3\\xb3wieniu%20BZP%2045%20DLA%202015%20przes\\xc5\\x82ane%20do%20publikacji.pdf'", "traceback": [["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/views.py", 137, "search", "response = augment_solr_response(response, collection, query)"], ["/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/lib/hue/apps/search/src/search/models.py", 548, "augment_solr_response", "highlighting = response['highlighting'][str(doc[id_field])]"]], "detail": "None", "title": "Error while accessing Solr"}
The workaround described above does not work.
Thanks for the help.
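
The message in this second traceback looks like a failed dictionary lookup rather than an encoding error: the failing line is response['highlighting'][str(doc[id_field])], and the message is just the document ID. One possible explanation, which is an assumption based on the traceback alone, is that str() yields a UTF-8 byte string while the keys in the highlighting section are unicode, so the key is not found. A hedged sketch of a more tolerant lookup is below; lookup_highlighting is a hypothetical name, not Hue code, and the sample ID is made up.

```python
def lookup_highlighting(highlighting, doc_id):
    """Return the highlighting entry for doc_id, or {} if there is none.

    Hypothetical helper for illustration; not part of Hue.
    """
    if isinstance(doc_id, bytes):
        # Assumption: IDs that arrive as byte strings are UTF-8 encoded.
        doc_id = doc_id.decode("utf-8")
    # .get() avoids an exception when Solr returns no highlighting for a doc.
    return highlighting.get(doc_id, {})


# Made-up sample data: a unicode key, looked up with its UTF-8 byte form.
highlighting = {u"http://example.com/og\u0142oszenie.pdf": {"text": [u"<em>BZP</em>"]}}
byte_id = u"http://example.com/og\u0142oszenie.pdf".encode("utf-8")

entry = lookup_highlighting(highlighting, byte_id)
missing = lookup_highlighting(highlighting, u"no-such-id")
```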