Member since
08-26-2013
6
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2001 | 08-11-2014 08:35 AM |
08-13-2014
10:58 AM
It worked! Thanks a lot. :) P.S. I'm actually from Russia. Nice to meet a fellow countryman. :)
08-11-2014
09:13 AM
I have built an index over a dataset containing non-Latin characters.

Fields from schema.xml:

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="description" type="text_ru" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="date" type="date" indexed="true" stored="true" />
    <field name="pc" type="string" indexed="true" stored="true"/>
    <field name="user" type="string" indexed="true" stored="true" />
    <field name="file_path" type="string" indexed="true" stored="true" />
    <field name="text" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="_version_" type="long" indexed="true" stored="true"/>

Morphline config:

    morphlines : [
      {
        id : index-svmkd
        importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
        commands : [
          { readCSV {
              separator : ";"
              columns : [description, date, pc, user]
              ignoreFirstLine : true
              trim : true
              charset : UTF-8
          }}
          { logError {
              format : "output record: {}, file {}"
              args : ["@{}", "@{_attachment_body}"]
          }}
          { convertTimestamp {
              field : date
              inputFormats : ["dd.MM.yyyy HH:mm:ss.SSS"]
              inputTimezone : Europe/Moscow
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Europe/Moscow
          }}
          { addValues { text : "@{description} @{date} @{pc} @{user}" }}
          { generateUUID {
              field : id
              type : nonSecure
          }}
          { sanitizeUnknownSolrFields {
              solrLocator : {
                collection : svmkd
                zkHost : "MYHOST:2181/solr"
              }
          }}
          { loadSolr {
              solrLocator : {
                collection : svmkd          # Name of solr collection
                zkHost : "MYHOST:2181/solr" # ZooKeeper ensemble
              }
          }}
        ]
      }
    ]

The index was built successfully, and I can search for any Latin keywords in my dataset through Cloudera Search.
However, when I try to search for non-Latin keywords in Cloudera Search, Hue shows me an error:

    {
      "message": "'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)",
      "traceback": [
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/search/src/search/views.py", 134, "search", "response = SolrApi(SOLR_URL.get(), request.user).query(collection, query)"],
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/libs/libsolr/src/libsolr/api.py", 146, "query", "response = self._root.get('%(collection)s/select' % solr_query, params)"],
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 90, "get", "return self.invoke(\"GET\", relpath, params, headers=headers, allow_redirects=True)"],
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/resource.py", 73, "invoke", "urlencode=self._urlencode)"],
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 131, "execute", "url = self._make_url(path, params)"],
        ["/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/lib/rest/http_client.py", 161, "_make_url", "param_str = urllib.urlencode(params)"],
        ["/usr/lib64/python2.6/urllib.py", 1281, "urlencode", "v = quote_plus(str(v))"]
      ],
      "detail": "None",
      "title": "Error while accessing Solr"
    }

Similar queries from the Solr Admin UI (MYHOST:8983) are processed fine. Please help me solve this problem.
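A note on the traceback: the last frame is `v = quote_plus(str(v))` inside Python 2.6's `urllib.urlencode`. In Python 2, calling `str()` on a unicode value implicitly encodes it as ASCII, which would explain the error for a Cyrillic query. What was actually changed in Hue is not shown here, so the following is only a minimal sketch of the failure mode and of one common workaround (encoding values as UTF-8 before `urlencode`). It is written for Python 3, where the implicit ASCII encode has to be simulated explicitly. `ascii_error_message` and `utf8_params` are hypothetical helper names, not Hue functions.

```python
from urllib.parse import urlencode


def ascii_error_message(text):
    """Return the error raised when text is forced into ASCII, or None."""
    try:
        # Simulates what Python 2's str(unicode_value) does implicitly.
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def utf8_params(params):
    """Encode every str value as UTF-8 bytes so no ASCII encode is needed."""
    return {
        key: value.encode("utf-8") if isinstance(value, str) else value
        for key, value in params.items()
    }


if __name__ == "__main__":
    query = "тест"  # illustrative non-Latin search term
    print(ascii_error_message(query))
    print(urlencode(utf8_params({"q": query, "rows": 10})))
```

With the values already encoded as UTF-8 bytes, `urlencode` only has to percent-escape them, so the query reaches Solr intact.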
08-11-2014
08:35 AM
1 Kudo
I found the solution. When a morphline processes data from HDFS, it appends additional fields to every record:

    file_download_url=[hdfs://MYHOST:2080/testdata/log],
    file_group=[nobody],
    file_host=[MYHOST],
    file_last_modified=[1405102390179],
    file_length=[198923],
    file_name=[log.txt],
    file_owner=[pmezentsev],
    file_path=[/testdata/log/log.txt],
    file_permissions_group=[r--],
    file_permissions_other=[r--],
    file_permissions_stickybit=[false],
    file_permissions_user=[rw-],
    file_port=[8020],
    file_scheme=[hdfs],
    file_upload_url=[hdfs://MYHOST/testdata/log/log.txt],

So if you want the full file name in your index, just add file_path to your schema.xml:

    <field name="file_path" type="string" indexed="true" stored="true" />
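Once file_path is stored in the index, a search can ask Solr to return it with each hit. Here is a minimal sketch of building such a request URL, assuming a Solr endpoint at MYHOST:8983 and a collection named svmkd (both placeholders, adjust to your setup). `build_select_url` is a hypothetical helper; `fl` and `wt` are the standard Solr parameters for the returned field list and the response format.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, fields):
    """Build a Solr /select URL that returns only the given stored fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # field list: which stored fields to return
        "wt": "json",            # ask for a JSON response
    }
    return "{}/{}/select?{}".format(
        base_url.rstrip("/"), collection, urlencode(params)
    )


if __name__ == "__main__":
    # Host, collection, and query are assumptions for illustration only.
    print(build_select_url(
        "http://MYHOST:8983/solr", "svmkd", "text:error", ["id", "file_path"]
    ))
```

Each document in the response should then carry its file_path value, provided the field was declared as stored before the data was indexed.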
08-10-2014
01:48 AM
Hello! I have files with data, for example web server logs:

    /data/log/000001.txt
    /data/log/000002.txt
    /data/log/000003.txt
    /data/log/000004.txt

I want to build full-text search over them and get the file name in the search results. How can I do this?
Labels:
- Cloudera Search