Explorer
Posts: 6
Registered: ‎08-26-2013
Accepted Solution

how to add source filename to cloudera search result?

Hello!

 

I have files with data, for example web server logs:

/data/log/000001.txt

/data/log/000002.txt

/data/log/000003.txt

/data/log/000004.txt

 

I want to build full-text search on them and get the filename in the search results.

How can I do this?

Explorer
Posts: 6
Registered: ‎08-26-2013

Re: how to add source filename to cloudera search result?

I found the solution.

When a morphline processes data from HDFS, it adds the following extra fields to every record:

 

file_download_url=[hdfs://MYHOST:2080/testdata/log],
file_group=[nobody],
file_host=[MYHOST],
file_last_modified=[1405102390179],
file_length=[198923],
file_name=[log.txt],
file_owner=[pmezentsev],
file_path=[/testdata/log/log.txt],
file_permissions_group=[r--],
file_permissions_other=[r--],
file_permissions_stickybit=[false],
file_permissions_user=[rw-],
file_port=[8020],
file_scheme=[hdfs],
file_upload_url=[hdfs://MYHOST/testdata/log/log.txt],

 

So if you want the full file path in your index, just add file_path to your schema.xml:

   <field name="file_path" type="string" indexed="true" stored="true" />
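Once file_path is stored in the index, it can be requested with the fl parameter of a normal Solr query. Here is a rough sketch of how that could look from Python. The Solr address, the collection name "logs" and the helper function names are only assumptions for illustration, so replace them with your own values.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values: adjust to your own Solr host and collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "logs"  # hypothetical collection name


def build_select_url(query, base=SOLR_BASE, collection=COLLECTION, rows=10):
    """Build a Solr /select URL that returns only file_path for each hit."""
    params = urlencode({"q": query, "fl": "file_path", "rows": rows, "wt": "json"})
    return f"{base}/{collection}/select?{params}"


def extract_file_paths(response):
    """Collect the file_path value of every document in a parsed Solr response."""
    docs = response.get("response", {}).get("docs", [])
    return [doc["file_path"] for doc in docs if "file_path" in doc]


def search_file_paths(query):
    """Run the query against Solr and return the file paths of the matches."""
    with urlopen(build_select_url(query)) as resp:
        return extract_file_paths(json.load(resp))
```

Calling search_file_paths("error") would then return the list of file paths whose records match the query, provided the documents were indexed after file_path was added to the schema.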

 

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: how to add source filename to cloudera search result?