Archives of Support Questions (Read Only)


How to add the source filename to Cloudera Search results?

Explorer

Hello!

I have files with data, for example web server logs:

/data/log/000001.txt

/data/log/000002.txt

/data/log/000003.txt

/data/log/000004.txt

I want to build full-text search over them and get the filename in each search result.

How can I do this?

1 ACCEPTED SOLUTION

Explorer

I found the solution.

When a morphline processes data from HDFS, it appends additional fields to every record:

file_download_url=[hdfs://MYHOST:2080/testdata/log],
file_group=[nobody],
file_host=[MYHOST],
file_last_modified=[1405102390179],
file_length=[198923],
file_name=[log.txt],
file_owner=[pmezentsev],
file_path=[/testdata/log/log.txt],
file_permissions_group=[r--],
file_permissions_other=[r--],
file_permissions_stickybit=[false],
file_permissions_user=[rw-],
file_port=[8020],
file_scheme=[hdfs],
file_upload_url=[hdfs://MYHOST/testdata/log/log.txt],

So if you want the full file path in your index, just add file_path to your schema.xml:

   <field name="file_path" type="string" indexed="true" stored="true" />
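
Once the field is stored, a query can ask for it through Solr's standard fl (field list) parameter. Below is a minimal sketch in Python that only builds such a query URL. The host and port (MYHOST:8983), the collection name (logs), and the searched field (text) are assumptions for illustration and are not taken from the setup above.

```python
from urllib.parse import urlencode


def build_select_url(solr_base, collection, query, fields=("file_path",)):
    """Build a Solr /select URL that returns only the listed stored fields."""
    params = {
        "q": query,               # the full-text query
        "fl": ",".join(fields),   # fields to return, e.g. file_path
        "wt": "json",             # ask for a JSON response
    }
    return f"{solr_base.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection, and query field; adjust to your cluster.
    print(build_select_url("http://MYHOST:8983/solr", "logs", "text:error"))
```

Opening the resulting URL in a browser or with curl should list file_path for each matching document, provided the field is declared with stored="true" as shown above.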

2 REPLIES 2

Super Collaborator