How to add the source filename to Cloudera Search results?

Hello!

I have files with data, for example web server logs:

/data/log/000001.txt

/data/log/000002.txt

/data/log/000003.txt

/data/log/000004.txt

 

I want to build full-text search over them and get the filename in the search results.

How can I do this?

Accepted Solution

Re: how to add source filename to cloudera search result?

I found the solution.

When a morphline processes data from HDFS, it adds the following fields to every record:

 

file_download_url=[hdfs://MYHOST:2080/testdata/log],
file_group=[nobody],
file_host=[MYHOST],
file_last_modified=[1405102390179],
file_length=[198923],
file_name=[log.txt],
file_owner=[pmezentsev],
file_path=[/testdata/log/log.txt],
file_permissions_group=[r--],
file_permissions_other=[r--],
file_permissions_stickybit=[false],
file_permissions_user=[rw-],
file_port=[8020],
file_scheme=[hdfs],
file_upload_url=[hdfs://MYHOST/testdata/log/log.txt],

 

So if you want the full file path in your index, just add file_path to your schema.xml:

   <field name="file_path" type="string" indexed="true" stored="true" />
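Once file_path is stored in the index, it can be requested in the search results with Solr's standard fl (field list) parameter. Here is a minimal sketch of building such a query URL. The host and port (MYHOST:8983), the collection name "logs", and the "id" and "text" field names are assumptions for illustration only; substitute the values from your own setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, fields=("id", "file_path"), rows=10):
    """Build a Solr /select URL that returns only the listed stored fields."""
    params = {
        "q": query,               # the full-text query
        "fl": ",".join(fields),   # fields to return, including file_path
        "rows": rows,             # maximum number of documents to return
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and query field; adjust to your cluster.
url = build_select_url("http://MYHOST:8983/solr", "logs", "text:error")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return documents that each carry a file_path value, provided the field was declared with stored="true" as above before indexing. If you only need the base name rather than the full path, file_name can be added to schema.xml and to fl in the same way.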

 

 
