how to add source filename to cloudera search result?

Hello!

I have files with data, for example web server logs.

/data/log/000001.txt

/data/log/000002.txt

/data/log/000003.txt

/data/log/000004.txt

I want to build full-text search over them and get the source filename in each search result.

How can I do this?

Re: how to add source filename to cloudera search result?

I found the solution.

When a morphline processes data from HDFS, it adds the following fields to every record:

file_download_url=[hdfs://MYHOST:2080/testdata/log],
file_group=[nobody],
file_host=[MYHOST],
file_last_modified=[1405102390179],
file_length=[198923],
file_name=[log.txt],
file_owner=[pmezentsev],
file_path=[/testdata/log/log.txt],
file_permissions_group=[r--],
file_permissions_other=[r--],
file_permissions_stickybit=[false],
file_permissions_user=[rw-],
file_port=[8020],
file_scheme=[hdfs],
file_upload_url=[hdfs://MYHOST/testdata/log/log.txt],

So if you want the full filename in your index, just add file_path to your schema.xml:

   <field name="file_path" type="string" indexed="true" stored="true" />
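
Once file_path is stored in the index, it can be requested like any other stored field. Here is a minimal Python sketch (not part of the original answer) that builds a Solr /select URL asking for file_path through the fl parameter and reads the field back from a JSON response. The host, port, collection name (logs_collection), the queried field (text), and the sample response are all assumptions for illustration; adjust them to your cluster.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace host, port, and collection with your own.
SOLR_BASE = "http://MYHOST:8983/solr/logs_collection"


def build_select_url(query, fields=("id", "file_path"), rows=10):
    """Return a /select URL that asks Solr only for the given stored fields."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def file_paths(response_text):
    """Extract file_path from each document of a Solr JSON response."""
    docs = json.loads(response_text)["response"]["docs"]
    return [doc.get("file_path") for doc in docs]


# Illustrative response body, shaped like Solr's JSON output (made-up data).
sample_response = json.dumps({
    "response": {
        "numFound": 1,
        "start": 0,
        "docs": [{"id": "1", "file_path": "/testdata/log/log.txt"}],
    }
})

# "text" is an assumed full-text field name in the schema.
print(build_select_url("text:error"))
print(file_paths(sample_response))
```

The field must be declared with stored="true", as in the schema line above, for Solr to return it in results; indexed="true" additionally lets you filter on it, for example to restrict a search to one file.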

 

 
