
Overriding HDFS replication for Cloudera Search


Champion Alumni

Hi,

 

I want to enable a replication factor of 3 for HDFS. I am also using Cloudera Search to index data from HBase into SolrCloud via the Lily HBase Indexer. I have a single collection and plan to have 6 Solr shards with a replication factor of 3 in Solr. Since the Solr index is going to be stored on HDFS, I assume there will be 9 copies of the same data. How can I overcome this problem? Is there some way to tell HDFS not to replicate this data? Considering the volume of data we have, I do not want to maintain so much repeated data.
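
Just to show where the 9 comes from (a rough sketch; the numbers are the ones from my setup above):

# each Solr replica of a shard is itself stored on HDFS, so the copies multiply
SOLR_REPLICAS_PER_SHARD=3   # Solr replicationFactor for the collection
HDFS_REPLICATION=3          # HDFS dfs.replication
echo $(( SOLR_REPLICAS_PER_SHARD * HDFS_REPLICATION ))   # -> 9 physical copies of each index block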


Thanks,

Nishan

2 REPLIES

Re: Overriding HDFS replication for Cloudera Search

New Contributor

Hi,

 

I have exactly the same question: how do Solr replication and HDFS replication work together (or not) when Solr indices are stored on HDFS? I'd appreciate any insights!

 

Regards,

 

Adrian

Re: Overriding HDFS replication for Cloudera Search

Contributor
HDFS replication is managed site-wide; you can, however, change the replication factor on individual files via the HDFS CLI tool: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.0/hadoop-project-dist/hadoop-common/FileS...

"hadoop fs -setrep -R 1 /solr" would set the replication factor to 1 for all files under /solr.

New files, however, will again adhere to the site-wide setting. I guess you will need to run this regularly if your index updates and you don't want a site-wide adjustment.
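
A minimal sketch of that workflow, assuming the index lives under /solr on HDFS (adjust the path to your deployment):

# list the index directory; for files, the second column of the listing is the current replication factor
hadoop fs -ls /solr

# re-apply replication factor 1 to everything under /solr; -w waits for the change to complete
hadoop fs -setrep -R -w 1 /solr

# illustrative cron entry to re-apply the override nightly, since files Solr writes
# later will pick up the site-wide dfs.replication again:
# 0 2 * * * hadoop fs -setrep -R 1 /solr

Keep in mind that lowering the HDFS replication factor also reduces redundancy for the index files, so it is a trade-off against the protection Solr's own replicas already provide.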