When you index files from HDFS into Solr, are the index files stored on the local FS (something like /use/local/de???), or does the indexing happen directly on HDFS? I mean, will we end up with two copies of the files, one in HDFS and one on the local filesystem?
If you have configured Solr to use HDFS, it will not write to the local FS. Since the index is not on the local FS, you do not get the benefit of OS file caching, so you need to configure Solr to use its off-heap cache instead of relying on the OS cache. If you are doing frequent updates, HDFS may not be the best place for your Solr index files, because the files change frequently and that means a lot of file I/O. You can find details here:
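To get a feel for how much off-heap memory that cache needs, here is a rough sizing sketch. It assumes the documented defaults for Solr's HDFS block cache (8 KB blocks and 16384 blocks per slab, i.e. 128 MB per slab, with the slab count set through solr.hdfs.blockcache.slab.count); check those against your Solr version. The function names are just for illustration.

```python
# Rough sizing sketch for Solr's off-heap HDFS block cache.
# Assumption: default geometry of 8 KB blocks and 16384 blocks per slab.
BLOCK_SIZE_BYTES = 8 * 1024        # size of one cache block
DEFAULT_BLOCKS_PER_BANK = 16384    # default solr.hdfs.blockcache.blocksperbank


def block_cache_bytes(slab_count, blocks_per_bank=DEFAULT_BLOCKS_PER_BANK):
    """Return the off-heap memory (in bytes) used by `slab_count` slabs."""
    if slab_count < 1:
        raise ValueError("slab_count must be at least 1")
    return slab_count * blocks_per_bank * BLOCK_SIZE_BYTES


def to_mib(num_bytes):
    """Convert a byte count to whole mebibytes."""
    return num_bytes // (1024 * 1024)


if __name__ == "__main__":
    # Print the cache size for a few hypothetical slab counts.
    for slabs in (1, 4, 16):
        print(f"{slabs} slab(s) -> {to_mib(block_cache_bytes(slabs))} MiB off-heap")
```

Whatever number you land on has to fit inside the JVM's direct memory limit (-XX:MaxDirectMemorySize) when direct memory allocation is enabled, and it comes on top of the Java heap, so leave room for both on each Solr node.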
That should work OK then. If you issue a commit with openSearcher=true only after adding all of your documents, you'll get the best results. Normally you want Solr to do the commits (configured in solrconfig.xml) rather than your client. You also want to commit as infrequently as your use case can tolerate for best performance, since there is overhead in opening new searchers and warming the caches. And, if using HDFS, you'll have to pull data across the network to your Solr nodes. Good luck. Please update me on how it's working out on HDFS.
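For the bulk-load case, the "commit once at the end" pattern might look like the sketch below. It only builds the request URLs and does not talk to Solr; the host, port, and collection name are made up, and the helper names are hypothetical. The commit and openSearcher request parameters are the standard ones on Solr's /update handler.

```python
from urllib.parse import urlencode


def update_url(base_url, collection, final_commit=False):
    """Build the /update URL; add commit parameters only for the final request."""
    url = f"{base_url.rstrip('/')}/{collection}/update"
    if final_commit:
        # One hard commit that also opens a new searcher, so the
        # freshly indexed documents become visible to queries.
        url += "?" + urlencode({"commit": "true", "openSearcher": "true"})
    return url


def plan_update_requests(base_url, collection, num_batches):
    """Return the URLs to call: one per document batch, then a single commit."""
    urls = [update_url(base_url, collection) for _ in range(num_batches)]
    urls.append(update_url(base_url, collection, final_commit=True))
    return urls


if __name__ == "__main__":
    # Hypothetical Solr base URL and collection name, for illustration only.
    for url in plan_update_requests("http://localhost:8983/solr", "docs", 3):
        print(url)
```

None of the batch requests carries a commit, so no searcher is opened and no cache is warmed until everything is in. For ongoing indexing, the autoCommit settings in solrconfig.xml are still the better place for this than client code.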