This may be a noob question, so please forgive me, but I am having trouble finding this in the documentation. I am aware that HDP can work with S3 buckets on AWS. However, we have data that cannot leave the data center, and Hadoop is the right tool for the problem. Can HDP work with data on a local SAN if that data is presented through an S3 interface?
The simple answer is to copy the data from local storage into HDFS (load local, hadoop fs -put, etc.) and process it there. That would be the recommended solution.
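As a minimal sketch of that copy-in step (the SAN mount point and HDFS paths below are hypothetical examples):

```shell
# Copy files from a locally mounted SAN directory into HDFS
hadoop fs -mkdir -p /data/incoming
hadoop fs -put /mnt/san/data/*.csv /data/incoming/

# For large directory trees, DistCp runs the copy as a parallel MapReduce job
hadoop distcp file:///mnt/san/data hdfs://namenode:8020/data/incoming
```

Once the data is in HDFS, you get data locality for the processing jobs, which is the main reason this is the recommended route.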
"Could" HDFS read SAN files through an S3 interface (Openstack Swift or maybe Fake S3 server)? Probably. The difficulties would probably arise in authentication. Not sure how these servers deal with access keys and secrets.
If your SAN supports the AWS authentication mechanisms, then yes, you can use it. I'll call out the Western Digital store as one I know works: they've been very busy on the open source side of things. For other stores, tuning the authentication options is the usual trouble spot.
Start by pointing the clients at your local store by setting fs.s3a.endpoint to the hostname of the service. You'll probably also need to set fs.s3a.path.style.access to true, unless your system creates a DNS entry for every bucket. After that, it's down to playing with authentication. The property fs.s3a.signing-algorithm is passed straight down to the AWS SDK here; a quick glance at its implementation implies it can be one of: NoOpSignerType, AWS4UnsignedPayloadSignerType, AWS3SignerType, AWS4SignerType and QueryStringSignerType. The v4 signing API is new and unlikely to work; the S3A default is the v3 one.
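Pulling those settings together, a client-side configuration sketch in core-site.xml might look like the following (the endpoint hostname and credentials are placeholders, and the signing-algorithm override is only needed if the default doesn't work against your store):

```xml
<!-- core-site.xml: point S3A at a local S3-compatible endpoint -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.internal.example.com</value> <!-- placeholder: your store's hostname -->
</property>
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value> <!-- unless the store has per-bucket DNS entries -->
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value> <!-- placeholder -->
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value> <!-- placeholder -->
</property>
<!-- only set this if the default signing fails against your store -->
<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>AWS3SignerType</value>
</property>
```

With that in place, a quick `hadoop fs -ls s3a://your-bucket/` is a reasonable smoke test before pointing real jobs at the store.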