SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Contributor

We want to write data to an Azure Blob Storage account. We are using the latest CDH5: 5.9.0-1.cdh5.9.0.p0.23.

 

You have two options:

  1. Use the Account-Key (root key of storage account)
  2. Use a SAS token, which can be limited in time, privileges, and IP range.

You can grant access to an Azure Storage account by adding a key-value pair to core-site.xml.

For option 1 use:

  • key: fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net
  • val: TheAccountKeyEndingOn==

For option 2 use:

  • key: fs.azure.sas.CONTAINER.ACCOUNTNAME.blob.core.windows.net
  • val: GeneratedSasToken
  • Example: sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z
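
For illustration, the two key formats above can be rendered as core-site.xml `<property>` entries with a short Python 3 sketch. The helper names are hypothetical and the values are placeholders, not real credentials. One detail worth noting: a SAS token contains `&`, which has to be written as `&amp;` if you edit the XML file by hand; the XML library used here does that escaping automatically.

```python
import xml.etree.ElementTree as ET


def account_key_property_name(account):
    """Key for the storage account key (option 1)."""
    return "fs.azure.account.key.%s.blob.core.windows.net" % account


def sas_property_name(container, account):
    """Key for a container-scoped SAS token (option 2)."""
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


def core_site_property(name, value):
    """Render one Hadoop <property> element as an XML string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Placeholder account, container, and secrets
    print(core_site_property(account_key_property_name("myaccount"),
                             "TheAccountKeyEndingOn=="))
    print(core_site_property(sas_property_name("mycontainer", "myaccount"),
                             "sr=c&sp=rwdl&sv=2015-07-08"))
```

If the pair is entered through a key/value form instead of raw XML, the token would normally be pasted unescaped.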

To be clear: option 1 works fine; the problem is with option 2!

When we create a SAS token with all access rights, we are not able to upload a file.
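
To rule out the token itself (wrong permissions or an expired token), its fields can be decoded before testing. The following is a minimal Python 3 sketch; the function name is hypothetical, and it assumes the usual SAS query fields (`sr`, `sp`, `se`, `sv`, and `si` when the token refers to a stored access policy). With a policy-based token, `sp` and `se` may be absent because they live in the policy.

```python
from datetime import datetime
from urllib.parse import parse_qs


def describe_sas_token(token):
    """Split a SAS token into its fields and decode the expiry time."""
    # parse_qs also undoes the percent-encoding (%3A -> ':')
    fields = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    expiry = None
    if "se" in fields:
        expiry = datetime.strptime(fields["se"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "resource": fields.get("sr"),      # 'c' = container, 'b' = blob
        "permissions": fields.get("sp"),   # e.g. 'rwdl'
        "policy": fields.get("si"),        # stored access policy name, if any
        "version": fields.get("sv"),
        "expiry_utc": expiry,
    }


if __name__ == "__main__":
    token = ("sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D"
             "&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z")
    info = describe_sas_token(token)
    print(info)
    # Writing needs at least 'w'; check that nothing is missing from 'rwdl'
    print("missing permissions:", set("rwdl") - set(info["permissions"] or ""))
```

For the example token this shows a container-level token with `rwdl`, so missing permissions do not explain the failed upload.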

 

 

# works: list
hdfs dfs -ls wasbs://mycontainer@myaccount.blob.core.windows.net/

# works: copy to hdfs
hdfs dfs -cp wasbs://mycontainer@myaccount.blob.core.windows.net/dummyfile.txt /tmp/ 

# fails: put to BlobStorage
hdfs dfs -put localfile.txt wasbs://mycontainer@myaccount.blob.core.windows.net/
# error:
put: com.microsoft.windowsazure.storage.StorageException: The specified resource does not exist
# If you look in Azure Portal inside the BlobStore container there is a folder created:
#   _$azuretmpfolder$
# with the file I wanted to copy:
#   '13fe7e79-36b4-47b5-85d9-20f1f316e280localfile.txt._COPYING_'

  

How can we solve this problem?

For the time being, we use the account key (option 1) to access the storage account as a workaround.

 

Update: I looked at what happens under the hood with tcpdump. These are the highlights:

 

# >> = HTTP request
# << = HTTP response
# sometimes I put some headers below the request

hdfs dfs -put test.txt wasb://CONTAINER@MYSTORAGEACCOUNT.blob.core.windows.net/test5

# does the file exist?
>> HEAD /xml/test5.txt?SAS-token
<< HTTP/1.1 404
>> GET /xml?comp=list&sp=rwdl&sr=c&prefix=test5.txt%2&SAS-token
<< HTTP/1.1 200 OK

# does the copying file exist?
>> HEAD /xml/test5.txt._COPYING_?SAStoken
<< HTTP/1.1 404

# send the content.
>> PUT /xml/test5.txt._COPYING_?comp=blocklist&SAStoken
   x-ms-client-request-id: 6738c38e-0a2c-4d9c-9c49-4f9657ff8eb0
   x-ms-meta-hdi_permission: {"owner":"alexander","group":"supergroup","permissions":"rw-r--r--"}
   x-ms-meta-hdi_tmpupload: _%24azuretmpfolder%24%2Ffa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
>> PUT /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?blockid=AAAAALjq9uI%3D&comp=block&SAS-token
   ( content is sent here, with some XML )

# is the sent file there?
>> HEAD /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?SAS-token
<< HTTP/1.1 200 OK

# now move the file:
>> PUT /xml/test5.txt._COPYING_?SAS-token
   x-ms-version: 2013-08-15
   User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
   x-ms-client-request-id: 13c057d3-eebe-462a-869c-fb39429665dc
   x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_

# response with body:
<< HTTP/1.1 404
<?xml version="1.0" encoding="utf-8"?>
<Error><Code>CannotVerifyCopySource</Code>
<Message>The specified resource does not exist.
RequestId:da53e5d1-0001-0109-39e3-7049dc000000
Time:2017-01-17T17:01:38.5004017Z</Message></Error>

 

 

 

Our script to generate a SAS token, based on https://github.com/Azure-Samples/hdinsight-dotnet-python-azure-storage-shared-access-signature/blob/...

 

import time
import getpass
from azure.storage import AccessPolicy
from azure.storage.blob import BlockBlobService
from datetime import datetime, timedelta


def main():
    print("Going to generate a Container SAS-token.")
    conf = get_user_input()
    blob_service = get_blob_service(**conf)
    policies = get_policies(blob_service, conf["container_name"])
    if conf.get('policy_name') not in policies.keys():
        add_new_policy(blob_service, policies, **conf)
    generate_sas_token(blob_service, **conf)


def get_user_input():
    return {'account_name':     raw_input('StorageAccount Name: '),
            'account_key':      getpass.getpass('StorageAccount Key: '),
            'container_name':   raw_input('Container: '),
            'permissions':      raw_input('Permissions [rwdl] (default "rl"): ') or 'rl',
            'policy_name':      raw_input('Policy name (default "readonly"): ') or 'readonly',
            'expiry_days':      int(raw_input('Expiry days (default 365): ') or '365'),
            'ip_filter':        raw_input('IP filter (default None): ') or None}


def get_blob_service(account_name=None, account_key=None, container_name=None, **unused):
    blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
    if not blob_service.exists(container_name):
        raise IOError(
            "Container '%s' does not exist in StorageAccount '%s'!" % (container_name, account_name))
    else:
        print('can access the container in that storage account.')
    return blob_service


def add_new_policy(blob_service, policies, container_name=None, policy_name=None, expiry_days=None, permissions=None, **unused):
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    access_policy = AccessPolicy(permission=permissions, expiry=expiry)

    policies[policy_name] = access_policy
    print('adding new policy...')
    # Set the container to the updated list of identifiers (policies)
    blob_service.set_container_acl(container_name, signed_identifiers=policies)
    # Wait 3 seconds for acl to propagate
    time.sleep(3)
    print("new policy is added.")


def get_policies(blob_service, container_name):
    print('fetch current policies...')
    identifiers = blob_service.get_container_acl(container_name)
    for k, v in identifiers.items():
        print(" - '%s': permissions: %s, start: %s, expiry: %s" % (k, v.permission, v.start, v.expiry))
    return identifiers


def generate_sas_token(blob_service, container_name=None, policy_name=None, ip_filter=None, account_name=None, permissions=None, expiry_days=None, **unused):
    print("generating new sas token...")
    # Generate a new Shared Access Signature token using the policy (by name)
    sas_token = blob_service.generate_container_shared_access_signature(
        container_name, id=policy_name, ip=ip_filter, protocol='https')
    # The user input provides 'expiry_days' (there is no 'expiry' key), so derive the date here.
    # If the policy already existed, its stored permissions and expiry apply instead.
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    print('')
    print('Now you add/update in ClouderaManager -> HDFS -> config: core-site.xml')
    print('')
    print('=== key ===')
    print('fs.azure.sas.%s.%s.blob.core.windows.net' % (container_name, account_name))
    print('=== value ===')
    print(sas_token)
    print('=== description ===')
    print('Token with permissions: "%s", expires "%s"' % (permissions, expiry.date()))
    print('')
    print('Now, restart HDFS and test with command:')
    print('hdfs dfs -ls wasbs://%s@%s.blob.core.windows.net/' % (container_name, account_name))


if __name__ == "__main__":
    main()

 

6 REPLIES

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Contributor

As I mentioned before, the command works with the account key but not with the SAS token.

I ran tcpdump for both situations. Below is the part where the SAS token fails: the PUT that moves the file.

 

# ===== Account Key ======
PUT /CONTAINER/test6.txt._COPYING_?timeout=90 HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: c809fa4d-1f75-4acd-a2d5-9ddbb33d15b6
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/0495c6ef-5529-42cb-ae51-aa479c609493test6.txt._COPYING_
x-ms-date: Tue, 17 Jan 2017 17:42:49 GMT
Authorization: SharedKey MYSTORAGEACCOUNT:/8hrG9WRAjAAAlASkaQPHx3hDZF535lqnsSH18asD5M=
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0

HTTP/1.1 202 Accepted
Transfer-Encoding: chunked
Last-Modified: Tue, 17 Jan 2017 17:42:49 GMT
ETag: "0x8D43F00414594DE"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f36d344-0001-0059-68e9-701081000000
x-ms-version: 2013-08-15
x-ms-copy-id: dcdf5297-4570-43ff-920e-3bb1e3f0ce01
x-ms-copy-status: success
Date: Tue, 17 Jan 2017 17:42:48 GMT

# ========= SAS Token ============
PUT /CONTAINER/test7.txt._COPYING_?sp=rwdl&sr=c&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z&timeout=90&sig=YyX%2BL%2FTpXAAAGAi0vqfiipuD9iVM31F0Pjwup7tA%3D HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: dc55f745-482f-426e-96f2-c906d90ffb46
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/47c25af4-b68f-4239-b573-c51796fb2335test7.txt._COPYING_
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0


HTTP/1.1 404 The specified resource does not exist.

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Contributor

After some time I found a possible problem.

The header x-ms-copy-source refers to the original blob to copy.

When you append the SAS token to the source URL in that header, the PUT request works...

 

Going to rest now... and sleep on it...

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Contributor

This issue originates in the Azure JARs shipped with the distribution.

Cloudera is using a very old azure-storage JAR, version 0.6.0.

 

This issue is not present in Hortonworks (and Azure HDInsight) because they use an up-to-date version.

 

I tried to replace the jars with updated jars.

But they are built against hdfs-2.7.x and Cloudera is using hdfs-2.6.x, so it did not work in the end.

 

Is an update of the azure-storage JARs on Cloudera's roadmap?

 

 

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

New Contributor

Can you please give an example of how this worked:
"When you suffix the file WITH the SAS-token the PUT-request works..."

You also mentioned that it worked with updated libraries. Can you please tell us which versions worked for you?

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Contributor

It was a while ago.

 

Cloudera works fine with AccountKey. (read + write)

Cloudera cannot write using an SAS-token.

Reading from a blob with SAS-token works fine.

 

We tested the same situation with HDInsight (Hortonworks).

There we added the storage account with a SAS token. This works (read and write).

 

I could not update/replace the Azure-jars, because of breaking changes in the API.

(Cloudera Hadoop 2.6.0 vs Hortonworks Hadoop 2.7.x)

 

To answer your question: I recorded the network stream and found the problem.

I replayed the PUT request with a low-level network tool, with the SAS token appended to the URL in the x-ms-copy-source header. Then the request was successful. The problem is the code that generates this request: the old azure-storage JAR (still 0.6.0) that works with Hadoop 2.6.x.
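
For anyone who wants to reproduce that replay, here is a minimal Python 3 sketch of how the modified request could be assembled. It only builds the method, URL, and headers and sends nothing. The function names and the blob names in the usage part are hypothetical, the `x-ms-version` value is taken from the captured trace, and using `https` for the source URL is an assumption (the trace showed `http`).

```python
def copy_source_with_sas(source_url, sas_token):
    """Append a SAS token to a blob URL so the copy source can be authorized."""
    separator = "&" if "?" in source_url else "?"
    return source_url + separator + sas_token.lstrip("?")


def build_copy_request(account, container, dest_blob, temp_blob, sas_token):
    """Return (method, url, headers) for the 'move' PUT; nothing is sent."""
    sas_token = sas_token.lstrip("?")
    base = "https://%s.blob.core.windows.net/%s/" % (account, container)
    headers = {
        "x-ms-version": "2013-08-15",
        # The failing request carried the bare temp-blob URL here;
        # the replayed request carried the same URL plus the SAS token.
        "x-ms-copy-source": copy_source_with_sas(base + temp_blob, sas_token),
        "Content-Length": "0",
    }
    return "PUT", base + dest_blob + "?" + sas_token, headers


if __name__ == "__main__":
    method, url, headers = build_copy_request(
        "MYSTORAGEACCOUNT", "CONTAINER",
        "test7.txt._COPYING_",
        "_$azuretmpfolder$/tmp-test7.txt._COPYING_",  # hypothetical temp blob name
        "sp=rwdl&sr=c&sv=2015-07-08&sig=PLACEHOLDER")
    print(method, url)
    for name, value in headers.items():
        print("%s: %s" % (name, value))
```

The only difference from the request in the capture is the token on the `x-ms-copy-source` URL, which matches the `CannotVerifyCopySource` error: the service could not read the source blob without it.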

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Explorer

Hello,

 

Does option 1 work fine? Did you change only core-site.xml with the account key, or did you configure other things?

In CDH 10.2 an error appears:

The value for one of the HTTP headers is not in the correct format

Thanks
