Explorer
Posts: 22
Registered: ‎02-17-2015

SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)


We want to write data to an Azure Blob Storage account. We are using the latest CDH5: 5.9.0-1.cdh5.9.0.p0.23.

 

You have two options:

  1. Use the account key (root key of the storage account).
  2. Use a SAS token, which can be limited in time, privileges, and IP range.

You can add a key-value pair to core-site.xml to grant access to an Azure Storage account.

For option 1 use:

  • key: fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net
  • val: TheAccountKeyEndingOn==

For option 2 use:

  • key: fs.azure.sas.CONTAINER.ACCOUNTNAME.blob.core.windows.net
  • val: GeneratedSasToken
  • Example: sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z
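For reference, a minimal Python 3 sketch of how these two key/value pairs could be rendered as core-site.xml `<property>` entries. The helper names (`core_site_property`, `account_key_name`, `sas_name`) and the placeholder values are made up for illustration. The one thing it is meant to show: a SAS token contains `&` characters, which have to be XML-escaped if you edit the file by hand.

```python
import xml.etree.ElementTree as ET


def core_site_property(name, value):
    """Render one Hadoop <property> element as an XML string (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # ElementTree escapes '&' as '&amp;', which a raw SAS token needs inside XML.
    return ET.tostring(prop, encoding="unicode")


def account_key_name(account):
    # Option 1: account-wide key
    return "fs.azure.account.key.%s.blob.core.windows.net" % account


def sas_name(container, account):
    # Option 2: container-scoped SAS token
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(core_site_property(account_key_name("myaccount"), "TheAccountKeyEndingOn=="))
    print(core_site_property(sas_name("mycontainer", "myaccount"),
                             "sr=c&sp=rwdl&sig=PLACEHOLDER&sv=2015-07-08"))
```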

To be clear: option 1 works fine, the problem is with option 2!

When we create a SAS token with all access rights, we are not able to upload a file.
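To double-check what a given token actually grants before blaming the cluster, the token can be split into its fields. This is a small Python 3 sketch using only the standard library; `describe_sas` is a hypothetical helper name, and the field meanings in the comments follow the usual service SAS parameters (`sr`, `sp`, `si`, `se`, `sv`).

```python
from urllib.parse import parse_qs


def describe_sas(token):
    """Split a SAS query string into its main fields (values are URL-decoded)."""
    fields = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    return {
        "resource": fields.get("sr"),     # 'c' = container, 'b' = blob
        "permissions": fields.get("sp"),  # e.g. 'rwdl'
        "policy": fields.get("si"),       # stored access policy id, if one is used
        "expiry": fields.get("se"),
        "version": fields.get("sv"),
    }


if __name__ == "__main__":
    example = ("sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D"
               "&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z")
    for key, value in describe_sas(example).items():
        print("%-12s %s" % (key, value))
```

For the example token above this reports a container-level token with `rwdl` permissions, so missing write permission is not the cause of the failed upload.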

 

 

# works: list
hdfs dfs -ls wasbs://mycontainer@myaccount.blob.core.windows.net/

# works: copy to hdfs
hdfs dfs -cp wasbs://mycontainer@myaccount.blob.core.windows.net/dummyfile.txt /tmp/ 

# fails: put to BlobStorage
hdfs dfs -put localfile.txt wasbs://mycontainer@myaccount.blob.core.windows.net/
# error:
put: com.microsoft.windowsazure.storage.StorageException: The specified resource does not exist
# If you look in the Azure Portal inside the Blob Storage container, a folder has been created:
#   _$azuretmpfolder$
# containing the file I wanted to copy:
#   '13fe7e79-36b4-47b5-85d9-20f1f316e280localfile.txt._COPYING_'

  

How can we solve this problem?

For the time being, we have configured the account key (option 1) to access the storage account as a workaround.

 

Update: here is what happens under the hood, captured with tcpdump. These are the highlights:

 

# >> = HTTP request
# << = HTTP response
# sometimes I put some headers below the request

hdfs dfs -put test.txt wasb://CONTAINER@MYSTORAGEACCOUNT.blob.core.windows.net/test5

# does the file exist?
>> HEAD /xml/test5.txt?SAS-token
<< HTTP/1.1 404
>> GET /xml?comp=list&sp=rwdl&sr=c&prefix=test5.txt%2&SAS-token
<< HTTP/1.1 200 OK

# does the copying file exist?
>> HEAD /xml/test5.txt._COPYING_?SAStoken
<< HTTP/1.1 404

# send the content.
>> PUT /xml/test5.txt._COPYING_?comp=blocklist&SAStoken
   x-ms-client-request-id: 6738c38e-0a2c-4d9c-9c49-4f9657ff8eb0
   x-ms-meta-hdi_permission: {"owner":"alexander","group":"supergroup","permissions":"rw-r--r--"}
   x-ms-meta-hdi_tmpupload: _%24azuretmpfolder%24%2Ffa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
>> PUT /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?blockid=AAAAALjq9uI%3D&comp=block&SAS-token
   ( content is sent here, with some XML )

# is the sent file there?
>> HEAD /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?SAS-token
<< HTTP/1.1 200 OK

# now move the file:
>> PUT /xml/test5.txt._COPYING_?SAS-token
   x-ms-version: 2013-08-15
   User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
   x-ms-client-request-id: 13c057d3-eebe-462a-869c-fb39429665dc
   x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_

# response with body:
<< HTTP/1.1 404
<?xml version="1.0" encoding="utf-8"?>
<Error><Code>CannotVerifyCopySource</Code>
<Message>The specified resource does not exist.
RequestId:da53e5d1-0001-0109-39e3-7049dc000000
Time:2017-01-17T17:01:38.5004017Z</Message></Error>
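The 404 body in the trace carries a more specific error code (`CannotVerifyCopySource`) than the exception text printed by `hdfs dfs -put`. As a small Python 3 sketch, this is one way to pull the code and the first message line out of such a body; `parse_storage_error` is a hypothetical helper, and the sample is the response body copied from the trace.

```python
import xml.etree.ElementTree as ET

# Error body as captured in the trace above.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<Error><Code>CannotVerifyCopySource</Code>
<Message>The specified resource does not exist.
RequestId:da53e5d1-0001-0109-39e3-7049dc000000
Time:2017-01-17T17:01:38.5004017Z</Message></Error>"""


def parse_storage_error(body):
    """Return (error code, first line of the message) from a storage error body."""
    root = ET.fromstring(body.encode("utf-8"))
    code = root.findtext("Code")
    lines = (root.findtext("Message") or "").splitlines()
    return code, (lines[0] if lines else "")


if __name__ == "__main__":
    print(parse_storage_error(SAMPLE))
```

The code points at the copy source rather than at the destination blob, which matches the fact that the temp file itself was found by the preceding HEAD request.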

 

 

 

Our script to generate a SAS token, based on https://github.com/Azure-Samples/hdinsight-dotnet-python-azure-storage-shared-access-signature/blob/...

 

import time
import getpass
from azure.storage import AccessPolicy
from azure.storage.blob import BlockBlobService
from datetime import datetime, timedelta


def main():
    print("Going to generate a Container SAS-token.")
    conf = get_user_input()
    blob_service = get_blob_service(**conf)
    policies = get_policies(blob_service, conf["container_name"])
    if conf.get('policy_name') not in policies.keys():
        add_new_policy(blob_service, policies, **conf)
    generate_sas_token(blob_service, **conf)


def get_user_input():
    return {'account_name':     raw_input('StorageAccount Name: '),
            'account_key':      getpass.getpass('StorageAccount Key: '),
            'container_name':   raw_input('Container: '),
            'permissions':      raw_input('Permissions [rwdl] (default "rl"): ') or 'rl',
            'policy_name':      raw_input('Policy name (default "readonly"): ') or 'readonly',
            'expiry_days':      int(raw_input('Expiry days (default 365): ') or '365'),
            'ip_filter':        raw_input('IP filter (default None): ') or None}


def get_blob_service(account_name=None, account_key=None, container_name=None, **unused):
    blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
    if not blob_service.exists(container_name):
        raise IOError(
            "Container '%s' does not exist in StorageAccount '%s'!" % (container_name, account_name))
    else:
        print('can access the container in that storage account.')
    return blob_service


def add_new_policy(blob_service, policies, container_name=None, policy_name=None, expiry_days=None, permissions=None, **unused):
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    access_policy = AccessPolicy(permission=permissions, expiry=expiry)

    policies[policy_name] = access_policy
    print('adding new policy...')
    # Set the container to the updated list of identifiers (policies)
    blob_service.set_container_acl(container_name, signed_identifiers=policies)
    # Wait 3 seconds for acl to propagate
    time.sleep(3)
    print("new policy is added.")


def get_policies(blob_service, container_name):
    print('fetch current policies...')
    identifiers = blob_service.get_container_acl(container_name)
    for k, v in identifiers.items():
        print(" - '%s': permissions: %s, start: %s, expiry: %s" % (k, v.permission, v.start, v.expiry))
    return identifiers


def generate_sas_token(blob_service, container_name=None, policy_name=None, ip_filter=None, account_name=None, permissions=None, expiry_days=None, **unused):
    print("generating new sas token...")
    # conf carries 'expiry_days', not 'expiry', so derive the date here.
    # Note: if the policy already existed, the expiry stored in that policy applies instead.
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    # Generate a new Shared Access Signature token using the policy (by name)
    sas_token = blob_service.generate_container_shared_access_signature(
        container_name, id=policy_name, ip=ip_filter, protocol='https')
    print('')
    print('Now you add/update in ClouderaManager -> HDFS -> config: core-site.xml')
    print('')
    print('=== key ===')
    print('fs.azure.sas.%s.%s.blob.core.windows.net' % (container_name, account_name))
    print('=== value ===')
    print(sas_token)
    print('=== description ===')
    print('Token with permissions: "%s", expires "%s"' % (permissions, expiry.date()))
    print('')
    print('Now, restart HDFS and test with command:')
    print('hdfs dfs -ls wasbs://%s@%s.blob.core.windows.net/' % (container_name, account_name))


if __name__ == "__main__":
    main()

 

Explorer
Posts: 22
Registered: ‎02-17-2015

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

As I mentioned before, the command works with the account key but not with the SAS token.

I ran tcpdump for both situations. Below is the part where the SAS token variant fails: the PUT that moves the file.

 

# ===== Account Key ======
PUT /CONTAINER/test6.txt._COPYING_?timeout=90 HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: c809fa4d-1f75-4acd-a2d5-9ddbb33d15b6
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/0495c6ef-5529-42cb-ae51-aa479c609493test6.txt._COPYING_
x-ms-date: Tue, 17 Jan 2017 17:42:49 GMT
Authorization: SharedKey MYSTORAGEACCOUNT:/8hrG9WRAjAAAlASkaQPHx3hDZF535lqnsSH18asD5M=
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0

HTTP/1.1 202 Accepted
Transfer-Encoding: chunked
Last-Modified: Tue, 17 Jan 2017 17:42:49 GMT
ETag: "0x8D43F00414594DE"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f36d344-0001-0059-68e9-701081000000
x-ms-version: 2013-08-15
x-ms-copy-id: dcdf5297-4570-43ff-920e-3bb1e3f0ce01
x-ms-copy-status: success
Date: Tue, 17 Jan 2017 17:42:48 GMT

# ========= SAS Token ============
PUT /CONTAINER/test7.txt._COPYING_?sp=rwdl&sr=c&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z&timeout=90&sig=YyX%2BL%2FTpXAAAGAi0vqfiipuD9iVM31F0Pjwup7tA%3D HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: dc55f745-482f-426e-96f2-c906d90ffb46
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/47c25af4-b68f-4239-b573-c51796fb2335test7.txt._COPYING_
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0


HTTP/1.1 404 The specified resource does not exist.
Explorer
Posts: 22
Registered: ‎02-17-2015

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

After some time I found a possible problem.

The header x-ms-copy-source refers to the original Blob to copy. 

When you suffix the file WITH the SAS-token the PUT-request works...

 

Going to rest now... and sleep on it...

Explorer
Posts: 22
Registered: ‎02-17-2015

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

This issue is rooted in the Azure jars shipped with the distribution.

Cloudera is using a very old azure-storage jar, version 0.6.0.

 

This issue is not present in Hortonworks (and Azure HDInsight) because they use an up-to-date version.

 

I tried to replace the jars with updated ones, but those are built against hdfs-2.7.x while Cloudera is using hdfs-2.6.x, so it did not work in the end.

 

Is an update of the azure-storage jars on Cloudera's roadmap?

 

 

New Contributor
Posts: 1
Registered: ‎06-20-2017

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Can you please give an example of how this worked:
"When you suffix the file WITH the SAS-token the PUT-request works..."

Also, you mentioned that it worked with updated libraries. Can you please tell us which versions worked for you?

Explorer
Posts: 22
Registered: ‎02-17-2015

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)


It was a while ago.

 

Cloudera works fine with AccountKey. (read + write)

Cloudera cannot write using a SAS token.

Reading from a blob with SAS-token works fine.

 

We tested the same situation with HDInsight (Hortonworks): we added the storage account with a SAS token, and this works (read and write).

 

I could not update/replace the Azure-jars, because of breaking changes in the API.

(Cloudera Hadoop 2.6.0 vs Hortonworks Hadoop 2.7.x)

 

To answer your question: I recorded the network stream and found the problem.

I replayed the PUT request with a low-level network tool, with the SAS token appended to the URL in the x-ms-copy-source header. Then the request was successful. The problem is in the code that generates this request: the old azure-storage jar (still 0.6.0) that works with Hadoop 2.6.x.
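A minimal Python 3 sketch of the change made in the replayed request: the SAS token is appended as a query string to the blob URL that goes into `x-ms-copy-source`. The function name `copy_source_with_sas` and the shortened token are made up for illustration; this only shows how the header value differs, it is not the fix inside the azure-storage library.

```python
from urllib.parse import urlsplit, urlunsplit


def copy_source_with_sas(source_url, sas_token):
    """Append a SAS token to a blob URL so the copy source can be authorized."""
    parts = urlsplit(source_url)
    token = sas_token.lstrip("?")
    # Keep any query string already present on the source URL.
    query = (parts.query + "&" + token) if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


if __name__ == "__main__":
    source = ("http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/"
              "_$azuretmpfolder$/47c25af4-b68f-4239-b573-c51796fb2335test7.txt._COPYING_")
    token = "sp=rwdl&sr=c&sv=2015-07-08&sig=PLACEHOLDER"  # placeholder, not a real token
    # Failing request:  x-ms-copy-source: <source>
    # Replayed request: x-ms-copy-source: <source>?<token>
    print("x-ms-copy-source: " + copy_source_with_sas(source, token))
```

With the account key the bare source URL is enough, as the 202 response in the earlier trace shows. With a SAS token the request carries no credentials for the source blob unless the token is also added to the source URL.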

New Contributor
Posts: 5
Registered: ‎03-13-2018

Re: SAS token write to Azure BlobStorage (StorageException: The specified resource does not exist)

Hello,

 

Does option 1 work fine? Did you change only core-site.xml with the account key, or did you configure other things as well?

In CDH 10.2 an error appears:

The value for one of the HTTP headers is not in the correct format

thanks
