Introduction
In this tutorial, we will see how to list and download the blobs in an Azure storage container without using the Azure Python libraries.
Note: No Azure library is used, only REST API calls.
Prerequisite
This tutorial is based on Python 3.7.
PyPI Dependency
The only third-party package required is requests.
Complete Code
import os
import re
from urllib.parse import quote

import requests


def _get_file_list_helper(container, next_marker=None):
    """
    Get one page of the files list, starting at next_marker if given
    """
    account_name = container['account_name']
    container_name = container['container_name']
    curl_url = f'https://{account_name}.blob.core.windows.net/{container_name}?restype=container&comp=list&'
    if next_marker:
        # the marker is an opaque string, so URL-encode it before sending it back
        encoded_marker = quote(next_marker, safe='')
        curl_url += f'marker={encoded_marker}&'
    curl_url += container['sas_token']

    print('Executing rest call to azure')
    r = requests.get(curl_url)
    # fail early on HTTP errors, e.g. an expired SAS token
    r.raise_for_status()
    text = r.text

    # a non-empty NextMarker indicates there are more files
    next_marker = re.findall('<NextMarker>([^<]+)</NextMarker>', text)
    file_names = re.findall('<Name>([^<]*)</Name>', text)
    return {'files': file_names, 'next_marker': next_marker}


def get_file_list(container):
    """
    Get the complete files list, following next_marker from page to page
    """
    files = []
    next_marker = None
    while True:
        files_data = _get_file_list_helper(container, next_marker)
        files.extend(files_data['files'])
        if not files_data['next_marker']:
            break
        next_marker = files_data['next_marker'][0]
    return files


def download_files(container, local_dest_path):
    """
    Download every blob of the container below local_dest_path
    """
    files = get_file_list(container)
    account_name = container['account_name']
    container_name = container['container_name']
    url_path = f'https://{account_name}.blob.core.windows.net/{container_name}/'
    url_end_path = '?' + container['sas_token']
    for file_name in files:
        print(f'Downloading: {file_name}')
        # quote() keeps '/' but escapes spaces and other special characters
        url = f'{url_path}{quote(file_name)}{url_end_path}'
        path = f'{local_dest_path}/{file_name}'
        # create the local folders that mirror the blob's virtual folders
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # make the request
        r = requests.get(url)
        r.raise_for_status()
        # write the file
        with open(path, 'wb') as download_file:
            download_file.write(r.content)


# main starts here
if __name__ == '__main__':
    local_dest_path = './container_blob'
    container = {
        'account_name': 'account_name',
        'container_name': 'container_name',
        # SAS token without the leading '?'
        'sas_token': 'xxxxxxxxxx'
    }
    download_files(container, local_dest_path)
Explanation
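The code calls the Azure Blob Storage REST API directly: one call lists the blob names in the container, and one GET request per blob downloads its content. Every request is authorized by the SAS token appended to the URL.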
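As a minimal sketch of the two kinds of URL involved, the helpers below build the list URL and the per-blob download URL. The names build_list_url and build_blob_url are hypothetical and used only for illustration, and the sketch assumes the SAS token is stored without a leading '?', as in the sample container dictionary.

```python
from urllib.parse import quote


def build_list_url(account_name, container_name, sas_token, marker=None):
    """Build the URL that lists the blobs of a container (hypothetical helper)."""
    url = (f'https://{account_name}.blob.core.windows.net/{container_name}'
           '?restype=container&comp=list')
    if marker:
        # The marker is an opaque string, so percent-encode it completely.
        url += '&marker=' + quote(marker, safe='')
    return f'{url}&{sas_token}'


def build_blob_url(account_name, container_name, blob_name, sas_token):
    """Build the download URL for a single blob (hypothetical helper)."""
    # quote() leaves '/' intact, so virtual folders in the blob name survive.
    return (f'https://{account_name}.blob.core.windows.net/{container_name}/'
            f'{quote(blob_name)}?{sas_token}')


# Placeholder account, container, and token values, for illustration only.
print(build_list_url('account_name', 'container_name', 'xxxxxxxxxx'))
print(build_blob_url('account_name', 'container_name', 'abc/test.log', 'xxxxxxxxxx'))
```

Keeping URL construction in small pure functions like these makes it possible to check the encoding of markers and blob names without making any network call.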
Understanding next_marker
When a storage container holds many blobs, a single response does not list all of them. The service instead returns a limited number of items per call (up to 5,000 by default) together with a NextMarker value, which indicates that more files remain. This marker must be sent as the marker parameter of the next request, and the process repeats until the response no longer carries a marker.
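The paging logic can be tried out without any network access. The sketch below is an illustration only: it parses each page with the standard-library xml.etree.ElementTree instead of regular expressions, and fetch_page is a hypothetical stand-in for the real HTTP call, backed here by two hand-written, shortened sample pages.

```python
import xml.etree.ElementTree as ET

# Two hand-written sample pages, keyed by the marker that requests them.
# None stands for the first request, which carries no marker.
SAMPLE_PAGES = {
    None: ('<EnumerationResults><Blobs>'
           '<Blob><Name>abc/test.log</Name></Blob>'
           '</Blobs><NextMarker>marker_id</NextMarker></EnumerationResults>'),
    'marker_id': ('<EnumerationResults><Blobs>'
                  '<Blob><Name>abc/test2.log</Name></Blob>'
                  '</Blobs><NextMarker /></EnumerationResults>'),
}


def fetch_page(marker=None):
    """Hypothetical stand-in for the HTTP request; returns the XML body."""
    return SAMPLE_PAGES[marker]


def parse_page(xml_text):
    """Return (blob names, next marker or None) for one response page."""
    root = ET.fromstring(xml_text)
    names = [blob.findtext('Name') for blob in root.iter('Blob')]
    # An empty or missing NextMarker means this was the last page.
    marker = root.findtext('NextMarker') or None
    return names, marker


def list_all_blobs(fetch):
    """Call fetch repeatedly, passing each marker back, until none is left."""
    names, marker = [], None
    while True:
        page_names, marker = parse_page(fetch(marker))
        names.extend(page_names)
        if not marker:
            return names


print(list_all_blobs(fetch_page))
```

One reason to consider an XML parser here is that it decodes entities such as &amp;amp; in blob names, which a plain regular expression leaves untouched.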
Usage with Azure Official Python Libraries
For usage with Azure official Python libraries, see: List and Download Azure blobs by Azure Python Libraries
Sample Response of the List Blobs REST API
<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ServiceEndpoint="https://hubbledmeprodlocb.blob.core.windows.net/" ContainerName="container_name">
  <Blobs>
    <Blob>
      <Name>abc/test.log</Name>
      <Properties>
        <Last-Modified>Mon, 02 Dec 2019 09:42:50 GMT</Last-Modified>
        <Etag>0x8D7770BFF1CC8A1</Etag>
        <Content-Length>423</Content-Length>
        <Content-Type>application/octet-stream</Content-Type>
        <Content-Encoding />
        <Content-Language />
        <Content-MD5>3ycLC3CutKkybJtlgvEdsQ==</Content-MD5>
        <Cache-Control />
        <Content-Disposition />
        <BlobType>BlockBlob</BlobType>
        <LeaseStatus>unlocked</LeaseStatus>
        <LeaseState>available</LeaseState>
        <ServerEncrypted>true</ServerEncrypted>
      </Properties>
    </Blob>
    ...
  </Blobs>
  <NextMarker>marker_id</NextMarker>
</EnumerationResults>
Hope it helps.












