tutorials|February 04, 2021|2 min read

Azure Storage Blob - How to List and Download Blob from Azure Storage container in Python (No Azure library)

TL;DR

Use Python's requests library to call Azure Storage REST APIs directly to list and download blobs without any Azure SDK dependency.


Introduction

In this tutorial, we will see how to list and download the blobs in an Azure Storage container without using the Azure Python libraries.

Note: No Azure library is used, only plain REST API calls.

Prerequisites

This tutorial is based on Python 3.7.

PyPI Dependency

The only third-party package required is requests.

Complete Code

import html
import os
import re
from urllib.parse import quote

import requests


def _get_file_list_helper(container, next_marker=None):
    """
    Fetch one page of blob names with the List Blobs REST API.
    Returns the names on this page and the marker for the next page (None on the last page).
    """
    account_name = container['account_name']
    container_name = container['container_name']
    curl_url = f'https://{account_name}.blob.core.windows.net/{container_name}?restype=container&comp=list&'
    if next_marker:
        # the marker is an opaque string, so URL-encode it before sending it back
        curl_url += f'marker={quote(next_marker, safe="")}&'
    curl_url += container['sas_token']  # SAS token without the leading '?'

    print('Executing rest call to azure')
    r = requests.get(curl_url)
    r.raise_for_status()  # fail loudly on 403 (bad SAS), 404 (wrong container), etc.
    text = r.text

    # a non-empty <NextMarker> means there are more blobs to fetch
    markers = re.findall('<NextMarker>([^<]*)</NextMarker>', text)
    next_marker = html.unescape(markers[0]) if markers and markers[0] else None
    # values are XML-escaped in the response (e.g. '&amp;'), so unescape the names
    file_names = [html.unescape(name) for name in re.findall('<Name>([^<]*)</Name>', text)]

    return {'files': file_names, 'next_marker': next_marker}


def get_file_list(container):
    """
    Get the full list of blob names, following next_marker across pages
    """
    files = []
    next_marker = None
    while True:
        files_data = _get_file_list_helper(container, next_marker)
        files.extend(files_data['files'])
        next_marker = files_data['next_marker']
        if not next_marker:
            break
    return files


def download_files(container, local_dest_path):
    files = get_file_list(container)

    account_name = container['account_name']
    container_name = container['container_name']
    url_path = f'https://{account_name}.blob.core.windows.net/{container_name}/'
    url_end_path = '?' + container['sas_token']

    for file_name in files:
        print(f'Downloading: {file_name}')
        # encode special characters in the blob name, keeping '/' as the virtual directory separator
        url = f'{url_path}{quote(file_name)}{url_end_path}'
        path = os.path.join(local_dest_path, file_name)
        os.makedirs(os.path.dirname(path), exist_ok=True)

        # make the request for every file (not only when a new directory had to be created)
        r = requests.get(url)
        r.raise_for_status()

        # write the file
        with open(path, 'wb') as download_file:
            download_file.write(r.content)


# main starts here
local_dest_path = './container_blob'

container = {
    'account_name': 'account_name',
    'container_name': 'container_name',
    'sas_token': 'xxxxxxxxxx'
}
download_files(container, local_dest_path)

Explanation

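The code calls two Azure Storage REST operations directly: List Blobs (a GET on the container URL with restype=container&comp=list) to enumerate the blob names, and Get Blob (a GET on each blob's URL) to download the content. Both requests are authorized by appending the SAS token to the query string, so no request signing is needed; the token must grant both List and Read permissions.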
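The two URLs the code builds can be isolated into small helpers, which makes them easy to check before any request is sent. This is a minimal sketch, assuming the SAS token is stored without its leading '?' (as in the container dict above); the function names `list_blobs_url` and `get_blob_url` are hypothetical and only for illustration.

```python
from urllib.parse import quote


def list_blobs_url(account_name, container_name, sas_token, marker=None):
    """Build a List Blobs URL (hypothetical helper; sas_token has no leading '?')."""
    url = (f'https://{account_name}.blob.core.windows.net/{container_name}'
           '?restype=container&comp=list')
    if marker:
        # the marker is opaque, so encode every reserved character
        url += f'&marker={quote(marker, safe="")}'
    return f'{url}&{sas_token}'


def get_blob_url(account_name, container_name, blob_name, sas_token):
    """Build a Get Blob URL; '/' in the blob name is kept as a virtual directory separator."""
    return (f'https://{account_name}.blob.core.windows.net/{container_name}/'
            f'{quote(blob_name)}?{sas_token}')


print(list_blobs_url('myaccount', 'mycontainer', 'sv=2020-08-04&sig=abc'))
print(get_blob_url('myaccount', 'mycontainer', 'abc/test file.log', 'sv=2020-08-04&sig=abc'))
```

Keeping '/' unencoded in blob names preserves the virtual folder structure in the URL, while spaces and other special characters are percent-encoded.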

Understanding next_marker

When a container holds many blobs, a single List Blobs call does not return all of them. Instead, the response contains at most a fixed number of items (5,000 by default) plus a NextMarker value, which indicates that more results exist. That value has to be sent as the marker parameter in the next request, and this repeats until NextMarker comes back empty.
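The marker-following loop can be written as a generator that works with any page-fetching function. This is a minimal sketch: `iter_all_items` and the `fetch_page(marker)` callback, which is assumed to return a list of names plus the next marker (None or '' on the last page), are hypothetical names; the three simulated pages stand in for real REST responses.

```python
def iter_all_items(fetch_page):
    """Yield every item from a marker-paginated listing.

    fetch_page(marker) is a hypothetical callback returning (items, next_marker),
    where next_marker is None or '' once the last page has been returned.
    """
    marker = None
    while True:
        items, marker = fetch_page(marker)
        yield from items
        if not marker:  # an empty NextMarker means there are no more pages
            break


# Simulated service: three pages linked by markers
pages = {
    None: (['a.log', 'b.log'], 'm1'),
    'm1': (['c.log', 'd.log'], 'm2'),
    'm2': (['e.log'], ''),
}
print(list(iter_all_items(lambda marker: pages[marker])))
```

Treating an empty marker the same as a missing one matters: if an empty string were sent back as the marker, the service would start again from the first page and the loop would never end.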

Usage with Azure Official Python Libraries

To do the same with the official Azure Python libraries, see: List and Download Azure blobs by Azure Python Libraries

Sample response from the List Blobs REST API

<?xml version="1.0" encoding="utf-8"?><EnumerationResults ServiceEndpoint="https://hubbledmeprodlocb.blob.core.windows.net/" ContainerName="container_name">
  <Blobs>
    <Blob>
      <Name>abc/test.log</Name>
      <Properties>
        <Last-Modified>Mon, 02 Dec 2019 09:42:50 GMT</Last-Modified>
        <Etag>0x8D7770BFF1CC8A1</Etag>
        <Content-Length>423</Content-Length>
        <Content-Type>application/octet-stream</Content-Type>
        <Content-Encoding /><Content-Language />
        <Content-MD5>3ycLC3CutKkybJtlgvEdsQ==</Content-MD5>
        <Cache-Control />
        <Content-Disposition />
        <BlobType>BlockBlob</BlobType>
        <LeaseStatus>unlocked</LeaseStatus>
        <LeaseState>available</LeaseState>
        <ServerEncrypted>true</ServerEncrypted>
      </Properties>
    </Blob>
  ...
  </Blobs>
  <NextMarker>marker_id</NextMarker>

</EnumerationResults>
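Because the response is XML, it can also be parsed with the standard library's xml.etree.ElementTree instead of regular expressions, which also decodes escaped characters such as &amp; in blob names. A minimal sketch, assuming the response body is passed as bytes (for example `r.content`); `parse_list_blobs` is a hypothetical helper, and the sample below is a trimmed, made-up variant of the response shown above.

```python
import xml.etree.ElementTree as ET


def parse_list_blobs(xml_bytes):
    """Return ([(name, size_in_bytes), ...], next_marker_or_None) from a List Blobs response."""
    root = ET.fromstring(xml_bytes)
    blobs = []
    for blob in root.iter('Blob'):
        name = blob.findtext('Name')
        size = int(blob.findtext('Properties/Content-Length', default='0'))
        blobs.append((name, size))
    # <NextMarker /> is empty on the last page; normalize that to None
    next_marker = root.findtext('NextMarker') or None
    return blobs, next_marker


sample = b'''<?xml version="1.0" encoding="utf-8"?>
<EnumerationResults ContainerName="container_name">
  <Blobs>
    <Blob>
      <Name>abc/test.log</Name>
      <Properties><Content-Length>423</Content-Length></Properties>
    </Blob>
    <Blob>
      <Name>abc/a&amp;b.log</Name>
      <Properties><Content-Length>10</Content-Length></Properties>
    </Blob>
  </Blobs>
  <NextMarker>marker_id</NextMarker>
</EnumerationResults>'''

print(parse_list_blobs(sample))
```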

Hope it helps.
