Azure Blob Storage: How to keep unstructured data

Azure Blob Storage is Microsoft Azure's cloud storage service for managing large amounts of unstructured data. It lets companies store any type of file, from images and documents to backup files and log data, securely and with access from anywhere in the world. Its flexible architecture supports different types of blobs, each optimized for a different usage scenario. In this article we will look at what it is, how it works, what it costs, and how to optimize those costs.

What you'll find in this article

  • Azure Blob Storage: a brief introduction
  • How does Azure Blob Storage work
  • Azure Blob Storage: access levels and features
  • Azure Blob Storage: pricing and how to optimize costs

Azure Blob Storage: a brief introduction

Today, data is an invaluable resource. Many companies store large amounts of it for their daily operations, and they often have to keep much of it without any predefined organization.

This information, known as unstructured data, does not fit a specific data model or definition (it may be text or binary data) and can include audio, video, and documents. It is estimated that around 80% of the world's business data is either unstructured or semi-structured.

Every day, organizations of all shapes and sizes generate tons of this data. Storing all this unorganized information in an economical, efficient and always accessible way is a problem that should not be underestimated if you want to get the most out of the data you keep.

The solution to this problem? Azure Blob Storage.

The service made available on Microsoft's cloud platform allows developers to store large amounts of unstructured data within Azure storage spaces.

Blob Storage is, of course, a cloud-native service, which means that this data will be accessible wherever there is an internet connection, which makes it a natural choice for companies that already operate in the cloud or that are considering migrating their storage spaces “into the clouds”.

But how does it work? What options does it provide? Let's find out in the next sections.

How does Azure Blob Storage work

To begin to understand how it works, we must start from the concept of a 'blob'.

A 'blob' (acronym for Binary Large Object) is nothing more than a unit of data that can be saved and managed within the Microsoft Azure storage service and represents large files such as documents, images, videos or backups.

Blob Storage allows these large files to be transferred across a network by dividing the files into smaller pieces during upload. Once uploaded to the cloud server, they appear together as a single file.

Users can access objects in Blob Storage via HTTP/HTTPS from anywhere in the world through the Azure Storage REST API, Azure CLI, Azure PowerShell, or an Azure Storage client library. Client libraries are available for several languages, including .NET, Node.js, Java, Python, PHP, Ruby, and Go.

The service is ideal for the following applications:

  • Share videos, documents or images directly through a browser
  • Video and audio storage
  • Storing and updating log files
  • Data storage for backup and restore, archiving, and disaster recovery
  • Internet of Things (IoT)
  • Data storage for analysis

Blob Container

Blobs are grouped into containers that organize a set of blobs in the same way that a file system directory organizes files. You can imagine a blob container as a drawer where you can manage files.

You could use one container to store audio files and another to store video files. A storage account can include an unlimited number of containers, and a container can hold an unlimited number of blobs.

Blob containers should not be confused with the software containers used to package applications and their dependencies: they are simply a way of grouping blobs. A container has no size limit of its own and can grow up to the capacity limit of the storage account. The container name must be a valid DNS name because it forms part of the uniform resource identifier (URI) that addresses the container and its blobs. Microsoft defines the following rules for naming a container:

  • Container names must be between 3 and 63 characters long.
  • Names must begin with a letter or number, and can contain only lowercase letters, numbers, and the hyphen (-).
  • Two or more consecutive hyphens are not allowed.
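
The naming rules above are easy to check before a container is created. The following is a minimal sketch, not part of any Azure library: the function names are hypothetical, and the address format assumes the public-cloud endpoint `blob.core.windows.net`. It also rejects a trailing hyphen, on the assumption that every hyphen must be followed by a letter or number.

```python
import re
from urllib.parse import quote

# First character: lowercase letter or digit. Then 2 to 62 more characters,
# each a lowercase letter, a digit, or a hyphen that is followed by a letter
# or digit (this rules out "--" and a trailing "-"). Total length: 3 to 63.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the container naming rules."""
    return _CONTAINER_NAME.fullmatch(name) is not None


def blob_url(account: str, container: str, blob: str) -> str:
    """Build the HTTPS address of a blob (public-cloud endpoint assumed)."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Percent-encode the blob name; "/" is kept so virtual folders survive.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob)}"
```

For example, `is_valid_container_name("my--logs")` is rejected because of the double hyphen, while `"audio-files"` is accepted and can be used to build a blob address.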

Relationship between a storage account, its containers, and its blobs

Blob Storage Types

Microsoft Azure offers three types of blobs: block blob, append blob, and page blob.

When you create a blob, you specify which type of blob you want; once created, it is no longer possible to change its type. You can update the blob only by using the operations that are appropriate for each type of blob.

All blobs, regardless of type, are encrypted at rest: Azure Storage encrypts the data on the server side before it is written to disk. Data in transit between the client and the storage service is protected when you connect over HTTPS. In addition to encryption, Azure gives you fine-grained control over who has access to your data.

Snapshots are also worth noting: they are “frozen” versions of a blob at a specific point in time, which make it easier to restore a previous state when necessary, for both block blobs and page blobs.

So let's see these types of blobs in a little more detail and what their functionalities are.

Block blob

Block blobs are designed to handle large amounts of data, such as text or binary files, and are particularly suitable for files that may be modified or updated frequently. Each block blob consists of blocks of data that can be uploaded and managed independently of each other.

When working with block blobs, you can upload data in smaller blocks and then assemble them into a single blob. This approach lets you upload very large files even when network bandwidth is limited, since blocks can be uploaded in parallel and managed independently. Blocks may have different sizes, up to a current maximum of 4000 MiB (mebibytes) per block, and a block blob can contain up to 50,000 blocks.
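
To make the block limits concrete, here is a small sketch that splits a file of a given size into blocks. The function name, the default block size of 8 MiB and the block ID scheme (a zero-padded index encoded as base64, so that all IDs have the same length) are assumptions chosen for illustration; nothing is uploaded.

```python
import base64
import math

MIB = 1024 * 1024
MAX_BLOCK_SIZE = 4000 * MIB  # per-block limit quoted above
MAX_BLOCKS = 50_000          # blocks-per-blob limit quoted above


def plan_blocks(file_size: int, block_size: int = 8 * MIB) -> list[tuple[str, int, int]]:
    """Split `file_size` bytes into (block_id, offset, length) entries."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block size outside the allowed range")
    count = math.ceil(file_size / block_size)
    if count > MAX_BLOCKS:
        raise ValueError("too many blocks; choose a larger block size")
    plan = []
    for index in range(count):
        offset = index * block_size
        length = min(block_size, file_size - offset)  # last block may be shorter
        block_id = base64.b64encode(f"{index:06d}".encode()).decode()
        plan.append((block_id, offset, length))
    return plan
```

Multiplying the two limits gives the largest possible block blob: 50,000 blocks of 4000 MiB each, or roughly 190.7 TiB.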

In addition, block blob management is flexible because it allows you to update only the blocks that change without having to rewrite the entire blob. This can be especially useful when managing large files or performing backup and recovery operations.

A block uploaded to your storage account is associated with the specific block blob, but it becomes part of the blob only when you commit a list of blocks that includes the ID of the new block. Until it is committed or discarded, the block remains uncommitted. A blob can have up to 100,000 uncommitted blocks.
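
The two-step process (upload blocks first, commit a block list afterwards) can be illustrated with a small in-memory model. This is only a sketch: `BlockBlobModel` is a hypothetical class that does not talk to Azure, and its method names are merely modeled on the stage and commit operations described above.

```python
class BlockBlobModel:
    """In-memory stand-in for one block blob; not the Azure SDK."""

    MAX_UNCOMMITTED = 100_000  # limit quoted above

    def __init__(self) -> None:
        self.uncommitted: dict[str, bytes] = {}
        self.committed: dict[str, bytes] = {}
        self.block_list: list[str] = []

    def stage_block(self, block_id: str, data: bytes) -> None:
        """Upload a block; it is not yet part of the readable blob."""
        is_new = block_id not in self.uncommitted
        if is_new and len(self.uncommitted) >= self.MAX_UNCOMMITTED:
            raise RuntimeError("too many uncommitted blocks")
        self.uncommitted[block_id] = data

    def commit_block_list(self, block_ids: list[str]) -> None:
        """Make the listed blocks the blob's content, in the given order."""
        pool = {**self.committed, **self.uncommitted}  # newest version wins
        missing = [b for b in block_ids if b not in pool]
        if missing:
            raise KeyError(f"unknown block ids: {missing}")
        self.committed = {b: pool[b] for b in block_ids}
        self.block_list = list(block_ids)
        self.uncommitted.clear()  # blocks left out of the list are dropped

    def read(self) -> bytes:
        """Return the committed content of the blob."""
        return b"".join(self.committed[b] for b in self.block_list)
```

In this model, replacing one block of an existing blob only requires staging the new block and committing a list that reuses the IDs of the unchanged blocks, which is the partial-update behavior described earlier.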

Page Blob

Page blobs are primarily designed for storing and managing files that require frequent and random access to data. Unlike block blobs, which are optimized for uploading and managing large files block by block, page blobs are organized so that individual pages of data can be read and written efficiently.

A page blob consists of fixed-size pages, each of which can be read or written independently. Each page is 512 bytes and can be modified without affecting the other pages in the blob.
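
Because pages have a fixed size, a write to an arbitrary byte range touches a whole number of pages. The helper below is a hypothetical sketch (it is not part of any Azure library) that computes the page-aligned range covering a given offset and length.

```python
PAGE_SIZE = 512  # bytes per page, as stated above


def aligned_page_range(offset: int, length: int) -> tuple[int, int]:
    """Return (start, end_exclusive) of the whole pages covering the bytes."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = (offset // PAGE_SIZE) * PAGE_SIZE            # round down
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # round up
    return start, end
```

For instance, changing 100 bytes starting at offset 700 falls entirely inside the second page, so only the range from byte 512 up to byte 1024 is affected.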

This structure is especially useful for scenarios that require random access to data, such as storing virtual machine disks, where it is important to be able to read or write to specific locations without having to process the entire file.

Page blobs are designed to handle high-speed read and write operations, and they support partial update operations, meaning that you can edit only as many pages as you need without having to rewrite the entire blob, making them ideal for scenarios where data changes are frequent and unevenly distributed.

General relationships between accounts, containers, and page blobs

Append blob

Append blobs are the third and final type of blob, designed specifically for adding data. Unlike block blobs and page blobs, which are better suited for modifying and accessing data at random, append blobs are optimized for scenarios where new information must be continuously added without modifying or removing existing data.

An append blob is structured in such a way that data can only be added at the end of the blob. This structure makes it ideal for logging or collecting data that grows over time, such as application log files or continuously generated data streams (such as those of Internet-of-Things equipment). Every time you add data to an append blob, it is added in a new 'append operation', which ensures that the previous data remains intact and ordered chronologically.

The ability to add data without having to manage the details of the internal structure of the blob makes it very useful for applications that require logging operations, such as monitoring systems and applications that generate large amounts of log data. Append blobs are particularly advantageous in these cases because they offer high efficiency in sequential writing and guarantee data consistency without having to carry out complicated concurrency management operations.

In addition, writing in append blobs is atomic, which means that each addition operation is safe and complete and does not affect other simultaneous writing processes, maintaining data integrity even when there are many parallel writing operations.
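
The append-only behavior can be pictured with a very small in-memory model. `AppendBlobModel` is a hypothetical class written for this article, not an Azure API: it offers a single write operation that always adds data at the end, and no way to change what was written before.

```python
class AppendBlobModel:
    """In-memory stand-in for an append blob; not the Azure SDK."""

    def __init__(self) -> None:
        self._blocks: list[bytes] = []

    @property
    def size(self) -> int:
        """Total number of bytes appended so far."""
        return sum(len(block) for block in self._blocks)

    def append_block(self, data: bytes) -> int:
        """Add data at the end and return the offset where it was written."""
        offset = self.size
        self._blocks.append(bytes(data))  # existing blocks are never touched
        return offset

    def read(self) -> bytes:
        """Return everything appended so far, in order of arrival."""
        return b"".join(self._blocks)
```

A logging component would call `append_block` once per log entry (or per batch of entries) and rely on the order of the calls to keep the log chronological.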

Did you know that we help our customers manage their Azure tenants?

We have created the Infrastructure & Security team, focused on the Azure cloud, to better respond to the needs of our customers who involve us in technical and strategic decisions. In addition to configuring and managing the tenant, we also take care of:

  • optimization of resource costs
  • implementation of scaling and high availability procedures
  • creation of application deployments through DevOps pipelines
  • monitoring
  • and, above all, security!

With Dev4Side, you have a reliable partner that supports you across the entire Microsoft application ecosystem.

Azure Blob Storage: access levels and features

Azure Blob Storage offers four main access levels: Hot, Cool, Cold, and Archive, each designed to optimize cost and performance based on data access needs. These levels allow you to choose the most suitable solution based on the frequency of access and the duration of data storage, thus optimizing both costs and operational efficiency.

When a blob is created, you usually set one of the four available access levels for it. The level can be changed later with the Set Blob Tier operation (the best option for going from the Hot level to the Cool level) or with the Copy Blob operation (recommended for bringing an Archive-level blob back online or for moving from the Cool level to the Hot level).

A change from Hot to Cool (or vice versa), or a move to the Cold or Archive level, takes effect immediately. When a blob in the Archive level is moved to the Hot or Cool level, however, it must first be 're-hydrated'. With standard priority, this operation can take up to 15 hours.
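
The rule described above can be summarized in a few lines of code. This is a sketch with a hypothetical function name; it only classifies a requested tier change and performs no operation on Azure.

```python
ONLINE_TIERS = {"Hot", "Cool", "Cold"}
ALL_TIERS = ONLINE_TIERS | {"Archive"}


def tier_change_kind(current: str, target: str) -> str:
    """Classify a tier change as 'none', 'immediate' or 'rehydration'."""
    if current not in ALL_TIERS or target not in ALL_TIERS:
        raise ValueError("unknown access tier")
    if current == target:
        return "none"
    if current == "Archive":
        # Leaving the offline tier: the blob is not readable until
        # rehydration completes, which can take hours.
        return "rehydration"
    return "immediate"
```

An application can use a check like this to warn users that data coming back from the Archive level will not be available right away.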

So let's take a closer look at these four access levels and what characterizes each of them.

Hot

The Hot access tier in Azure Blob Storage is designed for data that is viewed and modified frequently. It's the ideal option when you need quick and regular access to your files. This level offers fast response times and low latencies, making it perfect for scenarios such as storing commonly used files, databases, streaming multimedia content, and web applications that need constant access to data.

The storage cost for the Hot tier is higher than that of the other access levels, but this is offset by lower data access costs. Every time you read or modify the stored data, the operation costs less than it would in the levels intended for less frequent access. This means that for applications that perform a large number of read and write operations, the Hot tier is the most economical and efficient choice.

Using the Hot tier is also advantageous because of the flexibility it offers in managing access peaks. Thanks to its ability to guarantee high performance, applications that depend on constant access to data can operate smoothly and without interruption.

Cool

The Cool access level in Azure Blob Storage is ideal for data that is rarely consulted but must remain available to be read and modified if necessary. This level offers a balance between storage costs and data access, being particularly advantageous for scenarios where data is not accessed frequently but must still be readily available.

The storage costs in the Cool tier are lower than in the Hot tier, making it an economical choice for medium to long-term storage of data that doesn't require constant access. However, the costs of data access and operations are higher than in the Hot tier, which makes it less cost-effective if the data is consulted regularly. This level is well suited to backups, disaster recovery data, log archives, and any other type of data that does not require frequent access.

The durability and availability of data in the Cool layer remain high, ensuring that data is always protected and recoverable when necessary. In addition, Azure allows flexible migration between access levels, allowing data to be moved to the Hot tier if it becomes more relevant and requires more frequent access, or to the Archive tier for even cheaper long-term storage.

Relative impact of packing files for the Cool layer

Cold

The Cold access level is a middle ground between the two previous levels and the Archive level: it is an online level designed to store data that is accessed rarely or modified only occasionally. Data in the Cold tier should be kept for a minimum of 90 days; blobs deleted or moved earlier incur an early deletion charge. This tier offers lower storage costs and higher access costs than the Cool tier, and shares most of its features.

It is optimized for scenarios such as short-term backup and disaster recovery, where data must be kept for limited periods with occasional access, without the high costs associated with higher-performing storage levels.

This level offers a balance between containing storage costs and keeping a recovery option that, because Cold is an online level, remains immediately readable, unlike the Archive level. The Cold tier thus becomes a strategic choice for companies that must manage large volumes of data, reducing operating costs without compromising the ability to recover in critical or emergency situations.

Archive

The Archive access level is designed for long-term retention of data that is rarely accessed.

This type of storage is an offline option for data that is rarely accessed and that is expected to stay in the tier for at least 180 days. It offers the lowest storage cost among the various levels available, making it ideal for storing large amounts of data that do not require frequent access. It's perfect for scenarios such as historical data archiving, long-term backups, regulatory compliance, and data that needs to be kept for extended periods without being frequently consulted.

While data is stored in the Archive tier, it is kept offline and cannot be read or modified directly. Accessing it requires an 'un-archiving' process, or rehydration, which can take several hours. This means that the Archive level is not suitable for data that may require immediate or frequent access.

Rehydration is the process by which data stored in the Archive level is moved to an online access level, such as Hot or Cool, where it is available for immediate use.

However, the Archive level guarantees high durability and data security, ensuring that, despite the longer access time, the data is protected and recoverable when necessary. Azure Blob Storage allows flexible data management between different access levels, facilitating the transition of data stored in the Archive tier to more accessible levels when access needs change.
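
Minimum retention periods matter when estimating costs, because a blob that leaves a tier too early is still billed for the remaining days. The sketch below uses a hypothetical function name; the 90-day figure for Cold comes from this article, while the 30-day figure for Cool and the 180-day figure for Archive are stated here as assumptions based on Azure's documented minimums and should be checked against current documentation.

```python
# Minimum retention per tier, in days. Cool (30) and Archive (180) are
# assumptions taken from Azure documentation; Cold (90) is quoted above.
MIN_RETENTION_DAYS = {"Hot": 0, "Cool": 30, "Cold": 90, "Archive": 180}


def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days of storage still billed if a blob leaves `tier` after `days_stored` days."""
    if days_stored < 0:
        raise ValueError("days_stored cannot be negative")
    return max(0, MIN_RETENTION_DAYS[tier] - days_stored)
```

For example, a blob deleted from the Cold level after 60 days would still be charged for the 30 days that remain of its minimum period.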

Relative impact of packing files for the Archive tier

Azure Blob Storage: pricing and how to optimize costs

Azure Blob Storage pricing is influenced by several factors, including the type of storage account, redundancy options, access levels, data transactions, and data egress. Understanding these factors is essential for making informed decisions about service costs and choosing the best options available for your archiving needs.

Monthly cost of each access tier for different read percentages

Storage account type and access levels

Azure offers three types of storage accounts: General-purpose v2 (GPv2), General-purpose v1 (GPv1), and Blob Storage.

GPv2 accounts offer the best performance and functionality, including tiered storage and advanced data management capabilities, and are recommended for most users, while GPv1 and Blob Storage accounts provide legacy support and specialized use cases, respectively.

We have already seen the access levels in the previous section but for convenience we summarize them here:

  • Hot Tier: for frequently accessed data; this tier has the highest storage costs and the lowest transaction costs.
  • Cool Tier: for data that is rarely accessed; this tier has lower storage costs but higher transaction costs than the Hot tier.
  • Cold Tier: a middle ground between the Cool and Archive tiers; it has the lowest storage costs after the Archive tier and is optimized for short-term backup and disaster recovery scenarios.
  • Archive Tier: for long-term storage of rarely accessed data; this tier has the lowest storage costs but the highest transaction costs and retrieval latency.

Redundancy options

Azure Blob Storage offers several redundancy options to ensure data durability and availability. These options affect the overall cost of storage, as higher levels of redundancy require more resources.

  • Locally Redundant Storage (LRS): stores three copies of the data within a single data center. This option is the cheapest but offers the lowest level of durability.
  • Zone-Redundant Storage (ZRS): stores three copies of the data across separate availability zones within the same region, offering improved durability.
  • Geo-Redundant Storage (GRS): replicates the data to a secondary region, offering greater durability and availability in the event of a regional outage. This option is more expensive than LRS and ZRS.
  • Read-Access Geo-Redundant Storage (RA-GRS): offers the same redundancy as GRS and adds read access to the data in the secondary region, improving availability for read-intensive workloads.
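
The four options listed above can be read as answers to three questions about what your data must survive. The following sketch encodes that reading in a hypothetical helper; it is a simplification that considers only the options described in this article and picks the least redundant one that meets the stated requirements.

```python
def choose_redundancy(survive_zone_outage: bool,
                      survive_region_outage: bool,
                      read_from_secondary: bool = False) -> str:
    """Pick the least redundant option (of the four above) that fits."""
    if read_from_secondary:
        return "RA-GRS"  # GRS plus read access to the secondary region
    if survive_region_outage:
        return "GRS"     # copy kept in a secondary region
    if survive_zone_outage:
        return "ZRS"     # copies spread over availability zones
    return "LRS"         # three copies in one data center, cheapest
```

Since cost grows with the level of redundancy, starting from the requirements rather than from the most protective option helps avoid paying for durability that the data does not need.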

Data transactions and egress

Azure Blob Storage charges for data transactions, such as read, write, and delete operations.

The cost per transaction depends on the access level and the type of transaction. For example, read and write operations in the Cool level are more expensive than in the Hot level. Understanding your data access patterns is essential to optimize transaction costs effectively.

Data egress refers to the transfer of data from Azure Blob Storage to an external destination. Azure charges for egress based on the amount of data transferred and the destination region. To minimize egress costs, you can reduce the volume of data that leaves the service, for example by compressing it or caching frequently requested content.

Considerations on how to optimize Blob Storage costs

When managing Azure Blob Storage costs, it's crucial to balance the tradeoffs between various factors. Some key considerations include:

  • Redundancy vs. Cost: higher levels of redundancy provide better durability and availability, but at a higher cost. To find the right balance, you need to evaluate the critical nature of your data and the potential impact of data loss or unavailability.
  • Access Levels vs. Data access pattern: choosing the appropriate access level based on your data access patterns can significantly affect storage costs. Data that is accessed frequently should be stored in the Hot tier, while data that is rarely accessed should be moved to the Cool, Cold, or Archive levels to minimize costs.
  • Data transactions vs. Storage costs: understanding your data access patterns can help you optimize transaction costs. If your workload involves a large number of transactions, it may be more cost-effective to use the Hot tier, despite its higher storage costs. Conversely, if your workload involves infrequent transactions, using Cool, Cold, or Archive levels can help reduce costs.
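
The tradeoff between storage costs and transaction costs can be explored with a simple model. The sketch below is purely illustrative: the class and function names are hypothetical and, above all, the prices are made-up placeholders, not real Azure rates. The Archive level is left out because it is offline and its data cannot be read without rehydration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    storage_per_gb: float    # monthly price per GB stored
    read_per_10k: float      # price per 10,000 read operations
    retrieval_per_gb: float  # price per GB read back


# PLACEHOLDER prices for illustration only; these are NOT real Azure rates.
EXAMPLE_PRICES = {
    "Hot": TierPrice(0.020, 0.004, 0.00),
    "Cool": TierPrice(0.010, 0.010, 0.01),
    "Cold": TierPrice(0.004, 0.100, 0.03),
}


def monthly_cost(price: TierPrice, stored_gb: float, reads: int, read_gb: float) -> float:
    """Storage cost plus read-operation cost plus data-retrieval cost."""
    return (stored_gb * price.storage_per_gb
            + reads / 10_000 * price.read_per_10k
            + read_gb * price.retrieval_per_gb)


def cheapest_tier(stored_gb: float, reads: int, read_gb: float,
                  prices: dict[str, TierPrice] = EXAMPLE_PRICES) -> str:
    """Return the name of the tier with the lowest estimated monthly cost."""
    return min(prices, key=lambda tier: monthly_cost(prices[tier], stored_gb, reads, read_gb))
```

With these placeholder figures, 1,000 GB that is never read is cheapest in the Cold level, while the same data read back several times a month is cheapest in the Hot level. Replacing the placeholders with the rates for your region and redundancy option gives a first estimate before moving any data.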

For further details on Blob Storage pricing, please consult the official Azure pricing page, where the calculator provided by Microsoft lets you estimate the price of the service based on the type of file structure, the type of redundancy desired, the region and the currency.

Conclusions

The problem of unstructured data and its storage is here to stay and the disproportionate amount of information generated every day by organizations of all types and sizes will definitely not decrease in the near future.

It is therefore important to choose a reliable, effective and flexibly priced solution and to address the problem as soon as possible. Azure Blob Storage could be just the solution you were looking for.

Solid structure, flexible data management, numerous integration tools, high performance and advanced security are just some of the features that distinguish the Blob Storage offer. For all the others, we invite you to try it and let the Azure platform storage service speak for itself.

FAQ on Azure Blob Storage

What is Azure Blob Storage?

Azure Blob Storage is a scalable, cloud-based object storage service offered by Microsoft Azure. It allows you to store large amounts of unstructured data such as text, images, and videos.

What types of data can be stored in Azure Blob Storage?

Azure Blob Storage is designed for unstructured data, meaning it can store data types like text files, images, videos, backups, and logs.

What are the different types of blobs in Azure Blob Storage?

Azure Blob Storage supports three types of blobs: Block Blobs, Append Blobs, and Page Blobs. Block Blobs are ideal for storing large files, Append Blobs are optimized for append operations, and Page Blobs are used for random access, such as for virtual hard disks.

How does Azure Blob Storage ensure data security?

Azure Blob Storage ensures data security through various features, including encryption at rest, role-based access control (RBAC), and the use of shared access signatures (SAS) for fine-grained access control.

What is the difference between the Hot, Cool, Cold, and Archive tiers in Azure Blob Storage?

Azure Blob Storage offers four access tiers: Hot, Cool, Cold, and Archive. The Hot tier is for frequently accessed data, the Cool tier is for infrequently accessed data, the Cold tier is for data that is accessed even more rarely but must stay online, and the Archive tier is for rarely accessed data that can tolerate longer retrieval times.

How do you access data stored in Azure Blob Storage?

You can access data stored in Azure Blob Storage via REST APIs, client libraries, tools like Azure Storage Explorer, or through the Azure portal.

What is a Shared Access Signature (SAS) in Azure Blob Storage?

A Shared Access Signature (SAS) in Azure Blob Storage is a URI that grants restricted access rights to Azure Storage resources, allowing you to specify permissions, expiration time, and access restrictions.

How can Azure Blob Storage be used for data backup?

Azure Blob Storage is ideal for data backup due to its scalable nature and the availability of different access tiers, which help optimize costs while ensuring data durability and accessibility.

What are the redundancy options available in Azure Blob Storage?

Azure Blob Storage offers several redundancy options, including Locally Redundant Storage (LRS), Zone-Redundant Storage (ZRS), Geo-Redundant Storage (GRS), and Read-Access Geo-Redundant Storage (RA-GRS), to ensure data durability and availability.

Can Azure Blob Storage be integrated with other Azure services?

Yes, Azure Blob Storage can be seamlessly integrated with other Azure services, such as Azure CDN, Azure Functions, and Azure Data Lake, to build comprehensive cloud solutions.

How does Azure Blob Storage support big data and analytics workloads?

Azure Blob Storage supports big data and analytics workloads by providing high-throughput data ingestion and integration with services like Azure Data Lake, Azure HDInsight, and Azure Databricks.

What are the pricing factors for Azure Blob Storage?

Azure Blob Storage pricing is based on several factors, including the volume of data stored, the type of storage tier (Hot, Cool, Cold, Archive), data transfer operations, and redundancy options selected.

How can you monitor and manage Azure Blob Storage?

You can monitor and manage Azure Blob Storage using the Azure portal, Azure Monitor, and other tools like Azure Storage Explorer, which provide insights into storage usage, performance, and security.

What is Azure Storage Explorer and how does it work with Azure Blob Storage?

Azure Storage Explorer is a tool that allows you to manage and access your Azure Blob Storage accounts, enabling you to upload, download, and organize blobs with a user-friendly interface.

How does Azure Blob Storage handle data durability?

Azure Blob Storage ensures data durability through its redundancy options, which replicate your data across multiple locations, either within the same region or across different regions.
