Azure Event Hub: What it is and how it works

Azure Event Hubs is a fully managed data streaming platform and a highly scalable event ingestion service, capable of receiving and processing millions of events per second. Part of the Azure suite, Event Hubs can process and store events, data, or telemetry produced by distributed software and devices. By offering broad-scale, low-latency publish-subscribe functionality, Event Hubs acts as a "gateway" for Big Data. In this article we will look at what it is, how it works, how it differs from Event Grid, and which factors influence the cost of using the service.

What you'll find in this article

  • What is Azure Event Hub
  • Azure Event Hub: How does it work?
  • Azure Event Hub vs Event Grid
  • Azure Event Hub Pricing: How are costs calculated?

What is Azure Event Hub?

Big data has revolutionized the way we do business, and analyzing information in real time to capture useful insights can make the difference between maintaining your competitive advantage and falling behind. However, building the right infrastructure to manage the constant flow of data can become complex, and big data is only beneficial when there is a simple way to process and obtain timely insights from data sources.

This is where Microsoft comes in with Azure Event Hubs, an event ingestion service that is part of Azure's vast suite of cloud services. Event Hubs facilitates the collection, storage, and processing of large amounts of data in real time, making it indispensable for companies that need immediate insights from their operations.

Whether it's data from website activity, applications, IoT devices, or any other source, Event Hubs can ingest and process it in real time. This ability is especially crucial in scenarios where latency can influence decision-making processes, such as in financial trading platforms or in real-time personalization for e-commerce.

Overview of Azure Event Hubs

Azure Event Hub: How does it work?

Microsoft Azure Event Hubs is a message-streaming platform, but it has different characteristics from traditional enterprise messaging services: its capabilities are built around high-throughput ingestion and event processing scenarios.

In the context of Event Hubs, messages are referred to as event data. Event data contains the body of the event, which is a binary stream (into which you can put any content, such as serialized JSON, XML, etc.), a bag of user-defined properties (name-value pairs), and various system metadata about the event, such as its offset in the partition and its sequence number in the stream.
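To make this structure concrete, here is a minimal sketch using the Python azure-eventhub SDK (the SDK choice and the property names are assumptions of this example, not prescriptions of the article):

from azure.eventhub import EventData

# Body: any binary content, here serialized JSON
event = EventData(b'{"deviceId": "sensor-42", "temperature": 21.5}')

# Bag of user-defined properties (name-value pairs)
event.properties = {"source": "factory-floor", "schema": "v1"}

# System metadata such as the offset and the sequence number are set by the service
# and are only populated on received events (e.g. received.offset,
# received.sequence_number, received.enqueued_time).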

The common role that Event Hubs plays in solution architectures is that of an "event ingestor": a component or service that sits between event producers and event consumers to decouple the production of an event stream from the consumption of those events.

Event Hubs makes use of two main transport protocols for sending and receiving events. On the publisher side, you can use HTTPS if you plan to send a low volume of events to Event Hubs, or you can use AMQP for high-throughput, higher-performance scenarios. On the consumer side, because Event Hubs uses a push-based model to send events to listeners/receivers, AMQP is the only option available.
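As a hedged illustration, this is how a publisher built with the Python azure-eventhub SDK (assumed here) selects its AMQP transport; low-volume HTTPS publishing goes through the Event Hubs REST API instead and is not covered by this snippet:

from azure.eventhub import EventHubProducerClient, TransportType

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",        # placeholder
    eventhub_name="<EVENT_HUB_NAME>",                 # placeholder
    transport_type=TransportType.AmqpOverWebsocket,   # default is TransportType.Amqp
)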

But what, in simple terms, is the difference between an event and a normal message? Let's take a moment to understand the distinction.

  • Messages are raw data that passes between the sender and the recipient processes in an application. They contain the data itself, not just a reference to it, and the sending process expects the receiving process to handle the content of the message in a specific way.
  • Events are less complex than messages and are often used for broadcast communications. An event is a notification that indicates that something has occurred or an action has been taken in a process.

A business application might choose to use events for certain processes and switch to messages for others. In general, the use of a message or event can be determined by answering the question: “Does the sender expect the recipient to perform specific processing in the communication?”

If the answer is yes, it's a message; otherwise, it's an event.

To understand how Event Hubs works as clearly as possible, the next sections examine the key elements of its architecture and how they fit into the wider context of the platform.

Namespace

A namespace is a logical container that groups one or more Event Hubs. You can imagine it as a folder that organizes and manages all your event hubs in a single place, allowing you to group related resources under a single centralized management.

It's an organizing tool that allows you to efficiently manage multiple Event Hubs, especially when it comes to scaling operations. Each Event Hub within a namespace shares the same security and configuration context, which simplifies the management of security policies such as access keys and authorization rules. This makes the namespace the central point for resource security: the same rules can be applied to all the Event Hubs it contains.

When a namespace is created in Azure, you must choose a unique name (throughout Azure) that will become part of the endpoint URL used to access the Event Hubs. The namespace is associated with a specific Azure region, where resources are allocated and, within it, various Event Hubs can be created, each with customized configurations to manage specific event ingestion and processing scenarios.
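As a small illustration of how the namespace name ends up in the endpoint URL, a client built with the Python SDK (the namespace and hub names below are hypothetical) connects to <namespace>.servicebus.windows.net:

from azure.identity import DefaultAzureCredential
from azure.eventhub import EventHubProducerClient

producer = EventHubProducerClient(
    fully_qualified_namespace="my-namespace.servicebus.windows.net",  # <namespace>.servicebus.windows.net
    eventhub_name="telemetry",                                        # an Event Hub inside the namespace
    credential=DefaultAzureCredential(),                              # Microsoft Entra ID authentication
)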

Namespace in Microsoft Azure Event Hubs

Publisher

A publisher is an entity or application that sends events or messages to an Event Hub. Its main function is to generate and load data into the system, which can then be processed by other components.

When a publisher submits events, they are distributed and stored across the Event Hub's partitions. The publisher can be any application or service that needs to send data, such as an IoT application that transmits information from sensors, a logging system that sends event records, or even a web application that collects and transmits user data.

Communication between the publisher and the Event Hub takes place through an API, and the publisher can configure the type of events to send and how they should be structured. You can publish events individually or in batches. A single publication, whether a single event or a batch of events, has a maximum size limit (256 KB on the Basic tier, 1 MB on Standard and higher tiers). Publishing events larger than this limit will result in an exception (quota exceeded).
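A sketch of both publication styles with the Python SDK (connection string and hub name are placeholders): create_batch() enforces the maximum publication size, and add() raises a ValueError when a batch is full, which lets the publisher stay under the limit instead of triggering a quota-exceeded error.

from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",  # placeholder
    eventhub_name="telemetry",                  # hypothetical hub name
)

with producer:
    # Publish a single event
    producer.send_event(EventData("single reading"))

    # Publish a batch of events, respecting the size limit
    batch = producer.create_batch()
    for i in range(100):
        try:
            batch.add(EventData(f"reading {i}"))
        except ValueError:                      # batch is full: send it and start a new one
            producer.send_batch(batch)
            batch = producer.create_batch()
            batch.add(EventData(f"reading {i}"))
    producer.send_batch(batch)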

How publishers work in Azure Event Hubs

Partitions

A partition is basically an ordered flow of events within an Event Hub, and each event sent to an Event Hub is assigned to a specific partition. Partitions are one of the key differences in how data is stored and retrieved compared to other Azure services such as Service Bus, with its Queues and Topics.

The traditional Queues and Topics are designed based on the “Competing Consumer” model, in which each consumer tries to read from the same queue (imaginable as a single-lane road, where vehicles proceed one after the other, without the possibility of overtaking), while the Event Hubs are designed based on the “Partitioned Consumer” model (similar to parallel lanes on a highway, where traffic flows in a more distributed way).

When an event is sent to an Event Hub, it is assigned to one of the available partitions. The assignment can be random or based on a partition key provided by the publisher, which ensures that related events always go to the same partition. Within a single partition, events are ordered chronologically, which matters when the order of events has meaning for processing (for example, operations that must be performed sequentially).
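A brief sketch, again with the Python SDK and hypothetical names, of how a partition key keeps related events in the same partition and therefore in order:

from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",  # placeholder
    eventhub_name="telemetry",
)

with producer:
    # Every event in this batch carries the same partition key,
    # so all of them land in the same partition in publication order.
    batch = producer.create_batch(partition_key="device-17")
    batch.add(EventData("boot"))
    batch.add(EventData("measurement"))
    batch.add(EventData("shutdown"))
    producer.send_batch(batch)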

Each partition can be read in parallel by different consumers, allowing large amounts of data to be processed more quickly. The number of partitions determines the maximum parallelization capacity of an Event Hub, and the more partitions that are configured, the greater the possibility of scaling the reading and processing of events.

However, the number of partitions must be decided when creating the Event Hub and cannot be changed later. Using the Azure portal you can create between 2 and 32 partitions (the default is 4); if necessary, up to 1,024 partitions can be obtained by contacting Azure support.

Each partition has a storage capacity defined by the retention period configured for the Event Hub (for example, one day, seven days, etc.). The events are kept inside the partition for the specified time and then automatically deleted.

The other important difference between normal Queues and Event Hub partitions is that, in Queues, once a message has been read it is removed from the queue and is no longer available (in case of errors, it is moved to the dead-letter queue). In Event Hubs partitions, by contrast, the data remains available even after being read by a consumer, allowing it to be re-read if necessary (for example, after a loss of connection).

Did you know that we help our customers manage their Azure tenants?

We have created the Infrastructure & Security team, focused on the Azure cloud, to better respond to the needs of our customers who involve us in technical and strategic decisions. In addition to configuring and managing the tenant, we also take care of:

  • optimization of resource costs
  • implementation of scaling and high availability procedures
  • creation of application deployments through DevOps pipelines
  • monitoring
  • and, above all, security!

With Dev4Side, you have a reliable partner that supports you across the entire Microsoft application ecosystem.

Consumer

A consumer is an application or service that reads and processes events sent to an Event Hub. Once the events have been published and stored in the Event Hub partitions, the consumer takes care of retrieving them and processing them according to the needs of the application.

The consumer's role is fundamental in transforming raw data into useful information. A consumer reads events through one or more consumer groups, which allows multiple applications to read the same data without interfering with each other.

The consumption process also involves managing the reading position (offset), which keeps track of the events already processed and ensures that they are not read again, unless required. Consumers can be designed to scale horizontally, allowing multiple instances of the same consumer to operate in parallel and improve data processing efficiency.
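A minimal consumer sketch with the Python SDK (placeholders throughout): events are delivered per partition and update_checkpoint() advances the reading position (offset) so already-processed events are not re-read after a restart. Without a durable checkpoint store the position is only kept in memory; a persistent variant is shown later in the Event Processor Host section.

from azure.eventhub import EventHubConsumerClient

consumer = EventHubConsumerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",  # placeholder
    consumer_group="$Default",
    eventhub_name="telemetry",
)

def on_event(partition_context, event):
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    partition_context.update_checkpoint(event)  # record the reading position (offset)

with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")  # "-1" = from the beginning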

Consumer Group

The data stored in your event hub (in all partitions) can be consumed (or read) by different consumers according to their needs. Some consumers will want to read each event exactly once, while others may want to go back and re-read historical data several times.

To support these varying requirements, Event Hubs uses Consumer Groups. A consumer group is simply a set of consumers that collaborate to read and process events from the partitions of an Event Hub in a coordinated way, independently of other groups and without interference between them. Each consumer group maintains its own reading position (offset) for each partition of the Event Hub, so each group can read and process events at its own pace and according to its needs.

This is especially useful when different applications need to process the same events for different purposes: one consumer group could serve a real-time monitoring application, for example, while another could be used for historical data analysis. Each consumer group sees the events as if it were the only reader, maintaining its own reading history and thus allowing parallel processing without conflicts.

The data in the event hub can only be accessed through a consumer group; partitions cannot be accessed directly to retrieve data. When an event hub is created, a default consumer group ($Default) is also created.
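To illustrate the independence of consumer groups, the following sketch (hypothetical group names; only $Default exists out of the box) creates two clients that each see the full stream at their own pace:

from azure.eventhub import EventHubConsumerClient

conn = "<EVENT_HUBS_CONNECTION_STRING>"  # placeholder

# A real-time monitoring reader and a historical-analytics reader: each group keeps
# its own offsets, so neither interferes with the other's reading position.
monitoring = EventHubConsumerClient.from_connection_string(
    conn, consumer_group="realtime-monitoring", eventhub_name="telemetry")
analytics = EventHubConsumerClient.from_connection_string(
    conn, consumer_group="historical-analytics", eventhub_name="telemetry")

# Each client can now call .receive(...) and will independently see every event.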

Stream processing architecture in Azure Event Hubs

Event Processor Host

Event Hubs offers two main modes for consuming events: the first uses direct receivers, the second an intelligent host called the 'Event Processor Host'.

If you decide to build your own direct receivers, you will need to manually manage a whole series of aspects, a task that risks becoming repetitive and that will require a certain level of knowledge to write the receivers efficiently.

The Azure Event Hubs Event Processor Host is a library that simplifies the process of receiving and managing events from an Event Hub. It automatically handles load balancing between multiple instances of event processors, distributing the event partitions evenly across the various hosts. This approach makes it easy to scale event processing, letting multiple instances read in parallel from the same Event Hub without overloading any single one.

Every time an event is received, the Event Processor Host ensures that it is processed correctly and, if necessary, keeps track of the last processed position within a partition. This functionality is especially useful for handling interruptions, since processing can resume from the exact point where it stopped. In addition, the Event Processor Host automatically manages contention between the different instances, ensuring that a single partition is not processed simultaneously by multiple hosts, thus avoiding duplication or inconsistencies.
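The Event Processor Host described above is a client library, and the same idea exists across the Azure SDKs. As a hedged sketch, in the Python v5 SDK the equivalent is EventHubConsumerClient combined with a checkpoint store (here the azure-eventhub-checkpointstoreblob package, an assumption of this example): instances that share the same store and consumer group balance the partitions between them and resume from the last checkpoint after an interruption.

from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

checkpoint_store = BlobCheckpointStore.from_connection_string(
    "<STORAGE_CONNECTION_STRING>",       # placeholder: an Azure Storage account
    container_name="eventhub-checkpoints",
)

consumer = EventHubConsumerClient.from_connection_string(
    "<EVENT_HUBS_CONNECTION_STRING>",    # placeholder
    consumer_group="$Default",
    eventhub_name="telemetry",
    checkpoint_store=checkpoint_store,   # enables load balancing and durable offsets
)

def on_event(partition_context, event):
    # process the event, then persist the position so another instance can take over cleanly
    partition_context.update_checkpoint(event)

with consumer:
    consumer.receive(on_event=on_event)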

Throughput Unit

Throughput Units (TU) represent the system's capacity to handle the volume of incoming and outgoing data. You can imagine them as pipes: if you need to move more water, you need to add more pipes. Each pipe has a fixed diameter and can only carry a certain amount of water, so if you need to fill the tank faster or draw more water, the only way is to add more pipes, and each added pipe has a cost.

Each Throughput Unit represents a fixed amount of computing, network and storage resources, and is designed to ensure that the system can support a certain volume of traffic. A TU can handle up to 1 MB per second or 1,000 incoming events per second and up to 2 MB per second outgoing. This means that the Event Hub can receive and process up to 1 MB of data per second per TU and send up to 2 MB of data per second.

If the workload exceeds these capacities, you can increase the number of TUs allocated to handle larger volumes of data. For example, with three TUs configured, the Event Hub will be able to handle up to 3 MB per second of ingress and up to 6 MB per second of egress. Azure also offers the ability to automatically scale the number of TUs (auto-inflate) if the data load increases beyond the configured capacity, up to a predefined maximum limit.
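As a small worked example of this sizing arithmetic (the traffic figures below are hypothetical), the required TU count is driven by whichever dimension is the bottleneck:

import math

ingress_mb_per_s = 2.5       # assumed incoming volume
ingress_events_per_s = 1800  # assumed incoming event rate
egress_mb_per_s = 4.0        # assumed outgoing volume

needed_tus = max(
    math.ceil(ingress_mb_per_s / 1.0),       # 1 MB/s of ingress per TU
    math.ceil(ingress_events_per_s / 1000),  # 1,000 events/s of ingress per TU
    math.ceil(egress_mb_per_s / 2.0),        # 2 MB/s of egress per TU
)
print(needed_tus)  # -> 3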

Azure Event Hub vs Event Grid

With more than 200 active services on Azure, Event Hubs is not the only cloud service dedicated to messaging, nor even the only one dedicated to event management. There is another service with which Event Hubs is often confused (especially by Azure novices): Azure Event Grid. The two services, however, could not be more different, and in this section we will briefly look at how.

Azure Event Hub, as we've seen, is primarily designed for data streaming. Although it can handle events, its main focus is on the ingestion, processing and archiving of large volumes of information generated from different sources in real time. This makes it particularly suitable for scenarios such as telemetry and IoT data, where high-capacity, low-latency data processing is critical.

The main purpose of Azure Event Grid, instead, is event routing. It is designed to capture events from various Azure services and deliver them to specific event handlers or endpoints. This makes it a solid choice for building reactive, event-based applications that need to respond to events as they occur. The service acts as an intermediary, ensuring that events reach their intended destinations, which can be Azure Functions, webhooks, or custom applications.

So when to use one or the other? If you need to manage large amounts of information intensively and in real time, Event Hub is the right solution, while Event Grid is the best choice when you need a system to orchestrate and distribute events between different components of your ecosystem.

In fact, the two services are not even mutually exclusive; their differences can make them complementary in an event-based architecture, using Event Hubs to collect and pre-process data and Event Grid to distribute the processed events to different endpoints or handlers.

Overview of Azure Event Grid

Azure Event Hub Pricing: How are costs calculated?

Azure Event Hubs pricing can be complex and depends on a number of factors. To begin with, each Event Hub instance (entity) that you create has an associated cost, calculated based on the chosen pricing tier and the duration of use. The main tiers are:

  • Basic: offers basic functionality and limited support, at a lower cost than the Standard tier. It is suitable for simple scenarios where there is no need for support for advanced features such as geographic replication or consumer group management.
  • Standard: includes advanced features such as geographic replication, consumer group management, and increased throughput capacity. It is suitable for more complex and critical production scenarios.

Beyond the chosen tier, the cost of the service is calculated from several components:

  • Throughput Units: as we saw earlier, these represent the amount of throughput that you need to manage. The costs increase with the number of configured throughput units.
  • Partitions: Each Event Hub is divided into partitions, and the cost per partition is separate. The size and number of partitions can affect costs.
  • Inputs and Outputs: the cost per volume of data sent (inputs) and read (outputs) may influence the price. Data is measured in GB and there are rates per GB of data transmitted or processed.
  • Data Storage (Capture): If you use the capture functionality to store events in long-term storage (for example, Azure Blob Storage), there are additional costs based on the volume of data stored and the frequency of access.
  • Data Transfers (Egress): Data transferred outside Azure Event Hubs (for example, to another region or service) may incur additional costs. Data transfers within the same region usually do not involve additional costs.

Specific prices may also vary depending on the geographical region in which the service is used. Additional costs may arise if other Azure features, such as Monitor or Log Analytics, or associated services for security and network management, such as Virtual Networks (VNet) or Azure Firewall, are used in combination with Event Hubs.

To obtain more specific details and make a first estimate of the service costs for your organization, consult the official service page (available here), where Microsoft's pricing calculator lets you estimate your price based on the chosen region and its associated currency.

Conclusions

For anyone navigating the contemporary business landscape, it should be clear by now that correctly managing the massive data flows organizations generate every day is a crucial factor in the digital strategy of any business that wishes to remain competitive and keep pace with the times.

With its support for complex streaming scenarios and its ability to manage data from many different sources, Azure Event Hubs stands out as one of the best event ingestors on the market: a powerful, scalable solution for managing massive data flows in real time.

Whatever your need, from acquiring telemetry data from vehicles, to applications and IoT scenarios where millions of devices send data to the cloud, Azure Event Hubs could be the answer to your problems.

FAQ on Microsoft Azure Event Hubs

What is Azure Event Hubs?

Azure Event Hubs is a fully managed data streaming platform and a highly scalable event ingestion service, capable of receiving and processing millions of events per second. It is part of Microsoft Azure services and allows you to process and store events, data or telemetry from distributed software and devices.

What are the key features of Azure Event Hubs?

Azure Event Hubs offers low-latency, larger-scale publish-subscribe functionality, acting as an “access ramp” for Big Data. It supports the ingestion of large volumes of data in real time, making it ideal for scenarios such as analyzing website activity, applications, IoT devices, and more.

How does Azure Event Hubs work?

Event Hubs acts as an "ingestor" of events, positioning itself between the producers and consumers of events to decouple production from consumption. It uses protocols such as HTTPS for sending low-volume events and AMQP for high-throughput scenarios. The event body is a binary stream that can contain serialized JSON, XML, or any other content.

What's the difference between Azure Event Hubs and Azure Event Grid?

Although both are event management services on Azure, Event Hubs is optimized for ingesting large volumes of real-time data with high throughput, while Event Grid is designed for managing single events with low latency, facilitating the orchestration of events between services.

How are Azure Event Hubs costs calculated?

The costs of Azure Event Hubs depend on various factors, including the number of throughput units purchased, the amount of data ingested and processed, and the additional functionality used. It is important to consult the Azure pricing calculator to get an accurate estimate based on your needs.

Find out why to choose the team

Infra & Sec

The Infra & Security team focuses on the management and evolution of our customers' Microsoft Azure tenants. Besides configuring and managing these tenants, the team is responsible for creating application deployments through DevOps pipelines. It also monitors and manages all security aspects of the tenants and supports Security Operations Centers (SOC).