Azure Synapse: features, functionality, and pricing

Azure Synapse is Microsoft's integrated analytics platform, designed to manage and analyze large volumes of data. It combines big data processing with advanced business intelligence tools, allowing organizations to obtain rapid, meaningful insights from the information they collect. Synapse provides users with a vast set of features, including native integration with Apache Spark, on-demand SQL, and state-of-the-art orchestration tools. In this article, we'll take a closer look at what Azure Synapse is and explore its key features to find out how it can help your business get the most out of its data.

What you'll find in this article

  • What is Azure Synapse?
  • Azure Synapse architecture
  • Azure Synapse: key usage benefits
  • Azure Synapse vs. Data Factory
  • Azure Synapse pricing: costs and optimization

What is Azure Synapse?

It should be clear to anyone by now that, in the contemporary enterprise landscape, the correct management and analysis of data is much more than a simple “good practice”. The volume of data generated by individuals and organizations is growing at a phenomenal rate. This data fuels businesses and other organizations by providing a basis for descriptive, diagnostic, predictive, and prescriptive analytical solutions that support decision-making and autonomous systems, offering real-time insights into established and emerging patterns.

It has become essential, for all those companies that want to stay ahead of their competitors, to know how to manipulate and extract relevant information that can help them grow their business. And with the incredible amount of data collected every day, engineers and data analysts need tools that allow them to carry out their tasks quickly and efficiently.

And this is where Azure Synapse, the data warehousing and analytics service provided through the Microsoft Azure cloud platform, comes into play. Synapse offers a simplified interface for capturing huge amounts of data from different data stores into an integrated data lake and customized data warehouse, where users can run highly optimized analytical queries and integrate the results with machine learning and business intelligence (BI) tools.

The platform, originally known as Azure SQL Data Warehouse, evolved and expanded to include many more features, leading to the relaunch of the service under the name Azure Synapse. This change marked an expansion of the service's capabilities to include not only data warehousing but also data integration and big data support, making it an extremely useful tool for any company that wants to get the most knowledge and value from its data.

But how does it work specifically? Let's take a closer look at it and find out.

Did you know that we help our customers manage their Azure tenants?

We have created the Infrastructure & Security team, focused on the Azure cloud, to better respond to the needs of our customers who involve us in technical and strategic decisions. In addition to configuring and managing the tenant, we also take care of:

  • optimization of resource costs
  • implementation of scaling and high availability procedures
  • creation of application deployments through DevOps pipelines
  • monitoring
  • and, above all, security!

With Dev4Side, you have a reliable partner that supports you across the entire Microsoft application ecosystem.

Azure Synapse architecture

Azure Synapse belongs to the family of Online Analytical Processing (OLAP) applications, which are generally used to store and process large volumes of data collected from various sources; the data can be transformed and/or modeled in the OLAP repository, and large datasets are then aggregated for ad hoc reports and analytical use cases.

As mentioned in the introduction, Synapse is an evolution of Azure SQL Data Warehouse, a cloud-based relational database with Massively Parallel Processing (MPP, a form of data processing that uses multiple processors to perform computational tasks in parallel) and horizontal scalability, designed to process and store large volumes of data within the Microsoft Azure cloud platform.

The platform supports several languages, such as SQL, Python, .NET, Java, Scala, and R, and offers two analytical runtimes, SQL and Spark, which can process data in batch, streaming, and interactive mode. It also integrates with numerous other Azure data services, such as Azure Data Catalog, Azure Data Lake Storage, Azure Databricks, Azure HDInsight, Azure Machine Learning, and Power BI.

To better understand its architecture, let's look at its fundamental pillars in a little more detail.

Azure Synapse Studio

Synapse Studio is a web-based SaaS tool that allows developers to work with every aspect of the platform from a single console. This is essentially our dashboard.

In the development cycle of an analytical solution using Synapse, you generally start by creating a workspace and launching this tool that provides access to the various Synapse functionalities such as importing data through import mechanisms or data pipelines, creating data flows, exploring data through notebooks, analyzing data with Spark jobs or SQL scripts, and finally visualizing data for reporting purposes and creating dashboards.

This tool also provides functionality for creating artifacts, debugging code, optimizing performance by analyzing metrics, integrating with CI/CD tools, and much more.

Overview of Azure Synapse Studio

Data integration tools

There are several tools that can be used to load data into Synapse. However, having an integrated orchestration engine reduces the dependency on separate tools and the overhead of managing their instances and data pipelines.

Synapse includes an integrated orchestration engine identical to that of Azure Data Factory to create data pipelines and rich data transformation capabilities directly within the Synapse workspace itself.

Key features include support for more than 90 data sources, including nearly 15 Azure-based data sources, 26 open-source and cross-cloud data warehouses and databases, 6 file-based data sources, 3 NoSQL-based data sources, 28 services and apps that can act as data providers, and 4 generic protocols, including ODBC and REST. Pipelines can be created using integrated models from Synapse Studio to integrate data from various sources.

Synapse SQL Pools

Synapse SQL is Azure Synapse's T-SQL-based analysis engine, designed for high-performance manipulation of structured data. This functionality provides the same data warehousing characteristics available in previous versions of the service, where a fixed capacity of Data Warehouse Units (DWUs) is allocated to the data processing service instance.

What's new in Synapse is that this engine is now available both in the traditional Provisioned mode and in the new On-Demand mode.

Synapse has introduced a series of improvements to SQL pools, starting with workload management capabilities that allow users to fine-tune how resources are allocated between workloads. There's also a high-performance COPY statement for loading data from external storage accounts. Finally, improvements such as the PREDICT function integrate AI and machine learning, allowing models to be evaluated natively, directly within Transact-SQL.

SQL on demand is a noteworthy addition because it addresses an issue that, in the past, was an intrinsic compromise in the design of enterprise data systems. The reality of data ecosystems is that demand patterns vary significantly for a variety of reasons, forcing difficult architectural decisions about how much computing capacity to allocate for analysis and auxiliary tasks, and where to place data within the architecture. Overestimate, and management spending becomes excessive; underestimate, and you get malfunctions or erratic behavior. Neither is a scenario anyone wants to find themselves in.

On-demand computing addresses these unpredictable workloads and provides another set of tools within the data architecture. Exploring the data lake, whether it's stored as Parquet, ORC, or CSV, is now as easy as a right click.

SQL on-demand also includes new improvements for ELT (Extract, Load, Transform) tasks, with features such as delimited-text parsers optimized for performance. The raw power and familiarity of SQL Server can be exploited when prototyping queries or performing other ad hoc tasks, without having to estimate the anticipated load on the primary compute.

Components of the Azure Synapse SQL architecture

Apache Spark for Azure Synapse

The Apache Spark pools complete the list of Azure Synapse computing options with a powerful MPP engine designed for in-memory Big Data processing, ideal for semi-structured or unstructured workloads, typical of Internet of Things (IoT) and machine learning use cases.

The Synapse implementation is natively available from the Develop hub, where you can create notebooks directly using an advanced editor. Cognitive services and machine learning are also natively integrated: with a right click in the Data hub, which is populated by wizards driven by Linked Services and other configuration artifacts, you can create starter notebooks that use these services.

Microsoft has put a lot of attention on how to increase productivity, regardless of whether the end user is a data scientist, a data engineer, or a simple business user. Synapse makes it easy to explore basic data through the creation of integrated graphs and aggregations. IntelliSense is integrated into all editors and you can use multiple languages within the same Notebook, including Python (PySpark), C#, Scala or Spark SQL.

Big Data Analysis with Azure Synapse

Azure Synapse: key usage benefits

The flexibility of Azure Synapse can help organizations of all types and sizes build a modern data landscape that allows insights based on data from all sources and can be scaled and customized for a multitude of different industries.

So let's see what are the main advantages that the service can offer to those who decide to exploit its enormous potential for data analysis and management:

  • Make the most of your data: whether it's an on-premises Data Warehouse or a Big Data system in the cloud, you can use Azure Synapse to integrate, connect and analyze structured and unstructured data from a wide range of sources. Synapse allows you to use machine learning models for reliable business forecasts and obtain new insights from streaming and IoT data in real time, allowing you to always rely on a consistent and updated database to best meet business requirements.
  • Low-code environment and intuitive UI for managing data pipelines: data flow functionality, an integrated orchestration engine, and connectivity with more than 90 data sources are among the many elements that work together to enable the development of data pipelines and data transformations through a low-code environment and an accessible visual interface, letting you keep your data-driven processes under control and develop new end-to-end analytical solutions more quickly.
  • Improved access control and security: Microsoft has integrated a full suite of security and privacy provisions into Azure Synapse, and the platform adheres to nearly 30 certified compliance standards for security and data processing. Companies have access to features such as always-on data encryption, authentication through Azure Active Directory, automatic threat detection, and private linking. In addition, comprehensive column- and row-level security options, dynamic masking for real-time data protection, and the discovery and classification of sensitive data allow business data containing sensitive information to be exchanged while always ensuring maximum protection.

Azure Synapse vs. Data Factory

The presence of numerous services dedicated to data management and analysis within the Microsoft Azure offering may cause some confusion among users approaching Redmond's cloud computing platform for the first time.

So let's start to shed some light by outlining the differences between Synapse and another of the most widely used services on the Azure platform, Data Factory, and try to understand when it is best to use each of the two.

Azure Synapse and Azure Data Factory are both essential components of Microsoft's data integration and analysis offerings, but they serve different purposes and address different use cases.

Synapse is a comprehensive analytics service that combines big data and data warehousing. It offers extensive data processing capabilities, supports various programming languages (such as SQL, Python, and Spark), and integrates deeply with machine learning tools and business intelligence platforms such as Power BI. This makes it suitable for advanced analytics, real-time data transformations, and large-scale data warehousing needs.

Azure Data Factory, on the other hand, is a data integration service focused on ETL processes (extraction, transformation, loading). It allows the creation of data pipelines through a graphical interface, supporting various data sources and destinations. It focuses primarily on code-free solutions for information transformation and integration; however, it doesn't offer the same breadth of analytical capabilities or programming flexibility as Azure Synapse.

Both services facilitate data integration and transformation, but Azure Synapse provides a more robust platform for complete analysis and data warehousing, exploiting various programming environments and deeper integration with machine learning, while Azure Data Factory, on the other hand, is designed for orchestrating data pipelines and simple and scalable ETL operations.

If you only want to connect and transform data without writing code, you should opt for Azure Data Factory; bear in mind, however, that it doesn't let you customize data pipelines beyond its built-in, code-free capabilities. If, on the other hand, you are looking for greater flexibility and control, Azure Synapse is the better choice.

Automated BI architecture based on Azure Synapse and Azure Data Factory

Azure Synapse pricing: costs and optimization

Azure Synapse offers different pricing models to adapt to different business needs.

The “Pay-as-You-Go” model is ideal for companies with variable or unpredictable workloads. With this model, companies pay only for the resources actually used, with no long-term commitments. This approach allows you to dynamically adapt computing power and storage based on real needs, thus reducing wasted resources and unnecessary costs.

The “Serverless” model, on the other hand, is designed for sporadic or unpredictable workloads, where it is not necessary to keep an infrastructure constantly active. With the serverless model, companies pay only for the queries they execute, based on the amount of data processed. This approach is particularly advantageous for ad hoc analysis scenarios, where compute requirements can vary significantly.
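As a rough illustration of how serverless billing behaves, the sketch below estimates the cost of a month of ad hoc queries from the terabytes each query processes. The per-TB rate is a placeholder assumption, not Microsoft's actual price; check the Azure pricing page for your region:

```python
# Hypothetical serverless cost estimate: you pay per TB of data
# processed by each query. PRICE_PER_TB is an assumed figure.
PRICE_PER_TB = 5.00  # placeholder USD per TB processed

def serverless_cost(tb_processed_per_query):
    """Sum the cost of a batch of queries, each given as TB processed."""
    return round(sum(tb_processed_per_query) * PRICE_PER_TB, 2)

# A month of ad hoc exploration: five queries scanning
# between 50 GB and 1.2 TB each (2.4 TB in total).
queries = [0.05, 0.3, 1.2, 0.8, 0.05]
print(serverless_cost(queries))  # -> 12.0
```

The point of the model is visible in the numbers: a quiet month costs almost nothing, while a provisioned pool would bill for every hour it sits idle.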

Finally, the “Reserved Capacity” model is designed for companies with predictable and constant workloads that need regular analysis. By subscribing to a reserved capacity for a period of one or three years, companies can benefit from significant discounts compared to the “Pay-as-You-Go” model. This model offers more predictable cost management and allows you to better plan your IT budget.

As far as variables are concerned, there are several factors to consider that can substantially affect the cost of the service, such as storage, data transfer, backups and monitoring.

Compute (essentially the processing power) is charged based on DWUs: the more DWUs allocated, the greater the computing power and, consequently, the cost. Compute is billed on an hourly basis, meaning that you only pay for the hours the data warehouse is active.
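The relationship between DWU level, active hours, and cost can be sketched as below. The hourly rate is a hypothetical figure used only to show the arithmetic; actual prices vary by region and service level:

```python
# Hypothetical pay-as-you-go compute billing: cost scales linearly
# with the DWU level and the hours the pool is actually running.
HOURLY_RATE_PER_100_DWU = 1.20  # assumed USD per 100 DWU per hour

def compute_cost(dwu, active_hours):
    """Estimate compute charges for a provisioned pool."""
    return round(dwu / 100 * HOURLY_RATE_PER_100_DWU * active_hours, 2)

# A 500 DWU pool running 10 hours a day for a 22-day working month:
print(compute_cost(500, 10 * 22))  # -> 1320.0
```

Because billing stops when the pool is paused, halving the active hours halves this figure, which is why the pause/resume strategy discussed later matters so much.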

As far as storage is concerned, the costs are related to the volume of data stored and their replication. Azure Synapse charges for storage based on the total volume of data, rounded to the nearest terabyte. This includes not only the primary data but also seven days of incremental snapshots. For example, if a data warehouse contains 1.5 TB of data and has 100 GB of snapshots, the storage costs will be calculated on 2 TB.
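The storage example above can be reproduced in a few lines. The sketch assumes, as in the example, that data plus snapshots are rounded up to the next whole terabyte:

```python
import math

# Billable storage sketch: primary data plus snapshot volume,
# rounded up to the next terabyte (as in the 1.5 TB + 100 GB example).
def billable_tb(data_tb, snapshot_tb):
    return math.ceil(data_tb + snapshot_tb)

# 1.5 TB of data plus 100 GB (0.1 TB) of snapshots is billed as 2 TB:
print(billable_tb(1.5, 0.1))  # -> 2
```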

Data transfer involves costs for both ingress and egress. Ingress is generally free, but egress, especially between different regions, can get expensive. If, for example, we wanted to transfer data from an Azure data center in Europe to one in the United States, this would entail additional costs that can add up quickly if the data volumes are high.

Backup and restore operations have variable costs depending on the frequency and duration of the backups you decide to make. The higher the frequency and the longer the period of time the backups are kept, the higher the associated costs will be.

For more detailed pricing information, Microsoft provides a convenient calculator (available here), which lets you estimate pricing options based on geographical region, currency, type of service, and period of use (calculated in hours or months).

Optimizing the costs of Azure Synapse

Given the number of variables that can affect the price, it is important for companies to adopt strategies to optimize their use of the resources the service makes available and reduce the cost of their operations. So let's take a look at the best practices organizations can use to take full advantage of the power and flexibility of Azure Synapse while keeping an eye on their wallet.

One of the most effective techniques is to use the pause and resume features, which let you temporarily suspend data warehouse activity when it is not in use, saving on compute costs. You can set up an automatic schedule that pauses the data warehouse during non-working hours and resumes it during peak hours, ensuring that you only pay for actual usage time.
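To get a feel for how much a pause schedule can save, the sketch below keeps a pool active only during weekday working hours and counts the billable hours over a month. The schedule (08:00 to 19:00, Monday to Friday, month starting on a Monday) is an illustrative assumption, not a recommendation:

```python
# Pause/resume schedule sketch: the pool is active only during
# weekday working hours; everything else is paused (and unbilled).
def should_be_active(weekday, hour, start=8, end=19):
    """weekday: 0=Monday..6=Sunday; hour: 0-23."""
    return weekday < 5 and start <= hour < end

def monthly_active_hours(days=30):
    hours = 0
    for day in range(days):
        weekday = day % 7  # assume the month starts on a Monday
        hours += sum(should_be_active(weekday, h) for h in range(24))
    return hours

always_on = 30 * 24
scheduled = monthly_active_hours()
print(scheduled, always_on)  # -> 242 720
```

Roughly a third of the always-on hours are billed under this schedule, and since compute is charged per active hour, the compute bill shrinks by the same proportion.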

Scaling resources based on workloads is another extremely useful technique, increasing computing capacity during periods of high activity and reducing it during periods of low activity. Azure Synapse allows you to dynamically scale Data Warehouse Units (DWU), allowing you to adapt resources to current needs and optimize costs.
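A scaling policy of this kind can be as simple as mapping a load signal to a ladder of DWU levels. The thresholds and levels below are illustrative assumptions, not Microsoft's guidance; a real policy would be tuned against observed query latency:

```python
# Workload-based scaling sketch: choose a DWU level from a small
# ladder based on the number of concurrent queries observed.
DWU_LADDER = [100, 500, 1000, 3000]  # hypothetical service levels

def pick_dwu(concurrent_queries):
    if concurrent_queries <= 4:
        return DWU_LADDER[0]
    if concurrent_queries <= 16:
        return DWU_LADDER[1]
    if concurrent_queries <= 48:
        return DWU_LADDER[2]
    return DWU_LADDER[3]

print(pick_dwu(2), pick_dwu(30), pick_dwu(100))  # -> 100 1000 3000
```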

The use of advanced data compression techniques is also an additional method of resource optimization that can significantly reduce the space needed to store data and, as a result, save significant amounts of money.

Finally, integrated tools such as Azure Synapse Studio and Azure Cost Management and Billing provide a detailed overview of query performance and associated costs, and their use can seriously make a difference to your budget.

The first offers real-time monitoring capabilities that let you observe resource usage and query performance, providing detailed insights for optimizing the infrastructure. The second lets you monitor and manage expenses, identifying areas of overspending and providing recommendations to improve the cost/efficiency ratio.

Conclusions

With the ability to integrate data from different sources and to perform sophisticated analysis in real time, Synapse is positioned as a fundamental pillar in the Azure ecosystem and represents a solid solution for companies looking for an integrated and scalable platform for data analysis.

Synapse represents an important asset for organizations relatively new to the cloud because it can simplify the cloud adoption process and reduce much of the complexity associated with the traditional approach with siloed applications.

Even for intermediate and advanced organizations that have already invested in data lakes, built SQL data warehouses, or hold significant assets in Power BI, the move to Synapse could be an investment worth considering. In general, the migration path is straightforward, and its integration with Azure's most compelling capabilities, including machine learning and cognitive services, makes it an investment with great potential for expanding the functionality of their digital infrastructure.

FAQ on Azure Synapse

What is Azure Synapse?

Azure Synapse is an integrated data analytics service on the Microsoft Azure platform, designed to handle big data and data warehousing. It combines data integration, enterprise data warehousing, and big data analytics, offering tools like SQL, Spark, and integration with machine learning and BI tools.

What are the key features of Azure Synapse?

Azure Synapse offers features like Synapse Studio for unified management, data integration tools, SQL pools for structured data analysis, and Apache Spark for big data processing.

How does Azure Synapse compare to Azure Data Factory?

Azure Synapse is a comprehensive analytics platform, suitable for advanced data processing and big data analysis, while Azure Data Factory focuses on ETL processes and data pipeline creation with a code-free interface.

How is Azure Synapse priced?

Azure Synapse offers flexible pricing models like Pay-as-You-Go, Serverless, and Reserved Capacity, catering to different business needs and workloads. Costs depend on computing power, storage, and data transfer, with tools available for cost management.

How can Azure Synapse help optimize costs?

Organizations can optimize Azure Synapse costs by using features like pause and resume, scaling resources based on workloads, and advanced data compression. Monitoring tools like Synapse Studio help track and optimize query performance and expenses.

Find out why to choose the Infra & Sec team

The Infra & Security team focuses on the management and evolution of our customers' Microsoft Azure tenants. Besides configuring and managing these tenants, the team is responsible for creating application deployments through DevOps pipelines. It also monitors and manages all security aspects of the tenants and supports Security Operations Centers (SOC).