7 essential Mage features for data engineers in 2024

First published on October 10, 2024

 

12 minute read

Cole Freeman

TLDR

Explore how Mage can help developers build flexible and scalable data pipelines. This article guides you through features that will help data engineers streamline their workflow design, enhance data processing, and ensure reliable data pipeline execution. By leveraging Mage Pro’s advanced tools, data engineers can simplify complex processes and achieve greater productivity in their projects.

Outline

  • Introduction

  • Dynamic blocks

  • Conditional blocks

  • Sensor blocks

  • SQL blocks

  • Streaming pipelines

  • Global data products

  • Global hooks

  • Conclusion

Introduction

Data engineering continues to evolve, and developers face a mix of old and new problems along the way. Mage is a comprehensive data engineering platform with solutions that let data engineers focus on creating pipelines rather than worrying about managing infrastructure, handling complex integrations, and ensuring scalability. See the 7 essential features listed below and start developing pipelines.

Dynamic blocks

Dynamic blocks in Mage are a special type of block that can create multiple downstream blocks at runtime. This feature allows for incredible flexibility in pipeline design, enabling data engineers to create workflows that adapt to the data they’re processing. The power of dynamic blocks lies in their ability to generate a variable number of blocks based on the output of an upstream block, so your pipeline can scale and adjust itself to the data it receives without manual intervention or redesign. Dynamic blocks run in parallel, reducing processing time and improving the efficiency of your data pipelines.

How dynamic blocks work

Let’s break down the mechanics of dynamic blocks:

  1. Output Structure: A dynamic block must return a list of two lists. The first list contains the data that is passed to downstream blocks, while the second list holds metadata for each dynamically created block.

  2. Downstream Block Creation: The number of downstream blocks created equals the number of items in the output data multiplied by the number of direct downstream blocks.

  3. Data Flow: Each dynamically created block receives a portion of the data from the upstream dynamic block, allowing parallel processing of different data subsets.

  4. Metadata: The metadata returned by the dynamic block uniquely identifies each dynamically created block, ensuring the pipeline routes data correctly.

Parallel processing for retail companies

If you are a large retail company that needs to process transactional data in parallel by payment type, developers on your team can use dynamic blocks to split large datasets into smaller chunks, process each chunk in parallel, and recombine the results downstream for analytics.
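To make this concrete, here is a minimal sketch of a dynamic block for that scenario. It assumes the upstream block passes a pandas DataFrame with a payment_type column; the block names in the metadata are illustrative, and the import guard mirrors the pattern Mage uses in its generated block files.

```python
from typing import List

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def split_by_payment_type(df, *args, **kwargs) -> List[List]:
    """
    Dynamic block: return a list of two lists.
    The first list holds the data handed to each dynamically created
    downstream block; the second holds metadata (one entry per child)
    used to uniquely identify those blocks.
    """
    payment_types = sorted(df['payment_type'].unique())

    child_data = [
        df[df['payment_type'] == payment_type]
        for payment_type in payment_types
    ]
    child_metadata = [
        dict(block_uuid=f'payment_{payment_type}')
        for payment_type in payment_types
    ]

    return [child_data, child_metadata]
```

Each resulting child block then processes a single payment type in parallel, and a downstream block can recombine the results for analytics.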

Conditional blocks

Conditional blocks in Mage are add-on blocks that attach to a main pipeline block and evaluate a condition before that block executes. Their primary function is to perform critical checks up front, ensuring that predefined conditions are met. This functionality enables dynamic and responsive routing of data through different paths in the pipeline. Conditional blocks enhance pipeline efficiency by allowing for granular control over data processing, letting developers create more nuanced, flexible, and robust pipelines, such as compliance workflows that adapt to varying regulatory scenarios and institutional risk appetites.


How conditional blocks work

What happens when we set conditions:

  • When data meets the specified conditions, it flows through the branch of the pipeline where the condition is satisfied.

  • Conversely, if the data does not satisfy the conditions, it flows through the other branch of the pipeline.

Conditional pipeline logic for compliance professionals

Conditional blocks may come in handy when segmenting data into different workflows for banks tracking suspicious activity report (SAR) thresholds. Developers could set a condition so that transactions meeting SAR thresholds move through one branch of the pipeline, while all other transactions move through the other branch.
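As a rough sketch of that setup, the conditional block below only lets its attached block run when at least one transaction meets the threshold. It assumes Mage’s condition decorator and an upstream block that outputs a pandas DataFrame of transactions; the $5,000 figure is purely illustrative.

```python
if 'condition' not in globals():
    from mage_ai.data_preparation.decorators import condition

SAR_THRESHOLD = 5_000  # illustrative figure; use your institution's actual threshold


@condition
def meets_sar_threshold(*args, **kwargs) -> bool:
    """
    Return True to let the attached block execute (the SAR branch),
    or False to skip it so records continue down the other branch.
    """
    transactions = args[0]  # assumed: DataFrame output of the upstream block
    return bool((transactions['amount'] >= SAR_THRESHOLD).any())
```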

Sensor blocks

Sensor blocks in Mage are specialized blocks that continuously monitor specific conditions within a data pipeline. Unlike regular blocks that execute based on a predefined schedule or trigger, sensor blocks remain active, evaluating their condition until it is satisfied or a set period elapses. This continuous evaluation ensures that dependent blocks execute only when the necessary prerequisites are met, enhancing the reliability and efficiency of the entire pipeline.

How sensor blocks work

What happens when we configure a sensor block:

  • Continuous Evaluation: Sensors persistently check for specified conditions, ensuring timely execution of dependent blocks.

  • Conditional Execution: Downstream blocks wait for sensors to validate conditions before initiating, preventing premature or redundant runs.

  • Time-bound Monitoring: Sensors can be configured to cease evaluation after a certain timeframe, balancing responsiveness with resource usage.

  • Integration with External Pipelines: Sensors can monitor the status of external pipelines or specific blocks within them, facilitating cross-pipeline dependencies.

Automated workflows for analytics teams

Analytics teams can establish workflows to email critical stakeholders the most up-to-date reports, ensuring they receive the freshest data available. Developers can implement sensor blocks that automatically trigger these workflows as soon as the data pipeline detects new data generation.
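A minimal sensor sketch for that workflow might look like the following; the file path is a stand-in for whatever signal marks newly generated data, and the import guard follows the pattern Mage uses in its block templates.

```python
import os
from datetime import date

if 'sensor' not in globals():
    from mage_ai.data_preparation.decorators import sensor


@sensor
def todays_report_data_is_ready(*args, **kwargs) -> bool:
    """
    Mage keeps re-evaluating this function until it returns True
    (or the configured timeout elapses); only then do the downstream
    blocks that email stakeholders begin to run.
    """
    expected_file = f"/data/exports/daily_report_{date.today():%Y%m%d}.parquet"
    return os.path.exists(expected_file)
```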

SQL blocks

Mage SQL blocks are a robust feature enabling low-code integration with your data warehouse, while also offering the flexibility to custom-code all of your SQL requirements. At their core, Mage SQL blocks are designed to simplify and optimize SQL-based data operations. They offer a unique blend of flexibility, automation, and integration capabilities that set them apart from traditional SQL execution environments.

How SQL blocks work

When configuring a SQL block in Mage Pro, the following key features collaborate to enhance and streamline your SQL-based data operations:

  • Flexible Write Policies: Mage Pro SQL blocks offer append, replace, and fail write policies, allowing precise control over how new data interacts with existing tables. This flexibility ensures data integrity and prevents unintended overwrites or duplications.

  • Automatic Table Creation: Mage automatically generates tables in your chosen data storage provider based on your SQL definitions. This eliminates the need for manual table setup, saving time and reducing the risk of schema errors.

  • Raw SQL Execution: Provides the option to execute raw SQL commands directly within a SQL block. This feature grants full control over SQL statements, enabling complex database operations and optimizations tailored to specific requirements.

  • Variable Interpolation: Allows the use of variables (e.g., {{ df_1 }}, {{ df_2 }}) to reference data from upstream blocks directly within SQL queries. This simplifies data integration and enhances the dynamism of your SQL operations.

  • Multiple Statement Support: Enables the execution of multiple SQL statements within a single SQL block, separated by semicolons. This capability streamlines complex operations by consolidating related tasks, improving readability and maintainability of data pipelines (see the sketch after this list).
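For illustration, here is what a small SQL block might look like, combining multiple statements with variable interpolation. The schema, table, and column names are assumptions, and {{ df_1 }} stands in for the output of the block’s first upstream block as described above.

```sql
-- Multiple statements in one block, separated by semicolons.
CREATE SCHEMA IF NOT EXISTS analytics;

-- {{ df_1 }} interpolates the output of the first upstream block.
-- The block's write policy (append, replace, or fail) controls how the
-- result is written to the target table.
SELECT
  order_id,
  customer_id,
  order_total,
  CURRENT_TIMESTAMP AS loaded_at
FROM {{ df_1 }}
WHERE order_total > 0;
```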

Centralized data warehousing

A large retail company aims to streamline its sales data integration from multiple sources, including online platforms, physical stores, and third-party marketplaces. The company can use SQL blocks to centralize this data in its storage system for analytics and reporting.

Streaming pipelines

Streaming pipelines are data integration pipelines that connect to streaming sources such as Kafka, Azure Event Hubs, Google Pub/Sub, and more. These pipelines enable data engineers to extract, transform, and load real-time data from multiple platforms. Three components make up a streaming pipeline: sources, transformers, and sinks.

How streaming pipelines work

When configuring a streaming pipeline in Mage, developers should do the following:

  • Source configuration: Developers connect to their streaming sources through the YAML file templates available in Mage streaming pipeline data loader blocks.

  • Transformers: Developers then connect transformer blocks to the data loader block and transform their data using Python, SQL, or R.

  • Sink (destination) configuration: Transformed data is then routed to the sink through Mage streaming pipeline data exporter blocks.

Automated anomaly detection

Financial institutions can leverage streaming pipelines to ingest live transaction data from Kafka, transform the data using Python scripts for anomaly detection, and route the processed information to their analytics dashboard.
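A hedged sketch of the transformer step in such a pipeline is shown below. It assumes the Kafka source delivers JSON messages with an amount field and that the streaming transformer block receives a small batch of messages per invocation; the threshold rule is a placeholder for a real anomaly detection model.

```python
from typing import Dict, List

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

ANOMALY_THRESHOLD = 10_000  # illustrative cutoff for a suspiciously large transaction


@transformer
def flag_anomalies(messages: List[Dict], *args, **kwargs) -> List[Dict]:
    """
    Receive a batch of messages from the streaming source, tag each one,
    and return the batch so the data exporter block can route it to the sink.
    """
    for message in messages:
        message['is_anomaly'] = message.get('amount', 0) >= ANOMALY_THRESHOLD
    return messages
```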

Global data products

A data product is any piece of data generated by one or more components within a data pipeline. This could be anything from an in-memory DataFrame, a JSON structure, to a table in a database. Essentially, it’s the end result of your data processing steps that is ready for consumption.

A global data product elevates this concept by making a data product accessible across the entire project. It is registered under a unique identifier (UUID) and references an existing pipeline. This global accessibility means that any pipeline within the project can reference and utilize the data product without needing to regenerate it.


How global data products work

When configuring a global data product in Mage Pro, the following key features collaborate to optimize and streamline your data workflows:

  • Unique Identifier Registration: Each global data product is registered under a unique UUID, ensuring it can be distinctly identified and accessed across all pipelines within the project.

  • Reusability Across Pipelines: Global data products allow multiple pipelines to access and utilize the same data outputs without the need for redundant computations, enhancing efficiency and consistency.

  • Lazy Triggering Mechanism: Data products are generated only when required. If no pipeline depends on a data product, it remains inactive until a downstream process requests it, optimizing resource usage.

  • Configurable Outdated Settings: Settings such as “Outdated after” and “Outdated starting at” determine the validity period of a data product, controlling when it should be regenerated to ensure data freshness.

  • Seamless Integration with Pipelines: Integrating global data products into pipelines is straightforward, allowing pipelines to easily reference and interact with registered data products through dedicated blocks.

Optimize supply chain management

Manufacturing companies can use global data products to centralize inventory and supplier information, making it accessible across production, procurement, and logistics pipelines so that every department works from current data, reducing operational costs.

Global hooks

Global hooks allow developers to execute custom code at specific points during the application’s execution cycle. These hooks can be used to perform various operations such as data validation, transformation, or integration with external systems. Global Hooks help developers automate repetitive tasks or enforce certain business rules across multiple components of your application.

How global hooks work

Global Hooks in Mage can be triggered at two different points during the execution of a pipeline:

  1. Pre-completion of a Block: The hook runs before a specific block in the pipeline is executed. This allows developers to perform operations like data validation, transformation, or enrichment before the block processes the data.

  2. Post-completion of a Block: The hook runs after a specific block in the pipeline has completed execution. This enables developers to act on the block’s output, such as integrating with external systems, logging, auditing, or triggering downstream processes.

By leveraging these two execution points, Global Hooks provide a flexible way to extend the functionality of your data pipelines. You can choose to execute custom code either before or after a block’s execution, depending on your specific requirements.

Automated email notifications for e-commerce

An e-commerce company uses global hooks to automate sending a confirmation email to customers whenever they place a new order. By executing the custom code after the order is processed, they can ensure each customer is automatically notified of their order in a timely manner.
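The snippet below is a rough sketch of the kind of custom code a post-completion hook could execute for that use case, using Python’s standard smtplib; the SMTP host, sender address, and order payload shape are all assumptions.

```python
import smtplib
from email.message import EmailMessage


def send_order_confirmation(order: dict) -> None:
    """Email the customer a confirmation for a newly processed order."""
    msg = EmailMessage()
    msg['Subject'] = f"Order {order['order_id']} confirmed"
    msg['From'] = 'orders@example.com'            # assumed sender address
    msg['To'] = order['customer_email']
    msg.set_content(
        f"Thanks for your purchase! Your order total is ${order['total']:.2f}."
    )

    # Placeholder SMTP settings; swap in your provider's host and port.
    with smtplib.SMTP('smtp.example.com', 587) as server:
        server.starttls()
        server.send_message(msg)
```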

Conclusion

Mastering these essential Mage features will allow developers to transform their data engineering practices. From dynamically scaling your pipelines to simplifying streaming connections through YAML file templates, Mage offers a suite of features that is unrivaled in the industry. If you want to learn more about Mage and its additional capabilities, check out the documentation and build your first pipeline today.