Tag: Stream Analytics

Certification, Data Science, Database

70-776: Perform Big Data Engineering on Microsoft Cloud Services Certification Exam (20776)

The 70-776 Performing Big Data Engineering on Microsoft Cloud Services certification exam tests and validates your expertise in designing analytics solutions and building operationalized solutions on Microsoft Azure. This exam covers data engineering topics around Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics.

Exam Target Audience

The 70-776 Performing Big Data Engineering on Microsoft Cloud Services certification exam is targeted towards Big Data Professionals. This exam is centered around designing analytics solutions and building operationalized solutions on Microsoft Azure. The primary Azure service areas covered on this Big Data exam are: Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics.

Candidates with experience and familiarity with the capabilities and features of batch data processing, real-time processing, and operationalization technologies are the target audience for this exam. These candidates will be able to apply Microsoft cloud technologies to solution designs and implement big data analytics solutions.

Skills Measured

Here is a list of the skills and objectives measured on this exam. The percentage shown for each high-level objective area represents the portion of the exam focused on that area.

  • Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)
    • Ingest data for real-time processing
      • Select appropriate data ingestion technology based on specific constraints; design partitioning scheme and select mechanism for partitioning; ingest and process data from a Twitter stream; connect to stream processing entities; estimate throughput, latency needs, and job footprint; design reference data streams
    • Design and implement Azure Stream Analytics
      • Configure thresholds, use the Azure Machine Learning UDF, create alerts based on conditions, use a machine learning model for scoring, train a model for continuous learning, use common stream processing scenarios
    • Implement and manage the streaming pipeline
      • Stream data to a live dashboard, archive data as a storage artifact for batch processing, enable consistency between stream processing and batch processing logic
    • Query real-time data by using the Azure Stream Analytics query language
      • Use built-in functions, use data types, identify query language elements, control query windowing by using Time Management, guarantee event delivery (see the windowed query sketch just after this list)
  • Design and Implement Analytics by Using Azure Data Lake (25-30%)
    • Ingest data into Azure Data Lake Store
      • Create an Azure Data Lake Store (ADLS) account, copy data to ADLS, secure data within ADLS by using access control, leverage end-user or service-to-service authentication appropriately, tune the performance of ADLS, access diagnostic logs
    • Manage Azure Data Lake Analytics
      • Create an Azure Data Lake Analytics (ADLA) account, manage users, manage data sources, manage, monitor, and troubleshoot jobs, access diagnostic logs, optimize jobs by using the vertex view, identify historical job information
    • Extract and transform data by using U-SQL
      • Schematize data on read at scale; generate outputter files; use the U-SQL data types, use C# and U-SQL expression language; identify major differences between T-SQL and U-SQL; perform JOINS, PIVOT, UNPIVOT, CROSS APPLY, and Windowing functions in U-SQL; share data and code through U-SQL catalog; define benefits and use of structured data in U-SQL; manage and secure the Catalog
    • Extend U-SQL programmability
      • Use user-defined functions, aggregators, and operators, scale out user-defined operators, call Python, R, and Cognitive capabilities, use U-SQL user-defined types, perform federated queries, share data and code across ADLA and ADLS
    • Integrate Azure Data Lake Analytics with other services
      • Integrate with Azure Data Factory, Azure HDInsight, Azure Data Catalog, and Azure Event Hubs, ingest data from Azure SQL Data Warehouse
  • Design and Implement Azure SQL Data Warehouse Solutions (15-20%)
    • Design tables in Azure SQL Data Warehouse
      • Choose the optimal type of distribution column to optimize workflows, select a table geometry, limit data skew and process skew through the appropriate selection of distributed columns, design columnstore indexes, identify when to scale compute nodes, calculate the number of distributions for a given workload (see the table design sketch just after this list)
    • Query data in Azure SQL Data Warehouse
      • Implement query labels, aggregate functions, create and manage statistics in distributed tables, monitor user queries to identify performance issues, change a user resource class
    • Integrate Azure SQL Data Warehouse with other services
      • Ingest data into Azure SQL Data Warehouse by using AZCopy, Polybase, Bulk Copy Program (BCP), Azure Data Factory, SQL Server Integration Services (SSIS), Create-Table-As-Select (CTAS), and Create-External-Table-As-Select (CETAS); export data from Azure SQL Data Warehouse; provide connection information to access Azure SQL Data Warehouse from Azure Machine Learning; leverage Polybase to access a different distributed store; migrate data to Azure SQL Data Warehouse; select the appropriate ingestion method based on business needs (see the PolyBase / CTAS sketch just after this list)
  • Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)
    • Implement datasets and linked services
      • Implement availability for the slice, create dataset policies, configure the appropriate linked service based on the activity and the dataset
    • Move, transform, and analyze data by using Azure Data Factory activities
      • Copy data between on-premises and the cloud, create different activity types, extend the data factory by using custom processing steps, move data to and from Azure SQL Data Warehouse
    • Orchestrate data processing by using Azure Data Factory pipelines
      • Identify data dependencies and chain multiple activities, model schedules based on data dependencies, provision and run data pipelines, design a data flow
    • Monitor and manage Azure Data Factory
      • Identify failures and root causes, create alerts for specified conditions, perform a redeploy, use the Microsoft Azure Portal monitoring tool
  • Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)
    • Provision Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
      • Provision Azure SQL Data Warehouse, Azure Data Lake, and Azure Data Factory, implement Azure Stream Analytics
    • Implement authentication, authorization, and auditing
      • Integrate services with Azure Active Directory (Azure AD), use the local security model in Azure SQL Data Warehouse, configure firewalls, implement auditing, integrate services with Azure Data Factory
    • Manage data recovery for Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics
      • Backup and recover services, plan and implement geo-redundancy for Azure Storage, migrate from an on-premises data warehouse to Azure SQL Data Warehouse
    • Monitor Azure SQL Data Warehouse, Azure Data Lake, and Azure Stream Analytics
      • Manage concurrency, manage elastic scale for Azure SQL Data Warehouse, monitor workloads by using Dynamic Management Views (DMVs) for Azure SQL Data Warehouse, troubleshoot Azure Data Lake performance by using the Vertex Execution View
    • Design and implement storage solutions for big data implementations
      • Optimize storage to meet performance needs, select appropriate storage types based on business requirements, use AZCopy, Storage Explorer and Redgate Azure Explorer to migrate data, design cloud solutions that integrate with on-premises data
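
For the Stream Analytics query language objective above (windowing in particular, as referenced in that bullet), it helps to have a concrete query in mind. The following is a minimal sketch, assuming an IoT-style input with deviceId, temperature, and eventTime fields; the [iothub-input] and [powerbi-output] aliases are hypothetical names that would be configured as the job's input and output:

    -- Average temperature per device over 60-second tumbling windows
    SELECT
        deviceId,
        System.Timestamp() AS windowEnd,     -- end time of the window the event falls into
        AVG(temperature) AS avgTemperature
    INTO [powerbi-output]                    -- hypothetical output alias defined on the job
    FROM [iothub-input] TIMESTAMP BY eventTime
    GROUP BY deviceId, TumblingWindow(second, 60)

Swapping TumblingWindow for HoppingWindow or SlidingWindow changes how the windows overlap, which is exactly the distinction the Time Management objective is getting at.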
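
For the Azure SQL Data Warehouse table design objective, distribution and columnstore choices come down to the WITH clause on CREATE TABLE. Below is a minimal sketch with hypothetical table and column names; hash-distributing on a high-cardinality key that is frequently joined on helps limit data skew and data movement:

    -- Hash-distributed fact table stored as a clustered columnstore index
    CREATE TABLE dbo.FactSensorReading
    (
        SensorKey    INT           NOT NULL,
        ReadingTime  DATETIME2     NOT NULL,
        Temperature  DECIMAL(9,2)  NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(SensorKey),   -- spread rows by SensorKey across the 60 distributions
        CLUSTERED COLUMNSTORE INDEX       -- columnstore storage suits large, scan-heavy tables
    );

Small dimension tables, by contrast, are often better served by DISTRIBUTION = REPLICATE or ROUND_ROBIN.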
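
For the ingestion objective, PolyBase plus CTAS is the pattern worth knowing: define an external table over files landed in Blob Storage or Data Lake Store, then load it into a distributed internal table in parallel. A rough sketch, assuming the external data source and file format objects (MyAzureStorage, MyDelimitedText) have already been created; the path and table names are hypothetical:

    -- External table over delimited files in storage (PolyBase)
    CREATE EXTERNAL TABLE dbo.ExtSensorReading
    (
        SensorKey    INT,
        ReadingTime  DATETIME2,
        Temperature  DECIMAL(9,2)
    )
    WITH
    (
        LOCATION    = '/sensor-readings/',   -- hypothetical folder path
        DATA_SOURCE = MyAzureStorage,        -- assumed external data source
        FILE_FORMAT = MyDelimitedText        -- assumed external file format
    );

    -- CTAS: load into a distributed, columnstore-backed table in parallel
    CREATE TABLE dbo.StageSensorReading
    WITH
    (
        DISTRIBUTION = HASH(SensorKey),
        CLUSTERED COLUMNSTORE INDEX
    )
    AS
    SELECT SensorKey, ReadingTime, Temperature
    FROM dbo.ExtSensorReading;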

You can also view the full objectives list for the 70-776 Performing Big Data Engineering on Microsoft Cloud Services certification exam on the official 70-776 exam page at Microsoft.com.

Training Material

There are not very many study / training materials designed specifically for the 70-776 Perform Big Data Engineering on Microsoft Cloud Services certification exam. For example, Microsoft Press has not published any Exam Ref or other guide books for this exam yet (at the time of writing). However, there is still plenty of material available from various sources, both free and paid, ranging from official Microsoft product documentation, to other articles and videos from Microsoft, to training from companies like Opsgility and their SkillMeUp service.

Another interesting resource to utilize is the following recording of the 70-776 Cert Exam Prep session given by James Herring at Microsoft Ignite 2017:

Happy Studying!!

Internet of Things

Deciding PaaS or SaaS for Building IoT Solutions in Microsoft Azure

Building out an IoT (Internet of Things) solution can be a difficult problem to solve. It sounds easy at first: you just connect a bunch of devices, sensors, and such to the cloud. You write software to run on the IoT hardware and in the cloud, then connect the two to gather data / telemetry, communicate, and interoperate. Sounds easy, right? Well, it’s actually not as simple as it sounds. There are many things that can be difficult to implement correctly. The biggest problem area is security, as it is in most other types of systems. Then there’s device management, cloud vs. edge analytics, and many other aspects of a full IoT solution.

Traditionally you would need to build all of this out yourself; however, with offerings from Microsoft there are a few options available for building out IoT solutions. The Azure IoT Suite offers PaaS (Platform as a Service) capabilities that are flexible enough for any scenario, while the newer Microsoft IoT Central offers more managed SaaS (Software as a Service) capabilities that further ease development, deployment, and management.

PaaS IoT with Azure IoT Suite

There are many Microsoft Azure cloud services that can be used to build out an IoT solution. To make it easier to choose among these services, Microsoft has created a marketing umbrella called the “Azure IoT Suite” that includes the following core services:

  • Azure IoT Hub provides 2-way device messaging to the cloud with full device management and security integration among other IoT features.
  • Azure Notification Hubs makes it easy to implement mobile push notifications in the cloud, with support for all major mobile platforms, from iOS and Android to Windows.
  • Azure Machine Learning provides the ability to build powerful cloud-based predictive analytics tools using pre-built machine learning algorithms that greatly lower the barrier to embracing machine learning for your solutions.
  • Power BI allows rich visuals to be displayed, providing easier analysis of and reporting on your data.
  • Azure Stream Analytics is a real-time event stream processing pipeline in the cloud that’s built for high scale and ease of integration (see the query sketch just after this list).
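
To make the Stream Analytics piece concrete, a single Stream Analytics job can fan events out to multiple outputs, which is a common IoT pattern: aggregate the stream for a live Power BI dashboard while archiving the raw events to storage for batch processing later. The following is a minimal sketch; the field names and the [iothub-input], [dashboard-output], and [archive-output] aliases are hypothetical and would be configured on the job:

    -- Per-minute averages for a live dashboard
    SELECT
        deviceId,
        System.Timestamp() AS windowEnd,
        AVG(temperature) AS avgTemperature
    INTO [dashboard-output]                  -- e.g. a Power BI output
    FROM [iothub-input] TIMESTAMP BY eventTime
    GROUP BY deviceId, TumblingWindow(minute, 1)

    -- Raw events archived for later batch processing
    SELECT *
    INTO [archive-output]                    -- e.g. a Blob Storage output
    FROM [iothub-input]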

In addition to the listed services, you can really use any other Azure service that fits your particular solution. For example, you might integrate Azure Storage, Azure Cosmos DB, or Azure Functions, among many others, to build out the full capabilities of your own IoT solution. It’s really up to you to choose which Azure services fit your scenario best and build the best solution for your needs.

The Azure IoT Suite is based on Azure PaaS (Platform as a Service) offerings, letting you build out your solutions without needing to manage any of the underlying virtual machines, operating system updates / patches, and so on. The underlying VMs in these PaaS services are fully managed for you within Microsoft Azure. This allows you to focus on your solution, your business, and your data; essentially, you focus only on what matters to your core business when building out your IoT solutions.

SaaS IoT with Microsoft IoT Central

With the announcement of Microsoft IoT Central, Microsoft is entering the area of offering SaaS (Software as a Service) for building out and managing IoT (Internet of Things) solutions. This means that not only do you benefit from the managed VMs and other aspects of the Azure IoT Suite PaaS offering, but you also benefit from a greater level of abstraction and managed services built and designed specifically for IoT from the ground up.

I speculate that Microsoft IoT Central is in fact running on top of the Azure IoT Suite at its core; this is the pattern Microsoft follows when adding higher levels of abstraction in the Azure cloud. Similarly, Azure Functions provides serverless compute for executing code in the cloud, and is built as an abstraction layer on top of the WebJobs feature of the Azure App Service PaaS offering.

The further abstraction of Microsoft IoT Central creates a SaaS (Software as a Service) offering from Microsoft for more easily implementing and managing IoT solutions. This is great for organizations that do not have much cloud solution and device expertise. It also helps those organizations build IoT solutions with more predictable pricing, without the need to build the entire IoT solution themselves.

Choosing PaaS or SaaS for Your IoT Solution

Choosing between PaaS (Platform as a Service) and SaaS (Software as a Service) for IoT is much like choosing between IaaS and PaaS for hosting a traditional application. When deciding between them, here are some highlights of each option that can help you decide between a SaaS-based and a PaaS-based IoT solution:

SaaS-based IoT Solution

  • Fully managed solution
  • Less flexibility – you will need to use the pre-built or built-in features to build out your IoT solution
  • More features built in – you don’t have to build everything yourself, as there are more built-in features you can “automatically” take advantage of
  • Lower barrier to entry

PaaS-based IoT Solution

  • Fully customizable solution
  • More flexible – you can implement pretty much any IoT solution you need
  • Implement more yourself – with more flexibility comes an increased responsibility to implement more of the various features of your IoT solution yourself
  • More expertise required

Looking at the previous highlights of PaaS vs. SaaS based IoT solutions, it may appear that SaaS is the better option, and that really may be the case. Coming back to the IaaS vs. PaaS analogy for hosting applications, you generally want to start with the more managed service and move to the more customizable option only if you need the flexibility. The same goes for IoT solutions. You’ll want to evaluate the SaaS-based services that Microsoft IoT Central offers before starting to build out your IoT solution. If SaaS gives you everything you need, then the more managed option will likely be the best one to use. However, if there is something you require that SaaS (via Microsoft IoT Central) doesn’t support, and you truly do need that feature in your solution, then you’ll likely want to go the PaaS route with Azure IoT Suite and build your own custom implementation.

I hope the outline provided in this article helps you decide whether a SaaS-based or a PaaS-based framework and set of services is the most appropriate choice for your organization’s next IoT solution.

Internet of Things

MkeAzure Slides: Getting Started with IoT using Azure, Windows IoT and Raspberry Pi

On Aug 17, 2016 I gave a “Getting Started with IoT” talk at the Milwaukee Azure group. In my talk I covered the basics of IoT messaging architecture and the Azure IoT Suite (specifically IoT Hub and Stream Analytics), along with other Azure services such as Service Bus Queues, DocumentDB, and Azure Functions. No IoT solution is complete without an actual hardware device, so I showed what’s necessary to get started with Windows IoT development on a Raspberry Pi 2/3, along with an Adafruit BME280 temp/humidity/pressure sensor and an LED wired up to the device.