[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers.]
This post is part of a series of Azure Machine Learning posts.
After creating an Azure Machine Learning workspace, you will not only be able to start Studio but also access all the necessary settings and information.
Overview page of the Azure Machine Learning workspace
On this overview page, you can click the “Launch Studio” button in the middle of the page, or you can copy and paste the Studio web URL provided under “Essentials” to start Studio.
But before we launch Studio, let’s look at a few additional settings that deserve mention.
Access Control (IAM) – Here you can view and check who has access to this resource. You can also check the access level that a user, group, service principal, or managed identity has to this resource (this applies if you have the service administrator role).
Events – Lets you attach Azure Functions, webhooks, Storage Queues and many other handlers that run when an event occurs. This is a great way to attach additional actions to events such as Model Registration, Model Deployment, Dataset Drift Detected, and more.
Example of event for data flow to storage queue
Networking – Here you control how this resource can be reached over the network, using either public access or private endpoint connections.
Properties – A list of all resources with their IDs. This becomes very useful when establishing connections to other Azure services or other clients. Here you will find “Resource ID”, “Storage Account ID”, “Key Vault ID”, “Container Registry ID” and others.
Tasks (Preview) – Part of the automation process, where tasks can be created and scheduled with automated actions, e.g. scheduling emails with a monthly billing overview. Creating a task creates a logic app; you can also create your own logic apps and use them in tasks.
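The IDs listed under Properties follow the usual Azure resource ID layout of alternating key/value path segments, so they are easy to take apart when another service or client needs only the subscription, resource group or workspace name. The following is a minimal sketch using only the Python standard library; the subscription GUID and the resource group name are placeholders I made up, and the helper name parse_resource_id is hypothetical.

```python
from typing import Dict


def parse_resource_id(resource_id: str) -> Dict[str, str]:
    """Split an Azure resource ID into its key/value path segments."""
    parts = [p for p in resource_id.strip("/").split("/") if p]
    if len(parts) % 2 != 0:
        raise ValueError(f"Unexpected resource ID shape: {resource_id!r}")
    # Segments alternate between a key (e.g. 'subscriptions') and its value.
    # Keys are lower-cased so lookups do not depend on the casing in the portal.
    return {parts[i].lower(): parts[i + 1] for i in range(0, len(parts), 2)}


# Hypothetical ID: the GUID and resource group below are placeholders.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-aml-demo"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/AML_Blogpost2022"
)

info = parse_resource_id(example_id)
print(info["subscriptions"], info["resourcegroups"], info["workspaces"])
```

The same helper works for the Storage Account, Key Vault and Container Registry IDs, since they share the same path structure.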
Going back to “Overview”, now click on “Launch Studio” and you will be redirected to the standalone site. The Studio start page is named after the workspace we created the other day; mine is “AML_Blogpost2022”.
Start page of Azure Machine Learning Studio.
In the top left corner you will find an arrow with “Default Directory”. If you click on it, you will get a general overview (the default directory) of Azure Machine Learning, which is bound to your subscription (!).
Default directory of Azure Machine Learning for a given subscription
In the default directory, you will be able to create new workspaces and open existing workspaces. You will also be able to access registries, which allow ML assets such as components, environments, and models to be shared and used across multiple teams and workspaces in your organization. When you create a workspace for the first time, the azureml registry is generated automatically.
The azureml registry brings together different components and environments in different regions as part of the shared assets. Under components, you can find different definitions for the designer (a low-code machine learning solution builder), and under environments, different preset environments and frameworks (such as PyTorch, TensorFlow, sklearn, Responsible AI and others).
Now that you have an understanding of the default directories and registries, let’s go back to Studio.
Azure Machine Learning Studio
The navigation is always available to you in Studio, regardless of whether it is shown or hidden.
The navigation bar provides you with three sections: Author, Assets and Manage. Each section provides different resources, and each can be managed by different users.
Under the Author section, you will find:
Notebooks – Create Python notebooks (*.ipynb) and run Python (*.py), R (*.R), Bash (*.sh) and other scripts. The notebook interface is similar to JupyterLab, with access to all working files, folders, the terminal, and executable components. R, Python, Bash and YAML scripts are executed in the terminal.
Automated ML – A complete no-code offering for “black-box” automated model training through automated ML jobs. You can select a classification, regression, time-series, natural language processing, or computer vision (multi-class or multi-label image classification, object detection, or instance segmentation) problem.
Designer – A low-code environment with drag-and-drop pre-built components for easily building end-to-end machine learning solutions. Both R and Python are supported.
Under the Assets section:
Data – An essential asset. You can access data using Data Assets, Datastores and Datasets. A data asset is a reference to a data source location, together with a copy of its metadata. These locations can be Azure ML datastores, Azure Storage (wasbs, abfss, adl, https), local files, or public URLs (public https). A datastore is a secure connection to a storage service on Azure; it stores the connection information so that you don’t need credentials in your scripts to access your data. Datastores can connect to Azure Blob, Azure Data Lake Gen2, Azure Files, and Azure Data Lake Gen1.
Jobs – A list of all experiment runs and all scheduled jobs.
Components – The basic building blocks for performing a specific machine learning task, such as model training or model scoring, with predefined input/output ports, parameters, and an environment, which can be reused across different pipelines.
Pipelines – Create pipeline jobs and pipeline endpoints.
Environments – A list of curated environments with installed frameworks (PyTorch, TensorFlow, AzureML, …), backed by cached Docker images for better performance and lower costs.
Models – A list of registered prediction models. Here you can also register (import) a model from a local file, job output or datastore.
Endpoints – A list of real-time and batch endpoints, which allow you to deploy machine learning models as web services and to perform batch inference on large amounts of data.
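The storage locations mentioned under Data can be told apart by their URI scheme. As a rough illustration (not part of any Azure SDK), the sketch below classifies a path with the Python standard library; the helper name describe_data_uri, the account, container and file names are all hypothetical, and the azureml:// scheme for datastore paths is an addition that the text above does not list explicitly.

```python
from urllib.parse import urlparse

# Schemes from the text, mapped to the kind of storage they point at.
SCHEME_KINDS = {
    "wasbs": "Azure Blob Storage",
    "abfss": "Azure Data Lake Storage Gen2",
    "adl": "Azure Data Lake Storage Gen1",
    "https": "HTTPS URL (Azure Storage or public)",
    "azureml": "Azure ML datastore",  # assumption: datastore path scheme
}


def describe_data_uri(uri: str) -> str:
    """Return a short description of where a data asset path points."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # No scheme at all: treat it as a path on the local machine.
        return "local file"
    return SCHEME_KINDS.get(scheme, "unknown")


# Hypothetical example paths.
examples = [
    "wasbs://mycontainer@myaccount.blob.core.windows.net/data/iris.csv",
    "abfss://myfilesystem@myaccount.dfs.core.windows.net/data/iris.csv",
    "azureml://datastores/workspaceblobstore/paths/data/iris.csv",
    "./data/iris.csv",
]

for uri in examples:
    print(f"{describe_data_uri(uri):32} {uri}")
```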
And under the Manage section:
Compute – Create a compute instance (with JupyterLab, Jupyter, RStudio, VS Code, and Terminal installed). Here you can also create compute clusters and inference clusters (you create an Azure Kubernetes Service (AKS) cluster, attach it to your workspace, and then deploy your models as REST endpoints), and use attached compute.
Linked Services – Integrate Azure Machine Learning with other Azure services (Azure Synapse) and manage linked assets. You can also attach Spark pools and use Spark notebooks.
Data Labeling – Create a labeling project for image classification (multi-class or multi-label), object identification (bounding box), or instance segmentation (polygon). Images for data labeling projects must be available in an Azure Blob datastore.
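Since deployed models are exposed as REST endpoints, a client ultimately sends them a plain HTTP POST. The sketch below only builds such a request with the Python standard library and does not send it. The URL and key are placeholders, the helper name build_scoring_request is hypothetical, and the payload shape is an assumption, because the expected input depends entirely on the scoring script of the deployed model.

```python
import json
import urllib.request


def build_scoring_request(
    scoring_uri: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Key-based authentication passes the key as a bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        scoring_uri, data=body, headers=headers, method="POST"
    )


# Placeholder values; replace them with the ones shown for your endpoint.
request = build_scoring_request(
    "https://example.invalid/score",
    "<your-endpoint-key>",
    {"data": [[5.1, 3.5, 1.4, 0.2]]},  # assumed input shape
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; this is left out here so that the example runs without a deployed endpoint.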
Now that we have a basic understanding of Studio and its assets, we’ll need to get data into our workspace. Tomorrow, we’ll explore ways to do that.
The complete set of code, documentation, notebooks and all materials will be available in the GitHub repository: https://github.com/tomaztk/Azure-Machine-Learning
Happy Advent of 2022!