azure.ai.ml.automl package Azure SDK for Python 2.0.0 documentation

class azure.ai.ml.automl.ClassificationModels(value)[source]

Enum for all classification models supported by AutoML.

BERNOULLI_NAIVE_BAYES = 'BernoulliNaiveBayes'

Naive Bayes classifier for multivariate Bernoulli models.

DECISION_TREE = 'DecisionTree'

Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

EXTREME_RANDOM_TREES = 'ExtremeRandomTrees'

Extreme Trees is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is related to the widely used random forest algorithm.

GRADIENT_BOOSTING = 'GradientBoosting'

The technique of combining weak learners into a strong learner is called boosting. The gradient boosting algorithm works on this principle, adding learners sequentially so that each new learner corrects the errors of the current ensemble.

KNN = 'KNN'

The K-nearest neighbors (KNN) algorithm uses feature similarity to predict the values of new data points: each new data point is assigned a value based on how closely it matches the points in the training set.

LIGHT_GBM = 'LightGBM'

LightGBM is a gradient boosting framework that uses tree based learning algorithms.

LINEAR_SVM = 'LinearSVM'

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it can categorize new examples. Linear SVM performs best when the input data is linearly separable, i.e., the classes can be divided by a straight line (or hyperplane) in feature space.

LOGISTIC_REGRESSION = 'LogisticRegression'

Logistic regression is a fundamental classification technique. It belongs to the group of linear classifiers and is somewhat similar to polynomial and linear regression. Logistic regression is fast and relatively uncomplicated, and it’s convenient for you to interpret the results. Although it’s essentially a method for binary classification, it can also be applied to multiclass problems.

MULTINOMIAL_NAIVE_BAYES = 'MultinomialNaiveBayes'

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

RANDOM_FOREST = 'RandomForest'

Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble of decision trees, usually trained with the “bagging” method. The general idea of the bagging method is that a combination of learning models improves the overall result.

SGD = 'SGD'

Stochastic gradient descent is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs.

SVM = 'SVM'

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it can categorize new examples.

XG_BOOST_CLASSIFIER = 'XGBoostClassifier'

Extreme Gradient Boosting Algorithm. This algorithm is used for structured data where target column values can be divided into distinct class values.

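Example

The following sketch shows how the enum values above can be referenced in code. The set_training call is an assumption based on the v2 SDK’s ClassificationJob interface, and the job object (classification_job) is hypothetical; only the enum itself is defined in this module.

    from azure.ai.ml.automl import ClassificationModels

    # Enum members carry the service-side string values listed above.
    assert ClassificationModels.LIGHT_GBM.value == "LightGBM"

    # Restrict the model search to a subset of algorithms, assuming an existing
    # classification job that exposes set_training(allowed_training_algorithms=...).
    # classification_job.set_training(
    #     allowed_training_algorithms=[
    #         ClassificationModels.LIGHT_GBM,
    #         ClassificationModels.XG_BOOST_CLASSIFIER,
    #     ]
    # )
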
class azure.ai.ml.automl.ClassificationMultilabelPrimaryMetrics(value)[source]

Primary metrics for classification multilabel tasks.

ACCURACY = 'Accuracy'

Accuracy is the ratio of predictions that exactly match the true class labels.

AUC_WEIGHTED = 'AUCWeighted'

AUC is the Area Under the Curve. This metric represents the arithmetic mean of the score for each class, weighted by the number of true instances in each class.

AVERAGE_PRECISION_SCORE_WEIGHTED = 'AveragePrecisionScoreWeighted'

The arithmetic mean of the average precision score for each class, weighted by the number of true instances in each class.

IOU = 'IOU'

Intersection over Union (IoU): the intersection of the predicted and true labels divided by their union.

NORM_MACRO_RECALL = 'NormMacroRecall'

Normalized macro recall is recall macro-averaged and normalized, so that random performance has a score of 0, and perfect performance has a score of 1.

PRECISION_SCORE_WEIGHTED = 'PrecisionScoreWeighted'

The arithmetic mean of precision for each class, weighted by number of true instances in each class.

class azure.ai.ml.automl.ClassificationPrimaryMetrics(value)[source]

Primary metrics for classification tasks.

ACCURACY = 'Accuracy'

Accuracy is the ratio of predictions that exactly match the true class labels.

AUC_WEIGHTED = 'AUCWeighted'

AUC is the Area Under the Curve. This metric represents the arithmetic mean of the score for each class, weighted by the number of true instances in each class.

AVERAGE_PRECISION_SCORE_WEIGHTED = 'AveragePrecisionScoreWeighted'

The arithmetic mean of the average precision score for each class, weighted by the number of true instances in each class.

NORM_MACRO_RECALL = 'NormMacroRecall'

Normalized macro recall is recall macro-averaged and normalized, so that random performance has a score of 0, and perfect performance has a score of 1.

PRECISION_SCORE_WEIGHTED = 'PrecisionScoreWeighted'

The arithmetic mean of precision for each class, weighted by number of true instances in each class.

class azure.ai.ml.automl.ColumnTransformer(*, fields: Optional[List[str]] = None, parameters: Optional[Dict[str, Union[str, float]]] = None, **kwargs)[source]

Column transformer settings.

Parameters
  • fields – The fields on which to perform custom featurization

  • parameters (Dict[str, Union[str, float]]) – parameters used for custom featurization
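
Example

A minimal sketch of a ColumnTransformer. The column name and the transformer parameters (“strategy”, “fill_value”) are illustrative assumptions; valid parameter names depend on the transformer being customized.

    from azure.ai.ml.automl import ColumnTransformer

    # Custom imputation settings for a single (hypothetical) numeric column.
    imputer_override = ColumnTransformer(
        fields=["numeric_column"],
        parameters={"strategy": "constant", "fill_value": 0.0},
    )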

class azure.ai.ml.automl.FeaturizationMode(value)[source]

Featurization mode - determines data featurization mode.

AUTO = 'Auto'

Auto mode; the system performs featurization without any custom featurization inputs.

CUSTOM = 'Custom'

Custom featurization.

OFF = 'Off'

Featurization off. ‘Forecasting’ task cannot use this value.

class azure.ai.ml.automl.ForecastHorizonMode(value)[source]

Enum to determine forecast horizon selection mode.

AUTO = 'Auto'

Forecast horizon to be determined automatically.

CUSTOM = 'Custom'

Use the custom forecast horizon.

class azure.ai.ml.automl.ForecastingModels(value)[source]

Enum for all forecasting models supported by AutoML.

ARIMAX = 'Arimax'

An Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) model can be viewed as a multiple regression model with one or more autoregressive (AR) terms and/or one or more moving average (MA) terms. This method is suitable for forecasting when the data is stationary or non-stationary and multivariate, with any type of data pattern, i.e., level, trend, seasonality, or cyclicity.

AUTO_ARIMA = 'AutoArima'

Auto-ARIMA (Autoregressive Integrated Moving Average) uses time-series data and statistical analysis to interpret the data and make future predictions. The model explains a series in terms of its own past values and uses regression on those values to make predictions, selecting the model orders automatically.

AVERAGE = 'Average'

The Average forecasting model makes predictions by carrying forward the average of the target values for each time-series in the training data.

DECISION_TREE = 'DecisionTree'

Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

ELASTIC_NET = 'ElasticNet'

Elastic net is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 and L2 penalty functions.

EXPONENTIAL_SMOOTHING = 'ExponentialSmoothing'

Exponential smoothing is a time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component.

EXTREME_RANDOM_TREES = 'ExtremeRandomTrees'

Extreme Trees is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is related to the widely used random forest algorithm.

GRADIENT_BOOSTING = 'GradientBoosting'

The technique of combining weak learners into a strong learner is called boosting. The gradient boosting algorithm works on this principle, adding learners sequentially so that each new learner corrects the errors of the current ensemble.

KNN = 'KNN'

The K-nearest neighbors (KNN) algorithm uses feature similarity to predict the values of new data points: each new data point is assigned a value based on how closely it matches the points in the training set.

LASSO_LARS = 'LassoLars'

Lasso model fit with Least Angle Regression a.k.a. Lars. It is a Linear Model trained with an L1 prior as regularizer.

LIGHT_GBM = 'LightGBM'

LightGBM is a gradient boosting framework that uses tree based learning algorithms.

NAIVE = 'Naive'

The Naive forecasting model makes predictions by carrying forward the latest target value for each time-series in the training data.

PROPHET = 'Prophet'

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

RANDOM_FOREST = 'RandomForest'

Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble of decision trees, usually trained with the “bagging” method. The general idea of the bagging method is that a combination of learning models improves the overall result.

SEASONAL_AVERAGE = 'SeasonalAverage'

The Seasonal Average forecasting model makes predictions by carrying forward the average value of the latest season of data for each time-series in the training data.

SEASONAL_NAIVE = 'SeasonalNaive'

The Seasonal Naive forecasting model makes predictions by carrying forward the latest season of target values for each time-series in the training data.

SGD = 'SGD'

Stochastic gradient descent is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It’s an inexact but powerful technique.

TCN_FORECASTER = 'TCNForecaster'

TCNForecaster: a Temporal Convolutional Network forecaster, a deep neural network model for time-series forecasting based on convolutional architectures.

XG_BOOST_REGRESSOR = 'XGBoostRegressor'

Extreme Gradient Boosting Regressor is a supervised machine learning model using an ensemble of base learners.

class azure.ai.ml.automl.ForecastingPrimaryMetrics(value)[source]

Primary metrics for Forecasting task.

NORMALIZED_MEAN_ABSOLUTE_ERROR = 'NormalizedMeanAbsoluteError'

The Normalized Mean Absolute Error (NMAE) is a validation metric to compare the Mean Absolute Error (MAE) of (time) series with different scales.

NORMALIZED_ROOT_MEAN_SQUARED_ERROR = 'NormalizedRootMeanSquaredError'

The Normalized Root Mean Squared Error (NRMSE) is the RMSE normalized so that models trained on data with different scales can be compared.

R2_SCORE = 'R2Score'

The R2 score is one of the performance evaluation measures for forecasting-based machine learning models.

SPEARMAN_CORRELATION = 'SpearmanCorrelation'

The Spearman’s rank coefficient of correlation is a non-parametric measure of rank correlation.

class azure.ai.ml.automl.ForecastingSettings(*, country_or_region_for_holidays: Optional[str] = None, cv_step_size: Optional[int] = None, forecast_horizon: Optional[Union[str, int]] = None, target_lags: Optional[Union[str, int, List[int]]] = None, target_rolling_window_size: Optional[Union[str, int]] = None, frequency: Optional[str] = None, feature_lags: Optional[str] = None, seasonality: Optional[Union[str, int]] = None, use_stl: Optional[str] = None, short_series_handling_config: Optional[str] = None, target_aggregate_function: Optional[str] = None, time_column_name: Optional[str] = None, time_series_id_column_names: Optional[Union[str, List[str]]] = None)[source]

Forecasting settings for an AutoML Job.

Parameters
  • country_or_region_for_holidays (str) – The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region code, for example ‘US’ or ‘GB’.

  • forecast_horizon (int) – The desired maximum forecast horizon in units of time-series frequency.

  • target_lags (Union[str, int, List[int]]) – The number of past periods to lag from the target column. Use ‘auto’ to use the automatic heuristic based lag.

  • target_rolling_window_size (int) – The number of past periods used to create a rolling window average of the target column.

  • frequency (str) – Forecast frequency. When forecasting, this parameter represents the period with which the forecast is desired, for example daily, weekly, yearly, etc.

  • feature_lags (str) – Flag for generating lags for the numeric features; use ‘auto’ to enable automatic feature lag generation.

  • seasonality (Union[str, int]) – Set time series seasonality as an integer multiple of the series frequency. Use ‘auto’ for automatic settings.

  • use_stl (str) – Configure STL Decomposition of the time-series target column. use_stl can take two values: ‘season’ - only generate season component and ‘season_trend’ - generate both season and trend components.

  • short_series_handling_config (str) – The parameter defining how AutoML should handle short time series.

  • target_aggregate_function (str) – The function to be used to aggregate the time series target column to conform to a user-specified frequency. If target_aggregate_function is set but the frequency parameter is not set, an error is raised. The possible target aggregation functions are: “sum”, “max”, “min” and “mean”.

  • time_column_name (str) – The name of the time column.

  • time_series_id_column_names (Union[str, List[str]]) – The names of columns used to group a timeseries.
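
Example

A minimal sketch of a ForecastingSettings object for a daily series. All column names, the frequency string, and the numeric values are placeholders for your own dataset.

    from azure.ai.ml.automl import ForecastingSettings

    forecasting_settings = ForecastingSettings(
        time_column_name="date",                   # assumed time column
        forecast_horizon=14,                       # predict 14 periods ahead
        frequency="D",                             # daily frequency
        target_lags="auto",
        target_rolling_window_size=7,
        country_or_region_for_holidays="US",
        time_series_id_column_names=["store_id"],  # assumed grain column
    )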

class azure.ai.ml.automl.ImageClassificationSearchSpace(*, ams_gradient: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, augmentations: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, beta1: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, beta2: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, distributed: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping_delay: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping_patience: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, enable_onnx_normalization: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, evaluation_frequency: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, gradient_accumulation_step: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, layers_to_freeze: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, learning_rate: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, learning_rate_scheduler: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, model_name: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, momentum: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, nesterov: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, number_of_epochs: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, number_of_workers: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, optimizer: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, random_seed: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, step_lr_gamma: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, step_lr_step_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, training_batch_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_batch_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, warmup_cosine_lr_cycles: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, warmup_cosine_lr_warmup_epochs: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, weight_decay: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, training_crop_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_crop_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_resize_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, weighted_loss: 
Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None)[source]

Search space for AutoML Image Classification and Image Classification Multilabel tasks.

Parameters
  • ams_gradient (bool or SweepDistribution) – Enable AMSGrad when optimizer is ‘adam’ or ‘adamw’.

  • augmentations (str or SweepDistribution) – Settings for using Augmentations.

  • beta1 (float or SweepDistribution) – Value of ‘beta1’ when optimizer is ‘adam’ or ‘adamw’. Must be a float in the range [0, 1].

  • beta2 (float or SweepDistribution) – Value of ‘beta2’ when optimizer is ‘adam’ or ‘adamw’. Must be a float in the range [0, 1].

  • distributed (bool or SweepDistribution) – Whether to use distributed training.

  • early_stopping (bool or SweepDistribution) – Enable early stopping logic during training.

  • early_stopping_delay (int or SweepDistribution) – Minimum number of epochs or validation evaluations to wait before primary metric improvement is tracked for early stopping. Must be a positive integer.

  • early_stopping_patience (int or SweepDistribution) – Minimum number of epochs or validation evaluations with no primary metric improvement before the run is stopped. Must be a positive integer.

  • enable_onnx_normalization (bool or SweepDistribution) – Enable normalization when exporting ONNX model.

  • evaluation_frequency (int or SweepDistribution) – Frequency to evaluate validation dataset to get metric scores. Must be a positive integer.

  • gradient_accumulation_step (int or SweepDistribution) – Gradient accumulation means running a configured number of “GradAccumulationStep” steps without updating the model weights while accumulating the gradients of those steps, and then using the accumulated gradients to compute the weight updates. Must be a positive integer.

  • layers_to_freeze (int or SweepDistribution) – Number of layers to freeze for the model. Must be a positive integer. For instance, passing 2 as value for ‘seresnext’ means freezing layer0 and layer1. For a full list of models supported and details on layer freeze, please see: https://docs.microsoft.com/en-us/azure/machine-learning/reference-automl-images-hyperparameters#model-agnostic-hyperparameters.

  • learning_rate (float or SweepDistribution) – Initial learning rate. Must be a float in the range [0, 1].

  • learning_rate_scheduler (str or SweepDistribution) – Type of learning rate scheduler. Must be ‘warmup_cosine’ or ‘step’.

  • model_name (str or SweepDistribution) – Name of the model to use for training. For more information on the available models please visit the official documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models.

  • momentum (float or SweepDistribution) – Value of momentum when optimizer is ‘sgd’. Must be a float in the range [0, 1].

  • nesterov (bool or SweepDistribution) – Enable nesterov when optimizer is ‘sgd’.

  • number_of_epochs (int or SweepDistribution) – Number of training epochs. Must be a positive integer.

  • number_of_workers (int or SweepDistribution) – Number of data loader workers. Must be a non-negative integer.

  • optimizer (str or SweepDistribution) – Type of optimizer. Must be either ‘sgd’, ‘adam’, or ‘adamw’.

  • random_seed (int or SweepDistribution) – Random seed to be used when using deterministic training.

  • step_lr_gamma (float or SweepDistribution) – Value of gamma when learning rate scheduler is ‘step’. Must be a float in the range [0, 1].

  • step_lr_step_size (int or SweepDistribution) – Value of step size when learning rate scheduler is ‘step’. Must be a positive integer.

  • training_batch_size (int or SweepDistribution) – Training batch size. Must be a positive integer.

  • validation_batch_size (int or SweepDistribution) – Validation batch size. Must be a positive integer.

  • warmup_cosine_lr_cycles (float or SweepDistribution) – Value of cosine cycle when learning rate scheduler is ‘warmup_cosine’. Must be a float in the range [0, 1].

  • warmup_cosine_lr_warmup_epochs (int or SweepDistribution) – Value of warmup epochs when learning rate scheduler is ‘warmup_cosine’. Must be a positive integer.

  • weight_decay (float or SweepDistribution) – Value of weight decay when optimizer is ‘sgd’, ‘adam’, or ‘adamw’. Must be a float in the range [0, 1].

  • training_crop_size (int or SweepDistribution) – Image crop size that is input to the neural network for the training dataset. Must be a positive integer.

  • validation_crop_size (int or SweepDistribution) – Image crop size that is input to the neural network for the validation dataset. Must be a positive integer.

  • validation_resize_size (int or SweepDistribution) – Image size to which to resize before cropping for validation dataset. Must be a positive integer.

  • weighted_loss (int or SweepDistribution) – Weighted loss. Accepted values: 0 for no weighted loss, 1 for weighted loss with sqrt(class_weights), and 2 for weighted loss with class_weights. Must be 0, 1, or 2.
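
Example

A hedged sketch of a search space combining sweep distributions with fixed choices. Choice and Uniform are assumed to come from azure.ai.ml.sweep, and the model names are illustrative; consult the linked model documentation for supported values.

    from azure.ai.ml.automl import ImageClassificationSearchSpace
    from azure.ai.ml.sweep import Choice, Uniform

    search_space = ImageClassificationSearchSpace(
        model_name=Choice(values=["seresnext", "vitb16r224"]),  # assumed model names
        learning_rate=Uniform(min_value=0.001, max_value=0.01),
        number_of_epochs=Choice(values=[15, 30]),
        optimizer=Choice(values=["sgd", "adam"]),
    )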

class azure.ai.ml.automl.ImageLimitSettings(*, max_concurrent_trials: Optional[int] = None, max_trials: Optional[int] = None, timeout_minutes: Optional[int] = None)[source]

Limit settings for all AutoML Image Verticals.

Parameters
  • max_concurrent_trials (int) – Maximum number of concurrent AutoML iterations.

  • max_trials (int) – Maximum number of AutoML iterations.

  • timeout_minutes (int) – AutoML job timeout in minutes.
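
Example

A small sketch using the constructor parameters above; the values are illustrative, not recommendations.

    from azure.ai.ml.automl import ImageLimitSettings

    limits = ImageLimitSettings(
        max_trials=10,
        max_concurrent_trials=2,
        timeout_minutes=60,
    )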

class azure.ai.ml.automl.ImageObjectDetectionSearchSpace(*, ams_gradient: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, augmentations: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, beta1: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, beta2: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, distributed: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping_delay: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, early_stopping_patience: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, enable_onnx_normalization: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, evaluation_frequency: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, gradient_accumulation_step: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, layers_to_freeze: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, learning_rate: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, learning_rate_scheduler: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, model_name: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, momentum: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, nesterov: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, number_of_epochs: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, number_of_workers: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, optimizer: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, random_seed: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, step_lr_gamma: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, step_lr_step_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, training_batch_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_batch_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, warmup_cosine_lr_cycles: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, warmup_cosine_lr_warmup_epochs: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, weight_decay: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, box_detections_per_image: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, box_score_threshold: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, image_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, max_size: Optional[Union[int, 
azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, min_size: Optional[Union[int, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, model_size: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, multi_scale: Optional[Union[bool, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, nms_iou_threshold: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, tile_grid_size: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, tile_overlap_ratio: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, tile_predictions_nms_threshold: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_iou_threshold: Optional[Union[float, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None, validation_metric_type: Optional[Union[str, azure.ai.ml.entities._job.sweep.search_space.SweepDistribution]] = None)[source]

Search space for AutoML Image Object Detection and Image Instance Segmentation tasks.

Parameters
  • ams_gradient (bool or SweepDistribution) – Enable AMSGrad when optimizer is ‘adam’ or ‘adamw’.

  • augmentations (str or SweepDistribution) – Settings for using Augmentations.

  • beta1 (float or SweepDistribution) – Value of ‘beta1’ when optimizer is ‘adam’ or ‘adamw’. Must be a float in the range [0, 1].

  • beta2 (float or SweepDistribution) – Value of ‘beta2’ when optimizer is ‘adam’ or ‘adamw’. Must be a float in the range [0, 1].

  • distributed (bool or SweepDistribution) – Whether to use distributed training.

  • early_stopping (bool or SweepDistribution) – Enable early stopping logic during training.

  • early_stopping_delay (int or SweepDistribution) – Minimum number of epochs or validation evaluations to wait before primary metric improvement is tracked for early stopping. Must be a positive integer.

  • early_stopping_patience (int or SweepDistribution) – Minimum number of epochs or validation evaluations with no primary metric improvement before the run is stopped. Must be a positive integer.

  • enable_onnx_normalization (bool or SweepDistribution) – Enable normalization when exporting ONNX model.

  • evaluation_frequency (int or SweepDistribution) – Frequency to evaluate validation dataset to get metric scores. Must be a positive integer.

  • gradient_accumulation_step (int or SweepDistribution) – Gradient accumulation means running a configured number of “GradAccumulationStep” steps without updating the model weights while accumulating the gradients of those steps, and then using the accumulated gradients to compute the weight updates. Must be a positive integer.

  • layers_to_freeze (int or SweepDistribution) – Number of layers to freeze for the model. Must be a positive integer. For instance, passing 2 as value for ‘seresnext’ means freezing layer0 and layer1. For a full list of models supported and details on layer freeze, please see: https://docs.microsoft.com/en-us/azure/machine-learning/reference-automl-images-hyperparameters#model-agnostic-hyperparameters.

  • learning_rate (float or SweepDistribution) – Initial learning rate. Must be a float in the range [0, 1].

  • learning_rate_scheduler (str or SweepDistribution) – Type of learning rate scheduler. Must be ‘warmup_cosine’ or ‘step’.

  • model_name (str or SweepDistribution) – Name of the model to use for training. For more information on the available models please visit the official documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models.

  • momentum (float or SweepDistribution) – Value of momentum when optimizer is ‘sgd’. Must be a float in the range [0, 1].

  • nesterov (bool or SweepDistribution) – Enable nesterov when optimizer is ‘sgd’.

  • number_of_epochs (int or SweepDistribution) – Number of training epochs. Must be a positive integer.

  • number_of_workers (int or SweepDistribution) – Number of data loader workers. Must be a non-negative integer.

  • optimizer (str or SweepDistribution) – Type of optimizer. Must be either ‘sgd’, ‘adam’, or ‘adamw’.

  • random_seed (int or SweepDistribution) – Random seed to be used when using deterministic training.

  • step_lr_gamma (float or SweepDistribution) – Value of gamma when learning rate scheduler is ‘step’. Must be a float in the range [0, 1].

  • step_lr_step_size (int or SweepDistribution) – Value of step size when learning rate scheduler is ‘step’. Must be a positive integer.

  • training_batch_size (int or SweepDistribution) – Training batch size. Must be a positive integer.

  • validation_batch_size (int or SweepDistribution) – Validation batch size. Must be a positive integer.

  • warmup_cosine_lr_cycles (float or SweepDistribution) – Value of cosine cycle when learning rate scheduler is ‘warmup_cosine’. Must be a float in the range [0, 1].

  • warmup_cosine_lr_warmup_epochs (int or SweepDistribution) – Value of warmup epochs when learning rate scheduler is ‘warmup_cosine’. Must be a positive integer.

  • weight_decay (float or SweepDistribution) – Value of weight decay when optimizer is ‘sgd’, ‘adam’, or ‘adamw’. Must be a float in the range [0, 1].

  • box_detections_per_image (int or SweepDistribution) – Maximum number of detections per image, for all classes. Must be a positive integer. Note: This setting is not supported for the ‘yolov5’ algorithm.

  • box_score_threshold (float or SweepDistribution) – During inference, only return proposals with a classification score greater than box_score_threshold. Must be a float in the range [0, 1].

  • image_size (int or SweepDistribution) – Image size for train and validation. Must be a positive integer. Note: The training run may get into CUDA OOM if the size is too big. Note: This setting is only supported for the ‘yolov5’ algorithm.

  • max_size (int or SweepDistribution) – Maximum size of the image to be rescaled before feeding it to the backbone. Must be a positive integer. Note: The training run may get into CUDA OOM if the size is too big. Note: This setting is not supported for the ‘yolov5’ algorithm.

  • min_size (int or SweepDistribution) – Minimum size of the image to be rescaled before feeding it to the backbone. Must be a positive integer. Note: The training run may get into CUDA OOM if the size is too big. Note: This setting is not supported for the ‘yolov5’ algorithm.

  • model_size (str or SweepDistribution) – Model size. Must be ‘small’, ‘medium’, ‘large’, or ‘extra_large’. Note: The training run may get into CUDA OOM if the model size is too big. Note: This setting is only supported for the ‘yolov5’ algorithm.

  • multi_scale (bool or SweepDistribution) – Enable multi-scale images by varying the image size by +/- 50%. Note: The training run may get into CUDA OOM if there is insufficient GPU memory. Note: This setting is only supported for the ‘yolov5’ algorithm.

  • nms_iou_threshold (float or SweepDistribution) – IOU threshold used during inference in NMS post processing. Must be float in the range [0, 1].

  • tile_grid_size (str or SweepDistribution) – The grid size to use for tiling each image. Note: TileGridSize must not be None to enable small object detection logic. A string containing two integers in mxn format.

  • tile_overlap_ratio (float or SweepDistribution) – Overlap ratio between adjacent tiles in each dimension. Must be float in the range [0, 1).

  • tile_predictions_nms_threshold (float or SweepDistribution) – The IOU threshold to use to perform NMS while merging predictions from tiles and image. Used in validation/inference. Must be float in the range [0, 1]. NMS: Non-maximum suppression.

  • validation_iou_threshold (float or SweepDistribution) – IOU threshold to use when computing validation metric. Must be float in the range [0, 1].

  • validation_metric_type (str or SweepDistribution) – Metric computation method to use for validation metrics. Must be ‘none’, ‘coco’, ‘voc’, or ‘coco_voc’.

class azure.ai.ml.automl.ImageSweepSettings(*, sampling_algorithm: Union[str, azure.ai.ml._restclient.v2022_02_01_preview.models._azure_machine_learning_workspaces_enums.SamplingAlgorithmType], max_concurrent_trials: Optional[int] = None, max_trials: Optional[int] = None, early_termination: Optional[azure.ai.ml.entities._job.sweep.early_termination_policy.EarlyTerminationPolicy] = None)[source]

Sweep settings for all AutoML Image Verticals.

Parameters
  • sampling_algorithm (str or SamplingAlgorithmType) – Required. Type of the hyperparameter sampling algorithm. Possible values include: “Grid”, “Random”, “Bayesian”.

  • max_concurrent_trials (int) – Maximum Concurrent iterations.

  • max_trials (int) – Number of iterations.

  • early_termination (EarlyTerminationPolicy) – Type of early termination policy.
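
Example

A hedged sketch of sweep settings with random sampling and a bandit early-termination policy. BanditPolicy is assumed to be importable from azure.ai.ml.sweep; the numeric values are illustrative.

    from azure.ai.ml.automl import ImageSweepSettings
    from azure.ai.ml.sweep import BanditPolicy

    sweep_settings = ImageSweepSettings(
        sampling_algorithm="Random",
        max_trials=10,
        max_concurrent_trials=2,
        early_termination=BanditPolicy(
            evaluation_interval=2,
            slack_factor=0.2,
            delay_evaluation=6,
        ),
    )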

class azure.ai.ml.automl.InstanceSegmentationPrimaryMetrics(value)[source]

Primary metrics for InstanceSegmentation tasks.

MEAN_AVERAGE_PRECISION = 'MeanAveragePrecision'

Mean Average Precision (MAP) is the average of AP (Average Precision). AP is calculated for each class and averaged to get the MAP.

class azure.ai.ml.automl.NCrossValidationsMode(value)[source]

Determines how N-Cross validations value is determined.

AUTO = 'Auto'

Determine N-Cross validations value automatically. Supported only for ‘Forecasting’ AutoML task.

CUSTOM = 'Custom'

Use custom N-Cross validations value.

class azure.ai.ml.automl.NlpFeaturizationSettings(*, dataset_language: Optional[str] = None)[source]

Featurization settings for all AutoML NLP Verticals.

class azure.ai.ml.automl.NlpLimitSettings(*, max_concurrent_trials: Optional[int] = None, max_trials: int = 1, timeout_minutes: Optional[int] = None)[source]

Limit settings for all AutoML NLP Verticals.
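
Example

A minimal sketch built directly from the two constructor signatures above. The language code “eng” and the limit values are assumptions for illustration.

    from azure.ai.ml.automl import NlpFeaturizationSettings, NlpLimitSettings

    nlp_featurization = NlpFeaturizationSettings(dataset_language="eng")
    nlp_limits = NlpLimitSettings(
        max_trials=1,
        max_concurrent_trials=1,
        timeout_minutes=120,
    )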

class azure.ai.ml.automl.ObjectDetectionPrimaryMetrics(value)[source]

Primary metrics for Image ObjectDetection task.

MEAN_AVERAGE_PRECISION = 'MeanAveragePrecision'

Mean Average Precision (MAP) is the average of AP (Average Precision). AP is calculated for each class and averaged to get the MAP.

class azure.ai.ml.automl.RegressionModels(value)[source]

Enum for all Regression models supported by AutoML.

DECISION_TREE = 'DecisionTree'

Decision Trees are a non-parametric supervised learning method used for both classification and regression tasks. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

ELASTIC_NET = 'ElasticNet'

Elastic net is a popular type of regularized linear regression that combines two popular penalties, specifically the L1 and L2 penalty functions.

EXTREME_RANDOM_TREES = 'ExtremeRandomTrees'

Extreme Trees is an ensemble machine learning algorithm that combines the predictions from many decision trees. It is related to the widely used random forest algorithm.

GRADIENT_BOOSTING = 'GradientBoosting'

The technique of combining weak learners into a strong learner is called boosting. The gradient boosting algorithm works on this principle, adding learners sequentially so that each new learner corrects the errors of the current ensemble.

KNN = 'KNN'

The K-nearest neighbors (KNN) algorithm uses feature similarity to predict the values of new data points: each new data point is assigned a value based on how closely it matches the points in the training set.

LASSO_LARS = 'LassoLars'

Lasso model fit with Least Angle Regression a.k.a. Lars. It is a Linear Model trained with an L1 prior as regularizer.

LIGHT_GBM = 'LightGBM'

LightGBM is a gradient boosting framework that uses tree based learning algorithms.

RANDOM_FOREST = 'RandomForest'

Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble of decision trees, usually trained with the “bagging” method. The general idea of the bagging method is that a combination of learning models improves the overall result.

SGD = 'SGD'

Stochastic gradient descent is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It’s an inexact but powerful technique.

XG_BOOST_REGRESSOR = 'XGBoostRegressor'

Extreme Gradient Boosting Regressor is a supervised machine learning model using an ensemble of base learners.

class azure.ai.ml.automl.RegressionPrimaryMetrics(value)[source]

Primary metrics for Regression task.

NORMALIZED_MEAN_ABSOLUTE_ERROR = 'NormalizedMeanAbsoluteError'

The Normalized Mean Absolute Error (NMAE) is a validation metric to compare the Mean Absolute Error (MAE) of (time) series with different scales.

NORMALIZED_ROOT_MEAN_SQUARED_ERROR = 'NormalizedRootMeanSquaredError'

The Normalized Root Mean Squared Error (NRMSE) is the RMSE normalized so that models trained on data with different scales can be compared.

R2_SCORE = 'R2Score'

The R2 score is one of the performance evaluation measures for regression-based machine learning models.

SPEARMAN_CORRELATION = 'SpearmanCorrelation'

The Spearman’s rank coefficient of correlation is a nonparametric measure of rank correlation.

class azure.ai.ml.automl.ShortSeriesHandlingConfiguration(value)[source]

The parameter defining how AutoML should handle short time series.

AUTO = 'Auto'

Short series will be padded if there are no long series, otherwise short series will be dropped.

DROP = 'Drop'

All the short series will be dropped.

NONE = 'None'

Represents no/null value.

PAD = 'Pad'

All the short series will be padded.

class azure.ai.ml.automl.TabularFeaturizationSettings(*, blocked_transformers: Optional[List[str]] = None, column_name_and_types: Optional[Dict[str, str]] = None, dataset_language: Optional[str] = None, transformer_params: Optional[Dict[str, List[azure.ai.ml.entities._job.automl.tabular.featurization_settings.ColumnTransformer]]] = None, mode: Optional[str] = None, enable_dnn_featurization: Optional[bool] = None)[source]

Featurization settings for an AutoML Job.

Parameters
  • blocked_transformers (List[str]) – A list of transformers to ignore when featurizing.

  • column_name_and_types (Dict[str, str]) – A dictionary of column names and feature types used to update column purpose.

  • dataset_language (str) – The language of the dataset.

  • transformer_params (Dict[str, List[ColumnTransformer]]) – A dictionary of transformers and their parameters.

  • mode (str) – The mode of the featurization.

  • enable_dnn_featurization (bool) – Whether to enable DNN featurization.
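
Example

A hedged sketch of custom featurization for a tabular job. The transformer key “Imputer”, the blocked transformer name, and the column name are assumptions for illustration; how the settings attach to a job (for example, via the job’s featurization property) depends on the job type.

    from azure.ai.ml.automl import ColumnTransformer, TabularFeaturizationSettings

    featurization = TabularFeaturizationSettings(
        mode="Custom",
        blocked_transformers=["LabelEncoder"],        # assumed transformer name
        transformer_params={
            "Imputer": [
                ColumnTransformer(
                    fields=["numeric_column"],        # assumed column name
                    parameters={"strategy": "median"},
                ),
            ],
        },
        enable_dnn_featurization=False,
    )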

class azure.ai.ml.automl.TabularLimitSettings(*, enable_early_termination: Optional[bool] = None, exit_score: Optional[float] = None, max_concurrent_trials: Optional[int] = None, max_cores_per_trial: Optional[int] = None, max_trials: Optional[int] = None, timeout_minutes: Optional[int] = None, trial_timeout_minutes: Optional[int] = None)[source]

Limit settings for AutoML Tabular Verticals.
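
Example

An illustrative limit configuration built from the constructor signature above; the numbers are placeholders, and attaching the settings to a job (for example via its limits attribute) is an assumption about the surrounding job API.

    from azure.ai.ml.automl import TabularLimitSettings

    limits = TabularLimitSettings(
        enable_early_termination=True,
        max_trials=40,
        max_concurrent_trials=4,
        trial_timeout_minutes=20,
        timeout_minutes=120,
        exit_score=0.95,
    )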

class azure.ai.ml.automl.TargetAggregationFunction(value)[source]

Target aggregate function.

MAX = 'Max'

MEAN = 'Mean'

MIN = 'Min'

NONE = 'None'

Represents no value set.

SUM = 'Sum'

class azure.ai.ml.automl.TargetLagsMode(value)[source]

Target lags selection modes.

AUTO = 'Auto'

Target lags to be determined automatically.

CUSTOM = 'Custom'

Use the custom target lags.

class azure.ai.ml.automl.TargetRollingWindowSizeMode(value)[source]

Target rolling window size mode.

AUTO = 'Auto'

Determine rolling window size automatically.

CUSTOM = 'Custom'

Use the specified rolling window size.

class azure.ai.ml.automl.UseStl(value)[source]

Configure STL Decomposition of the time-series target column.

NONE = 'None'

No STL decomposition.

SEASON = 'Season'

SEASON_TREND = 'SeasonTrend'

azure.ai.ml.automl.classification(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: str = None, enable_model_explainability: bool = True, weight_column_name: str = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, n_cross_validations: Union[str, int] = None, cv_split_column_names: List[str] = None, test_data: azure.ai.ml.entities._inputs_outputs.Input = None, test_data_size: float = None, **kwargs)azure.ai.ml.entities._job.automl.tabular.classification_job.ClassificationJob[source]

Function to create a ClassificationJob.

A classification job is used to train a model that best predicts the class of a data sample. Various models are trained using the training data. The model with the best performance on the validation data, based on the primary metric, is selected as the final model.

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data, validation_data and test_data parameters

  • primary_metric (str, optional) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, and precision_score_weighted. Defaults to accuracy.

  • enable_model_explainability (bool, optional) –

    Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is True. For more information, see Interpretability: model explanations in automated machine learning.

    Defaults to True

  • weight_column_name (str, optional) –

    The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn’t have column names, column indices can be used instead, expressed as integers.

    This parameter is applicable to training_data and validation_data parameters

  • validation_data (Input, optional) –

    The validation data to be used within the experiment. It should contain both training features and label column (optionally a sample weights column).

    Defaults to None

  • validation_data_size (float, optional) –

    What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross validation fold, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • n_cross_validations (Union[str, int], optional) –

    How many cross validations to perform when user validation data is not specified.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross validation fold, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • cv_split_column_names (List[str], optional) –

    List of names of the columns that contain custom cross-validation splits. Each of the CV split columns represents one CV split, where each row is marked either 1 for training or 0 for validation.

    Defaults to None

  • test_data (Input, optional) –

    The Model Test feature using test datasets or test data splits is a feature in Preview state and might change at any time. The test data to be used for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    If this parameter or the test_data_size parameter are not specified then no test run will be executed automatically after model training is completed. Test data should contain both features and label column. If test_data is specified then the target_column_name parameter must be specified.

    Defaults to None

  • test_data_size (float, optional) –

    The Model Test feature using test datasets or test data splits is a feature in Preview state and might change at any time. What fraction of the training data to hold out for test data for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    This should be between 0.0 and 1.0 non-inclusive. If test_data_size is specified at the same time as validation_data_size, then the test data is split from training_data before the validation data is split. For example, if validation_data_size=0.1, test_data_size=0.1 and the original training data has 1000 rows, then the test data will have 100 rows, the validation data will contain 90 rows and the training data will have 810 rows.

    For regression based tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split.

    If this parameter or the test_data parameter are not specified then no test run will be executed automatically after model training is completed.

    Defaults to None

Returns

A job object that can be submitted to an Azure ML compute for execution.

Return type

ClassificationJob
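
Example

A hedged end-to-end sketch: create a classification job and submit it with an MLClient. The MLTable folder paths, the label column, the compute name, and the workspace identifiers are placeholders; passing compute through **kwargs and submitting via MLClient.jobs.create_or_update follow the v2 SDK pattern but are assumptions relative to this page.

    from azure.ai.ml import MLClient, Input, automl
    from azure.ai.ml.constants import AssetTypes
    from azure.identity import DefaultAzureCredential

    classification_job = automl.classification(
        training_data=Input(type=AssetTypes.MLTABLE, path="./train-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./valid-mltable-folder"),
        target_column_name="y",            # assumed label column
        primary_metric="accuracy",
        enable_model_explainability=True,
        compute="cpu-cluster",             # assumed compute target, passed via **kwargs
    )

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace-name>",
    )
    returned_job = ml_client.jobs.create_or_update(classification_job)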

azure.ai.ml.automl.forecasting(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: str = None, enable_model_explainability: bool = True, weight_column_name: str = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, n_cross_validations: Union[str, int] = None, cv_split_column_names: List[str] = None, test_data: azure.ai.ml.entities._inputs_outputs.Input = None, test_data_size: float = None, forecasting_settings: azure.ai.ml.entities._job.automl.tabular.forecasting_settings.ForecastingSettings = None, **kwargs)azure.ai.ml.entities._job.automl.tabular.forecasting_job.ForecastingJob[source]

Function to create a Forecasting job.

A forecasting task is used to predict target values for a future time period based on the historical data. Various models are trained using the training data. The model with the best performance on the validation data based on the primary metric is selected as the final model.

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data, validation_data and test_data parameters

  • primary_metric (str, optional) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: r2_score, normalized_mean_absolute_error, normalized_root_mean_squared_error. Defaults to normalized_root_mean_squared_error.

  • enable_model_explainability (bool, optional) –

    Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is True. For more information, see Interpretability: model explanations in automated machine learning.

    Defaults to True

  • weight_column_name (str, optional) –

    The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn’t have column names, column indices can be used instead, expressed as integers.

    This parameter is applicable to training_data and validation_data parameters

  • validation_data (Input, optional) –

    The validation data to be used within the experiment. It should contain both training features and label column (optionally a sample weights column).

    Defaults to None

  • validation_data_size (float, optional) –

    What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross validation fold, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • n_cross_validations (Union[str, int], optional) –

    How many cross validations to perform when user validation data is not specified.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross validation fold, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • cv_split_column_names (List[str], optional) –

    List of names of the columns that contain custom cross-validation splits. Each of the CV split columns represents one CV split, where each row is marked either 1 for training or 0 for validation.

    Defaults to None

  • test_data (Input, optional) –

    The Model Test feature using test datasets or test data splits is a feature in Preview state and might change at any time. The test data to be used for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    If this parameter or the test_data_size parameter are not specified then no test run will be executed automatically after model training is completed. Test data should contain both features and label column. If test_data is specified then the target_column_name parameter must be specified.

    Defaults to None

  • test_data_size (float, optional) –

    The Model Test feature using test datasets or test data splits is a feature in Preview state and might change at any time. What fraction of the training data to hold out for test data for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    This should be between 0.0 and 1.0 non-inclusive. If test_data_size is specified at the same time as validation_data_size, then the test data is split from training_data before the validation data is split. For example, if validation_data_size=0.1, test_data_size=0.1 and the original training data has 1000 rows, then the test data will have 100 rows, the validation data will contain 90 rows and the training data will have 810 rows.

    For regression based tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split.

    If this parameter or the test_data parameter are not specified then no test run will be executed automatically after model training is completed.

    Defaults to None

  • forecasting_settings (ForecastingSettings, optional) – The settings for the forecasting task

Returns

A job object that can be submitted to an Azure ML compute for execution.

Return type

ForecastingJob
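
Example

A hedged sketch of a forecasting job that reuses a ForecastingSettings object such as the one sketched earlier. The paths, the target column, and the settings values are placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.automl import ForecastingSettings
    from azure.ai.ml.constants import AssetTypes

    forecasting_job = automl.forecasting(
        training_data=Input(type=AssetTypes.MLTABLE, path="./forecast-train-mltable"),
        target_column_name="demand",       # assumed target column
        primary_metric="normalized_root_mean_squared_error",
        n_cross_validations=5,
        forecasting_settings=ForecastingSettings(
            time_column_name="date",       # assumed time column
            forecast_horizon=14,
        ),
    )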

azure.ai.ml.automl.image_classification(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: Union[str, azure.ai.ml._restclient.v2022_02_01_preview.models._azure_machine_learning_workspaces_enums.ClassificationPrimaryMetrics] = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, **kwargs)azure.ai.ml.entities._job.automl.image.image_classification_job.ImageClassificationJob[source]

Creates an object for an AutoML Image multi-class Classification job.

Parameters
  • training_data (Input) – The training data to be used within the experiment.

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data and validation_data parameters.

  • primary_metric (Union[str, ClassificationPrimaryMetrics]) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, and precision_score_weighted. Defaults to accuracy.

  • validation_data (Input, optional) – The validation data to be used within the experiment.

  • validation_data_size (float, optional) –

    What fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set validation_data_size to extract validation data out of the specified training data.

    Defaults to 0.2

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

Image classification job object that can be submitted to an Azure ML compute for execution.

Return type

ImageClassificationJob
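
Example

A hedged sketch of an image classification job. The MLTable paths and the label column are placeholders; applying limits or a sweep afterwards (for example with the ImageLimitSettings and ImageSweepSettings classes above) is left as an assumption about the returned job’s API.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    image_classification_job = automl.image_classification(
        training_data=Input(type=AssetTypes.MLTABLE, path="./image-train-mltable"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./image-valid-mltable"),
        target_column_name="label",        # assumed label column
        primary_metric="accuracy",
    )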

azure.ai.ml.automl.image_classification_multilabel(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: Union[str, azure.ai.ml._restclient.v2022_02_01_preview.models._azure_machine_learning_workspaces_enums.ClassificationMultilabelPrimaryMetrics] = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, **kwargs) → azure.ai.ml.entities._job.automl.image.image_classification_multilabel_job.ImageClassificationMultilabelJob[source]

Creates an object for an AutoML Image multi-label Classification job.

Parameters
  • training_data (Input) – The training data to be used within the experiment.

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data and validation_data parameters.

  • primary_metric (Union[str, ClassificationMultilabelPrimaryMetrics]) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: accuracy, AUC_weighted, norm_macro_recall, average_precision_score_weighted, precision_score_weighted, and Iou. Defaults to Iou.

  • validation_data (Input, optional) – The validation data to be used within the experiment.

  • validation_data_size (float, optional) –

    The fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set validation_data_size to extract validation data out of the specified training data.

    Defaults to 0.2

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

Image multi-label classification job object that can be submitted to an Azure ML compute for execution.

Return type

ImageClassificationMultilabelJob
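
A similar hedged sketch for the multi-label variant, here extracting a validation split from the training data instead of passing validation_data; paths and the column name are placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    multilabel_job = automl.image_classification_multilabel(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
        target_column_name="labels",      # hypothetical multi-label column
        validation_data_size=0.2,         # hold out 20% of training_data for validation
        primary_metric="Iou",
    )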

azure.ai.ml.automl.image_instance_segmentation(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: Union[str, azure.ai.ml._restclient.v2022_02_01_preview.models._azure_machine_learning_workspaces_enums.InstanceSegmentationPrimaryMetrics] = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, **kwargs) → azure.ai.ml.entities._job.automl.image.image_instance_segmentation_job.ImageInstanceSegmentationJob[source]

Creates an object for an AutoML Image Instance Segmentation job.

Parameters
  • training_data (Input) – The training data to be used within the experiment.

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data and validation_data parameters.

  • primary_metric (Union[str, InstanceSegmentationPrimaryMetrics]) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: MeanAveragePrecision. Defaults to MeanAveragePrecision.

  • validation_data (Input, optional) – The validation data to be used within the experiment.

  • validation_data_size (float, optional) –

    The fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set validation_data_size to extract validation data out of the specified training data.

    Defaults to 0.2

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

Image instance segmentation job object that can be submitted to an Azure ML compute for execution.

Return type

ImageInstanceSegmentationJob
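
A short illustrative call; since MeanAveragePrecision is both the only acceptable value and the default, primary_metric can simply be omitted. Paths and the column name are placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    segmentation_job = automl.image_instance_segmentation(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./data/validation-mltable-folder"),
        target_column_name="label",
    )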

azure.ai.ml.automl.image_object_detection(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: Union[str, azure.ai.ml._restclient.v2022_02_01_preview.models._azure_machine_learning_workspaces_enums.ObjectDetectionPrimaryMetrics] = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, **kwargs) → azure.ai.ml.entities._job.automl.image.image_object_detection_job.ImageObjectDetectionJob[source]

Creates an object for an AutoML Image Object Detection job.

Parameters
  • training_data (Input) – The training data to be used within the experiment.

  • target_column_name (str) – The name of the label column. This parameter is applicable to training_data and validation_data parameters.

  • primary_metric (Union[str, ObjectDetectionPrimaryMetrics]) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: MeanAveragePrecision. Defaults to MeanAveragePrecision.

  • validation_data (Input, optional) – The validation data to be used within the experiment.

  • validation_data_size (float, optional) –

    The fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set validation_data_size to extract validation data out of the specified training data.

    Defaults to 0.2

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

Image object detection job object that can be submitted to an Azure ML compute for execution.

Return type

ImageObjectDetectionJob
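
An illustrative call for object detection. The compute target name is hypothetical, and passing it through **kwargs is an assumption about the additional configuration parameters accepted here; paths and the column name are also placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    detection_job = automl.image_object_detection(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./data/validation-mltable-folder"),
        target_column_name="label",
        compute="gpu-cluster",  # hypothetical compute target, passed through **kwargs
    )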

azure.ai.ml.automl.regression(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, primary_metric: str = None, enable_model_explainability: bool = True, weight_column_name: str = None, validation_data: azure.ai.ml.entities._inputs_outputs.Input = None, validation_data_size: float = None, n_cross_validations: Union[str, int] = None, cv_split_column_names: List[str] = None, test_data: azure.ai.ml.entities._inputs_outputs.Input = None, test_data_size: float = None, **kwargs) → azure.ai.ml.entities._job.automl.tabular.regression_job.RegressionJob[source]

Function to create a Regression Job.

A regression job is used to train a model to predict continuous values of a target variable from a dataset. Various models are trained using the training data. The model with the best performance on the validation data based on the primary metric is selected as the final model.

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).

  • target_column_name (str) – The name of the label column. This parameter is applicable to the training_data, validation_data and test_data parameters.

  • primary_metric (str, optional) –

    The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. For more information on how metrics are calculated, see https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#primary-metric.

    Acceptable values: spearman_correlation, r2_score, normalized_mean_absolute_error, normalized_root_mean_squared_error. Defaults to normalized_root_mean_squared_error.

  • enable_model_explainability (bool, optional) –

    Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. The default is True. For more information, see Interpretability: model explanations in automated machine learning.

    Defaults to True

  • weight_column_name (str, optional) –

    The name of the sample weight column. Automated ML supports a weighted column as an input, causing rows in the data to be weighted up or down. If the input data is from a pandas.DataFrame which doesn’t have column names, column indices can be used instead, expressed as integers.

    This parameter is applicable to the training_data and validation_data parameters.

  • validation_data (Input, optional) –

    The validation data to be used within the experiment. It should contain both training features and a label column (optionally a sample weights column).

    Defaults to None

  • validation_data_size (float, optional) –

    The fraction of the data to hold out for validation when user validation data is not specified. This should be between 0.0 and 1.0 non-inclusive.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross-validation folds, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • n_cross_validations (Union[str, int], optional) –

    How many cross-validations to perform when user validation data is not specified.

    Specify validation_data to provide validation data, otherwise set n_cross_validations or validation_data_size to extract validation data out of the specified training data. For custom cross-validation folds, use cv_split_column_names.

    For more information, see Configure data splits and cross-validation in automated machine learning.

    Defaults to None

  • cv_split_column_names (List[str], optional) –

    List of names of the columns that contain a custom cross-validation split. Each of the CV split columns represents one CV split, where each row is marked either 1 for training or 0 for validation.

    Defaults to None

  • test_data (Input, optional) –

    The Model Test feature, which uses test datasets or test data splits, is in Preview state and might change at any time. The test data to be used for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    If neither this parameter nor the test_data_size parameter is specified, no test run will be executed automatically after model training is completed. Test data should contain both features and the label column. If test_data is specified, then the target_column_name parameter must also be specified.

    Defaults to None

  • test_data_size (float, optional) –

    The Model Test feature, which uses test datasets or test data splits, is in Preview state and might change at any time. The fraction of the training data to hold out as test data for a test run that will automatically be started after model training is complete. The test run will get predictions using the best model and will compute metrics given these predictions.

    This should be between 0.0 and 1.0 non-inclusive. If test_data_size is specified at the same time as validation_data_size, then the test data is split from training_data before the validation data is split. For example, if validation_data_size=0.1, test_data_size=0.1 and the original training data has 1000 rows, then the test data will have 100 rows, the validation data will contain 90 rows and the training data will have 810 rows.

    For regression-based tasks, random sampling is used. For classification tasks, stratified sampling is used. Forecasting does not currently support specifying a test dataset using a train/test split.

    If neither this parameter nor the test_data parameter is specified, no test run will be executed automatically after model training is completed.

    Defaults to None

Returns

A job object that can be submitted to an Azure ML compute for execution.

Return type

RegressionJob
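
A hedged sketch that mirrors the split example above: with 1000 training rows, validation_data_size=0.1 and test_data_size=0.1 yield 100 test rows, 90 validation rows, and 810 training rows. The MLTable path and the label column name are placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    regression_job = automl.regression(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
        target_column_name="price",          # hypothetical continuous label column
        primary_metric="r2_score",
        test_data_size=0.1,                  # Preview: 10% held out first for the automatic test run
        validation_data_size=0.1,            # 10% of what remains after the test split
        enable_model_explainability=True,
    )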

azure.ai.ml.automl.text_classification(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, validation_data: azure.ai.ml.entities._inputs_outputs.Input, primary_metric: Optional[str] = None, log_verbosity: Optional[str] = None, **kwargs) → azure.ai.ml.entities._job.automl.nlp.text_classification_job.TextClassificationJob[source]

Function to create a TextClassificationJob.

A text classification job is used to train a model that can predict the class/category of text data. Input training data should include a target column that classifies the text into exactly one class.

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a target column.

  • target_column_name (str) – Name of the target column.

  • validation_data (Input) – The validation data to be used within the experiment. It should contain both training features and a target column.

  • primary_metric (Union[str, ClassificationPrimaryMetrics]) – Primary metric for the task. Acceptable values: accuracy, AUC_weighted, precision_score_weighted.

  • log_verbosity (str) – Log verbosity level.

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

The TextClassificationJob object.
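
An illustrative call; note that validation_data is required here. The MLTable paths and the target column name are assumptions.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    text_classification_job = automl.text_classification(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/train-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./data/valid-mltable-folder"),
        target_column_name="Sentiment",      # hypothetical single-class target column
        primary_metric="accuracy",
    )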

azure.ai.ml.automl.text_classification_multilabel(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, target_column_name: str, validation_data: azure.ai.ml.entities._inputs_outputs.Input, primary_metric: Optional[str] = None, log_verbosity: Optional[str] = None, **kwargs) → azure.ai.ml.entities._job.automl.nlp.text_classification_multilabel_job.TextClassificationMultilabelJob[source]

Function to create a TextClassificationMultilabelJob.

A text classification multilabel job is used to train a model that can predict the classes/categories of text data. Input training data should include a target column that classifies the text into one or more classes. For more information on the format of multilabel data, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-nlp-models#multi-label

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a target column.

  • target_column_name (str) – Name of the target column.

  • validation_data (Input) – The validation data to be used within the experiment. It should contain both training features and a target column.

  • primary_metric (str) – Primary metric for the task. Acceptable values: accuracy

  • log_verbosity (str) – Log verbosity level.

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

The TextClassificationMultilabelJob object.
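
The multilabel variant looks much the same, except the target column may assign several classes per row; everything below is an illustrative assumption.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    multilabel_text_job = automl.text_classification_multilabel(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/train-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./data/valid-mltable-folder"),
        target_column_name="terms",          # hypothetical multi-label target column
        primary_metric="accuracy",
    )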

azure.ai.ml.automl.text_ner(*, training_data: azure.ai.ml.entities._inputs_outputs.Input, validation_data: azure.ai.ml.entities._inputs_outputs.Input, primary_metric: Optional[str] = None, log_verbosity: Optional[str] = None, **kwargs) → azure.ai.ml.entities._job.automl.nlp.text_ner_job.TextNerJob[source]

Function to create a TextNerJob.

A text named entity recognition job is used to train a model that can predict the named entities in the text. Input training data should be a text file in CoNLL format. For more information on the format of text NER data, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-nlp-models#named-entity-recognition-ner

Parameters
  • training_data (Input) – The training data to be used within the experiment. It should contain both training features and a target column.

  • validation_data (Input) – The validation data to be used within the experiment. It should contain both training features and a target column.

  • primary_metric (str) – Primary metric for the task. Acceptable values: accuracy

  • log_verbosity (str) – Log verbosity level.

  • kwargs (dict) – A dictionary of additional configuration parameters.

Returns

The TextNerJob object.
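
For NER there is no target_column_name; the training and validation inputs point to CoNLL-formatted data packaged as MLTables. The paths below are placeholders.

    from azure.ai.ml import Input, automl
    from azure.ai.ml.constants import AssetTypes

    text_ner_job = automl.text_ner(
        training_data=Input(type=AssetTypes.MLTABLE, path="./data/train-mltable-folder"),
        validation_data=Input(type=AssetTypes.MLTABLE, path="./data/valid-mltable-folder"),
    )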
