Key Machine Learning Terms and Concepts Cont'd
Other Common Machine Learning Terms
Algorithm: A self-contained set of rules used to solve problems through data processing, math, or automated reasoning.
Anomaly Detection: A model that flags unusual events or values and helps you discover problems. For example, credit card fraud detection looks for unusual purchases.
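As a rough illustration of flagging unusual values, the sketch below marks purchases whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The z-score rule, the threshold of 2.0, and the purchase amounts are assumptions chosen for illustration; they are not how any particular fraud detection model works.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    The threshold is an illustrative assumption, not a recommended setting.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card purchases: seven small amounts and one very large one.
purchases = [12.5, 9.99, 15.0, 11.25, 14.1, 10.0, 13.4, 950.0]
anomalies = flag_anomalies(purchases)
print(anomalies)
```

With this made-up data, only the 950.0 purchase is far enough from the mean to be flagged.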
Categorical Data: Data that is organized by categories and can be divided into groups. For example, a categorical data set for autos could specify make, model, and model year treated as a label. Price, by contrast, is a measurement and therefore numerical data.
Classification: A model for organizing data points into categories based on a data set for which category groupings are already known.
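One simple way to picture classification is a nearest-neighbor rule: a new data point receives the category of the closest point whose category is already known. The sketch below assumes that rule, with made-up two-feature points and hypothetical category names; it is only one of many classification techniques.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(point, labeled_points):
    """Return the category of the labeled point nearest to `point`."""
    nearest = min(labeled_points, key=lambda item: squared_distance(point, item[0]))
    return nearest[1]


# Hypothetical training data: (features, known category).
known = [
    ((1.0, 1.0), "compact"),
    ((1.2, 0.9), "compact"),
    ((4.0, 4.2), "truck"),
    ((3.8, 4.0), "truck"),
]

first = classify((1.1, 1.0), known)
second = classify((4.1, 3.9), known)
print(first, second)
```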
Feature Engineering: The process of extracting or selecting features related to a data set in order to enhance the data set and improve outcomes. For instance, airfare data could be enhanced by days of the week and holidays. See Feature selection and engineering in Azure Machine Learning.
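The airfare example can be sketched as follows: each record gains a day-of-week feature and a holiday flag derived from its date. The record layout, the field names, and the short holiday list are hypothetical and exist only for this illustration.

```python
from datetime import date

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def add_date_features(record):
    """Return a copy of the record with day-of-week and holiday features."""
    departure = date.fromisoformat(record["departure_date"])
    enriched = dict(record)  # leave the original record untouched
    enriched["day_of_week"] = DAY_NAMES[departure.weekday()]
    enriched["is_holiday"] = departure in HOLIDAYS
    return enriched


fare = {"departure_date": "2024-12-25", "price": 420.0}
enriched_fare = add_date_features(fare)
print(enriched_fare)
```

The original fields are kept, so a model can be trained on the price together with the two derived features.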
Module: A functional part in a Machine Learning Studio model, such as the Metadata editor module, which enables data transformation in data sets. An algorithm is also a type of module in Machine Learning Studio.
Model: A supervised learning model is the product of a machine learning experiment consisting of training data, an algorithm module, and functional modules, such as a Score Model module.
Numerical Data: Data that has meaning as measurements (continuous data) or counts (discrete data). Also referred to as quantitative data.
Partition: The method by which you divide data into samples. See Partition and Sample for more information.
Prediction: A prediction is a forecast of a value or values from a machine learning model. You might also see the term “predicted score.” However, predicted scores are not the final output of a model. An evaluation of the model follows the score.
Regression: A model for predicting a value based on independent variables, such as predicting the price of a car based on its year and make.
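A minimal sketch of regression is a straight line fitted by ordinary least squares. To keep it short, this example assumes a single numeric variable (the car's age in years, standing in for its year) and invented prices; a real model would also need to encode the make.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: car age in years and sale price.
ages = [1, 2, 3, 4, 5]
prices = [20000, 18000, 16000, 14000, 12000]

slope, intercept = fit_line(ages, prices)
predicted_price = slope * 6 + intercept  # predicted price of a 6-year-old car
print(slope, intercept, predicted_price)
```

Because the invented prices fall exactly on a line, the fit recovers a drop of 2000 per year and predicts 10000 for a six-year-old car.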
Score: A predicted value generated from a trained classification or regression model, using the Score Model module in Machine Learning Studio. Classification models also return a score for the probability of the predicted value. Once you’ve generated scores from a model, you can evaluate the model’s accuracy using the Evaluate Model module.
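The relationship between scores and evaluation can be sketched outside Machine Learning Studio. The example below assumes a binary classifier that has already produced probability scores, turns them into predicted labels with a 0.5 cutoff, and then measures accuracy against known labels. The scores, labels, and cutoff are invented, and the functions are not the Score Model or Evaluate Model modules themselves.

```python
def label_from_probability(probability, threshold=0.5):
    """Convert a probability score into a predicted class label (1 or 0)."""
    return 1 if probability >= threshold else 0


def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical scored probabilities and the true labels for the same rows.
scored_probabilities = [0.92, 0.15, 0.67, 0.40, 0.81]
actual_labels = [1, 0, 0, 0, 1]

predicted_labels = [label_from_probability(p) for p in scored_probabilities]
model_accuracy = accuracy(predicted_labels, actual_labels)
print(predicted_labels, model_accuracy)
```

Four of the five predictions match the known labels here, so the accuracy is 0.8.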
Sample: A part of a data set intended to be representative of the whole. Samples can be selected randomly or based on specific features of the data set.
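Random sampling and partitioning can be sketched as shuffling the rows and cutting them into two groups. The function name, the 70/30 split, and the fixed seed are assumptions for illustration; this is not the Partition and Sample module.

```python
import random


def random_partition(rows, fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into two samples.

    `fraction` is the share of rows placed in the first sample.
    A fixed seed makes the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten data rows
train_rows, test_rows = random_partition(rows)
print(len(train_rows), len(test_rows))
```

Every row lands in exactly one of the two samples, so the two parts together always reproduce the original data set.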