Categories → Causal Inference , Statistics , Method
Parametric models make assumptions about the underlying data distribution, while non-parametric models make no such assumptions and rely on the data to estimate the distribution.
In statistics and machine learning, a model is a mathematical representation of a system or process that is used to make predictions or infer information about the system. Models can be broadly classified into two categories: parametric and non-parametric models.
A parametric model is one in which the form of the probability distribution underlying the data is specified using a set of parameters. For example, a linear regression model assumes that the relationship between the dependent variable and the independent variables is linear, and the parameters of the model (i.e., the intercept and slope coefficients) are estimated from the data. Once the parameters of the model are estimated, the model can be used to make predictions on new data.
In contrast, a non-parametric model makes no assumptions about the underlying probability distribution of the data. Instead, it seeks to estimate the distribution directly from the data. For example, a k-nearest neighbor (KNN) classifier makes predictions based on the k-nearest data points in the training set, without explicitly estimating the underlying probability distribution or relying on it having any particular shape.
The choice between parametric and non-parametric models has important implications for the performance of the model. Parametric models are often more efficient and require fewer training examples than non-parametric models, but they may be less flexible and may not be able to capture complex relationships in the data. They also require you to make assumptions which you may not have evidence or confidence for. Non-parametric models, on the other hand, are often more flexible and can capture complex relationships in the data, but they may require more training examples and may be computationally expensive.
Another important consideration when choosing between parametric and non-parametric models is the assumption of underlying data distribution. Parametric models assume that the data is drawn from a specific distribution, and if this assumption is not met, the model may not perform well. Non-parametric models do not make any assumptions about the underlying distribution, making them more robust to violations of distributional assumptions.
In summary, the choice between a parametric and non-parametric model depends on the specific problem at hand, the amount of data available, and the assumptions that can be made about the underlying distribution of the data. There is no right answer, only choices which reflect the data, problem and prior knowledge you possess.