MML Glossary
Bayes, T. (1702-1761), as in Bayes's theorem: P(H&D) = P(H).P(D|H) = P(D).P(H|D).
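For example, with two competing hypotheses the posterior follows from the prior and likelihood; a minimal Python sketch (the numbers are invented for illustration):

```python
# Bayes's theorem for two competing hypotheses; numbers invented for illustration.
prior = {"H1": 0.7, "H2": 0.3}        # P(H)
likelihood = {"H1": 0.1, "H2": 0.8}   # P(D|H)

p_data = sum(prior[h] * likelihood[h] for h in prior)  # P(D), by total probability

# P(H|D) = P(H).P(D|H) / P(D)
posterior = {h: prior[h] * likelihood[h] / p_data for h in prior}
print(posterior)  # {'H1': 0.2258..., 'H2': 0.7741...}
```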
Bayesian: Styles of inference (machine learning, statistics, etc.) that rely on Bayes's theorem and the use of priors.
Classification: See supervised and unsupervised classification.
Conditional Probability: P(B|A), the probability of B given A; P(B|A) = P(A&B)/P(A).
Conjugate prior: A family of prior distributions is conjugate for f(x|θ) if the posterior distribution is in the family whenever the prior is in the family.
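For example, the Beta family is conjugate for the Bernoulli likelihood; a small sketch (the prior parameters are chosen arbitrarily):

```python
# Beta(a, b) prior + s successes in n Bernoulli trials
# -> Beta(a + s, b + n - s) posterior: same family, so Beta is conjugate.
def beta_bernoulli_posterior(a, b, s, n):
    return (a + s, b + n - s)

print(beta_bernoulli_posterior(1, 1, 7, 10))  # uniform Beta(1,1) prior -> (8, 4)
```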
Consistent: An estimator is consistent if it converges to the true parameter value (assuming that the model class includes the true model) as more and more data are made available.
Data Mining: Machine learning + some aspects of data bases, with the emphasis on (very) large data sets and efficient and robust (and sometimes ad hoc) methods. (If you know of a better, short definition, tell me.)
Data Space = Sample Space: Set of values from which data are drawn, e.g., {head, tail} for a single coin toss.
Estimate: θ̂ (theta-hat), a value of a parameter θ inferred from (i.e., fitted to) data.
Estimator: A function (mapping) from the data-space to the space of parameter values.
Expected Future Data: The weighted (by posterior probability) average, over all hypotheses (models, parameter estimates), of the data each predicts.
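In the discrete case this is the posterior predictive distribution, P(x|D) = ΣH P(H|D).P(x|H); a minimal sketch (toy numbers, continuing the Bayes example above):

```python
# Posterior predictive: P(x|D) = sum_H P(H|D) * P(x|H); toy numbers.
def expected_future(x, posteriors, predictors):
    return sum(p_h * pred[x] for p_h, pred in zip(posteriors, predictors))

posteriors = [0.2258, 0.7742]                               # P(H1|D), P(H2|D)
predictors = [{"e": 0.1, "f": 0.9}, {"e": 0.8, "f": 0.2}]   # P(x|H)
print(expected_future("e", posteriors, predictors))  # 0.6419...
```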
Fisher, R. A. (1890-1962).
Independent: A and B are independent if P(A&B)=P(A).P(B).
Invariant: An estimator e is invariant if f'(e(D)) = e'(f(D)), where f is a monotonic transformation of the data space, f' is the corresponding transformation of the parameter space, and e' is the estimator for the transformed problem.
Joint Probability: E.g. P(A&B), the joint probability of A and B. See conditional and independent.
Kullback-Leibler distance: Between two probability distributions, KL({pi},{qi}) = Σi pi.log(pi/qi) ≥ 0, with equality iff the two distributions are equal; note that it is not symmetric, so it is not a true distance (metric).
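A direct computation for two discrete distributions (the values are invented):

```python
import math

def kl(p, q, base=2.0):
    # KL({pi},{qi}) = sum_i pi.log(pi/qi); terms with pi = 0 contribute 0.
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

print(kl([0.5, 0.5], [0.9, 0.1]))  # 0.737...
print(kl([0.9, 0.1], [0.5, 0.5]))  # 0.531... -- not symmetric
```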
Likelihood: P(D|H), where D is the data set (training data), and H is a hypothesis (parameter estimate, model, theory).
MAP: Maximum a posteriori estimation; in the simplest cases only, MML is equivalent to MAP, but this is not true in general.
Markov Model of order k: A model of a series x1, x2, x3, ... in which P(xt=e) may depend on xt-k, ..., xt-1 only.
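A sketch of the maximum-likelihood estimates of the conditional probabilities from a symbol series (the series is made up):

```python
from collections import Counter, defaultdict

def markov_estimates(series, k):
    # MLE of P(x_t = e | x_{t-k} .. x_{t-1}) from context / next-symbol counts.
    counts = defaultdict(Counter)
    for t in range(k, len(series)):
        counts[tuple(series[t - k:t])][series[t]] += 1
    return {ctx: {e: n / sum(c.values()) for e, n in c.items()}
            for ctx, c in counts.items()}

print(markov_estimates("abaabab", 1))
# {('a',): {'b': 0.75, 'a': 0.25}, ('b',): {'a': 1.0}}
```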
MDL: Minimum Description Length, since J. Rissanen, Parameter Estimation by Shortest Description of Data, Proc JACE Conf. RSME, pp.593-, 1976. Also see MML below.
Message Length: The length, usually in bits, of a message in an optimal code encoding some event (or data D); often a two-part message of length -log2(P(H)) - log2(P(D|H)). "Message" is after Shannon's mathematical theory of communication (1948).
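A two-part length is easy to compute; a minimal sketch (the probabilities are invented):

```python
import math

def two_part_length(p_h, p_d_given_h):
    # Message length in bits: -log2(P(H)) - log2(P(D|H)).
    return -math.log2(p_h) - math.log2(p_d_given_h)

print(two_part_length(0.25, 0.01))  # 2.0 + 6.64... = 8.64... bits
```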
Minimum EKL Estimator, MinEKL: The parameter estimate for a distribution (or model or hypothesis) that minimises the KL distance between the distribution and Expected Future Data, i.e., maximises the likelihood of Expected Future Data.
Mixture Model: The weighted average of two or more models, especially a mixture of probability distributions in unsupervised classification.
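A sketch of evaluating a two-component Gaussian mixture density (the weights and parameters are invented):

```python
import math

def gaussian_pdf(mu, sigma):
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, components):
    # Weighted average of component densities: sum_j w_j * f_j(x).
    return sum(w * f(x) for w, f in zip(weights, components))

print(mixture_pdf(0.0, [0.6, 0.4], [gaussian_pdf(-1, 1), gaussian_pdf(2, 1)]))
```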
MML: Minimum Message Length, since C. S. Wallace & D. M. Boulton, An Information Measure for Classification, The Computer Journal, 11(2), pp.185-194, 1968.
Multivariate: Data, distribution etc. having multiple attributes (variables).
Observation: A data item, e.g., from an experiment.
Ockham: As in Ockham's razor. Also Occam.
Odds ratio: Simply the ratio of two probabilities, P(A)/P(B). Also as in the posterior odds-ratio P(H1|D)/P(H2|D) = (P(H1).P(D|H1))/(P(H2).P(D|H2)).
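Note that P(D) cancels, so the posterior odds-ratio needs only priors and likelihoods; continuing the toy numbers from the Bayes entry above:

```python
def posterior_odds(p_h1, p_d_h1, p_h2, p_d_h2):
    # P(H1|D)/P(H2|D) = (P(H1).P(D|H1)) / (P(H2).P(D|H2)); P(D) cancels.
    return (p_h1 * p_d_h1) / (p_h2 * p_d_h2)

print(posterior_odds(0.7, 0.1, 0.3, 0.8))  # 0.2916..., odds favour H2
```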
Prior: Before, particularly "before actual data are seen", as in prior probability distribution of parameters and/or models, P(H).
Posterior: After, particularly "after actual data are seen", as in posterior probability distribution of parameters and/or models, P(H|D)=P(H&D)/P(D)=P(H).P(D|H)/P(D).
Regression: To model, fit or infer, but particularly to fit a function (line, polynomial, etc.) through points {(xi,yi)} where y is dependent on x.
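For example, least-squares fitting of a line y ≈ a.x + b through points; a minimal sketch (the points are invented):

```python
def fit_line(points):
    # Least-squares estimates of a and b in y ~ a*x + b.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

print(fit_line([(0, 1), (1, 3), (2, 5)]))  # (2.0, 1.0)
```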
Sample Space: Space, set of values over which a random variable ranges.
Strict MML (SMML): See Farr and Wallace (2002).
Supervised Classification: To infer a function, c:S→T, a classification function, given examples (training data) drawn from S×T.
Univariate: Data, distribution etc. having one attribute.
Unsupervised Classification: To infer a mixture model from examples (data).
Variable (1): Random variable.
Variable (2): An attribute of an observation (thing), e.g., a column of a data-set.
von Mises (-Fisher, vMF): Probability distributions on directions in R^D, i.e., on the unit sphere.
Wallace, C. S. (1933-2004).
Some sources
- G. Farr, Information Theory and MML Inference, School of Comp. Sci. and Software Eng., Monash University, 1997-1999
- G. Farr & C. S. Wallace. The Complexity of Strict Minimum Message Length Inference, The Computer Journal, 45(3), pp.285-292, 2002
- C. S. Wallace & D. M. Boulton, An Information Measure for Classification, The Computer Journal, 11(2), pp.185-194, August 1968
- C. S. Wallace & P. R. Freeman, Estimation and Inference by Compact Coding, J. Royal Stat. Soc. B, 49(3), pp.240-265, 1987
- C. S. Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer, 2005