Te-Ming Huang, Vojislav Kecman, Ivica Kopriva
Series: Studies in Computational Intelligence, Vol. 17
Springer-Verlag, Berlin, Heidelberg, 2006
260 pp., 96 illus., Hardcover, ISBN 3540316817
This is the first book that treats the fields of supervised,
semi-supervised and unsupervised machine learning in a unifying
way. In particular, it is the first textbook presentation of the
standard and improved graph-based semi-supervised (manifold)
algorithms. The book presents both the theory and the algorithms
for mining huge data sets by using support vector machines (SVMs)
in an iterative way. It shows in detail, and with great care, how
kernel-based SVMs can be used for dimensionality reduction
(feature elimination). The book also shows the similarities and
differences between the two most popular unsupervised techniques,
namely principal component analysis (PCA) and independent
component analysis (ICA). It demonstrates that PCA, which
decorrelates data pairs, is optimal for Gaussian sources and
suboptimal for non-Gaussian ones. It also points out the necessity
of using ICA for non-Gaussian sources, as well as ICA's
inefficiency in the case of Gaussian ones. The PCA algorithm known
as the whitening, or sphering, transform is derived. Batch and
adaptive ICA algorithms are derived through the minimization of
the mutual information, which is an exact measure of statistical
(in)dependence between data pairs.
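
As a rough illustration of the whitening (sphering) transform mentioned
above, here is a minimal Python/NumPy sketch. It is not taken from the
book or from its MATLAB routines; the function name `pca_whiten`, the
`eps` safeguard, and the synthetic mixing matrix `A` are assumptions
made only for this example. The data are centered, the sample
covariance is eigendecomposed, and each principal direction is rescaled
to unit variance, so the transformed data have (approximately) identity
covariance.

```python
import numpy as np

def pca_whiten(X, eps=1e-12):
    """Whiten the rows of X (n_samples x n_features) via PCA.

    Hypothetical helper for illustration; eps guards against
    division by a zero eigenvalue.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # cov = E diag(lambda) E^T
    W = eigvecs / np.sqrt(eigvals + eps)         # scale column j by 1/sqrt(lambda_j)
    return Xc @ W                                # decorrelated, unit-variance data

# Synthetic correlated Gaussian data (the mixing matrix A is arbitrary).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.normal(size=(5000, 2)) @ A.T

Z = pca_whiten(X)
cov_Z = np.cov(Z, rowvar=False)                  # close to the 2x2 identity
```

Whitening removes only second-order dependence (correlation). For
non-Gaussian sources, an ICA step is still needed afterwards to reach
statistical independence, which is the distinction drawn above.
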
The theory presented is followed by software and/or algorithmic
solutions, which make the presentation much easier to understand.
The book is rich in graphics and contains many examples which, in
addition to conveying the concepts in a much more pleasant way,
enable readers to develop their own code for solving the problems.
All the algorithms presented are applied to several real-world
benchmark applications in bioinformatics (gene microarrays), text
categorization, numeral recognition, and the demixing of image and
audio signals (blind source separation).
The book focuses on a broad range of machine learning algorithms
and is aimed at senior undergraduate students, graduate students,
and practicing researchers and scientists who want to use and
develop kernel-based models rather than simply study them.
The book is accompanied by this site, which offers for download
the data and the software ISDA and SemiL for modeling huge data
sets in a supervised and semi-supervised manner, respectively.
In addition, the site contains MATLAB-based PCA and ICA routines
for unsupervised learning, as well as a MATLAB implementation of a
conjugate gradient algorithm for solving linear systems of
equations with box constraints. It also contains other material
used in the book and additional links to related websites. Readers
may therefore find it helpful to visit this site occasionally and
download the newest versions of the software and data files
introduced in the book.
