Preface
This is a book about (machine) learning from (experimental) data. Many books devoted to this broad field have been published recently; one is even tempted to qualify "many" in the previous sentence with "extremely". Thus, there is an urgent need to introduce both the motives for and the content of the present volume in order to highlight its distinguishing features.
Before doing that, a few words about the very broad meaning of data are in order. Today, we are surrounded by an ocean of all kinds of experimental data (examples, samples, measurements, records, patterns, pictures, tunes, observations, etc.) produced by various sensors, cameras, microphones, pieces of software and/or other human-made devices. The amount of data produced is enormous and ever increasing. The first obvious consequence of this fact is that humans cannot handle such massive quantities of data, which usually appear in numeric form as huge (rectangular or square) matrices. Typically, the number of rows (n) corresponds to the number of data pairs collected, and the number of columns (m) to the dimensionality of the data. Thus, faced with gigabyte- and terabyte-sized data files, one has to develop new approaches, algorithms and procedures. A few techniques for coping with huge data sets are presented here. This explains the appearance of the phrase "huge data sets" in the title of the book.
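As a small illustration of this convention (the notation is given here for orientation only), a collection of n samples, each described by m measured variables, can be arranged as

\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \in \mathbb{R}^{n \times m},

where each row holds one collected sample and each column one measured variable.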
Another direct consequence is that (instead of attempting to dive into the sea of hundreds of thousands or millions of high-dimensional data pairs) we develop other "machines" or "devices" for analyzing, recognizing and/or learning from such huge data sets. The so-called "learning machine" is predominantly a piece of software that implements both the learning algorithm and the function (network, model) whose parameters have to be determined by the learning part of the software. Today, it turns out that some models used for solving machine learning tasks are either originally based on kernels (e.g., support vector machines), or their newest extensions are obtained by introducing kernel functions into existing standard techniques. Many classic data mining algorithms have been extended to applications in high-dimensional feature spaces. The list is long and growing fast, and only the most recent extensions are mentioned here: kernel principal component analysis, kernel independent component analysis, kernel least squares, kernel discriminant analysis, kernel k-means clustering, the kernel self-organizing feature map, the kernel Mahalanobis distance, kernel subspace classification methods and kernel-based dimensionality reduction. What kernels are, as well as why and how they became so popular in learning-from-data tasks, will be shown shortly. For now, their wide use, as well as their efficiency in the numerical part of the algorithms (achieved by avoiding the calculation of scalar products between extremely high-dimensional feature vectors), explains their appearance in the title of the book.
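To give a first flavour of how such scalar products can be avoided, consider the following standard textbook illustration (the simple quadratic polynomial kernel shown here is chosen only for brevity and is not necessarily one of the kernels treated later in the book). For two-dimensional inputs \mathbf{x} and \mathbf{z} and the feature mapping \Phi(\mathbf{x}) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)^T, a short calculation gives

\langle \Phi(\mathbf{x}), \Phi(\mathbf{z}) \rangle = x_1^2 z_1^2 + 2\,x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (\mathbf{x}^T \mathbf{z})^2 = K(\mathbf{x}, \mathbf{z}),

so the scalar product in the (here three-dimensional, in general much higher-dimensional) feature space is obtained by evaluating the kernel K directly on the original low-dimensional inputs, without ever forming the feature vectors themselves.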
Next, it is worth clarifying the fact that many authors tend to label similar (or even the same) models, approaches and algorithms with different names. One is bound to cope with the concepts of data mining, knowledge discovery, neural networks, Bayesian networks, machine learning, pattern recognition, classification, regression, statistical learning, decision trees, decision making, etc. All of them usually have a lot in common, and they often use the same set of techniques for adjusting, tuning, training or learning the parameters defining the models. The common object for all of them is a training data set. All the various approaches mentioned start with a set of data pairs (x_i, y_i), where x_i represents the input variables (causes, observations, records) and y_i denotes the measured outputs (responses, labels, meanings). However, even at this very starting point of machine learning (namely, the collected training data set), real life has been tossing a coin and providing us with one of the following (restated more compactly after the list):
- a set of genuine training data pairs (x_i, y_i), where for each input x_i there is a corresponding output y_i;
- partially labeled data containing both pairs (x_i, y_i) and sole inputs x_i without associated known outputs y_i;
- or, in the worst-case scenario, a set of sole inputs (observations or records) x_i without any information about the possible desired output values (labels, meanings) y_i.
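Written in a compact standard notation, merely as a summary of the three cases above, these training sets are

\mathcal{D}_{\mathrm{supervised}} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}, \qquad \mathcal{D}_{\mathrm{semi\text{-}supervised}} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l} \cup \{\mathbf{x}_j\}_{j=l+1}^{n} \ (l < n), \qquad \mathcal{D}_{\mathrm{unsupervised}} = \{\mathbf{x}_i\}_{i=1}^{n}.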
It is a genuine challenge indeed to try to solve such differently posed machine learning problems with a single approach and methodology. In fact, this is exactly what did not happen in real life, because development in the field followed a natural path of inventing different tools for different tasks. The answer to the challenge was a more or less independent (although with some overlap and mutual impact) development of three large and distinct sub-areas of machine learning: supervised, semi-supervised and unsupervised learning. This is where both the subtitle and the structure of the book originate from. Here, all three approaches are introduced and presented in detail, which should enable the reader not only to acquire the various techniques but also to gain the basic knowledge and prerequisites for further development in all three fields on his/her own.
The presentation in the book follows the order mentioned above. It starts with what is, at the moment, seemingly the most powerful supervised learning approach for solving classification (pattern recognition) problems and regression (function approximation) tasks, namely support vector machines (SVMs). It then continues with the two most popular and promising semi-supervised approaches, both graph-based semi-supervised learning algorithms: the Gaussian random fields model (GRFM) and the consistency method (CM). Both the original settings of these methods and their improved versions will be introduced. This makes the volume the very first book on semi-supervised learning. The book's final part focuses on the two most appealing and widely used unsupervised methods: principal component analysis (PCA) and independent component analysis (ICA). These two algorithms are the workhorses of unsupervised learning today, and their presentation, as well as the discussion of their major characteristics, capabilities and differences, is given the highest care here.
The models and algorithms for all three parts of machine learning mentioned are presented in a way that equips the reader for their direct implementation. This is achieved not only through their presentation alone but also through applications of the models and algorithms to some low-dimensional (and thus easy to understand, visualize and follow) examples. The equations and models provided will be able to handle much bigger problems (ones having much more data of much higher dimensionality) in the same way as they handle the ones we can follow and "see" in the examples provided. In the authors' experience and opinion, the approach adopted here is the most accessible, pleasant and useful way to master material containing many new (and potentially difficult) concepts.