Luis Gonzalo Sánchez Giraldo::Research
Overview
Reproducing kernel Hilbert spaces (RKHSs) provide an elegant representation framework that generalizes well-known linear algorithms. The mainstream of research has been devoted to the construction of positive definite kernels and to algorithm variants that employ the kernel-based representation. More recent efforts have moved towards exploring the use of RKHSs for computing higher-order statistics of the data. These higher-order statistics are not only useful for problems such as hypothesis testing; they also provide the means to create new learning algorithms that employ these quantities as objective functions.
In the context of adaptive systems, information theoretic learning investigates the use of well-known quantities from information theory as surrogate measures of performance for task-specific measures such as mean squared error, probability of misclassification, and detection error. For instance, a measure of mutual information between the input and the output of an adaptive system can be used to learn the parameters of such a system. Nevertheless, to be able to adapt from experience, it is necessary to estimate the information theoretic quantities directly from data. My work has focused on investigating the relations between RKHSs and information theoretic quantities in order to provide estimators that exploit the representational capabilities of RKHSs. The proposed estimators can be regarded as statistics based on reproducing kernels that can be applied to a variety of learning algorithms with information theoretic objective functions.
Information Theoretic Learning
- This work focuses on studying the properties of certain functionals applied to positive definite matrices that exhibit properties similar to those of an entropy functional Ref. By using positive definite kernels, we can estimate quantities with entropy-like properties directly from data. The proposed kernel-based entropy estimators bring the representational advantages of kernel methods along with some nice convergence properties; the rate of convergence can be independent of the input space dimensionality. For independence testing, our estimators improve upon alternatives such as entropic graphs and recently proposed kernel-based statistics. A minimal sketch of this matrix-based estimator appears after this list. The animation below shows the behaviour of one of the proposed quantities, which behaves similarly to mutual information. The right plot is the analogue of MI between the variables corresponding to the horizontal and vertical axes in the left plot.
- For metric learning Ref, we propose to minimize a kernel-based analogue of the conditional entropy between a label variable and a given transformation of the input. The resulting metric is able to preserve the class structure even after drastic reductions in dimensionality, achieving state-of-the-art results. The following example illustrates the ability of the proposed algorithm to unravel a two-dimensional projection of the data that preserves the label information without requiring the samples to be linearly separable. The left plot shows the resulting projection after each training iteration. The right plot shows the resulting Gram matrix, in which the algorithm unveils the block structure corresponding to the class information.
- We have applied our metric learning algorithm to the visualization of high-dimensional neural data Ref. Namely, we visualize the motor cortex neuronal firing rates of a macaque during a center-out reaching task, and local field potentials in the somatosensory cortex of a rat during tactile stimulation of the forepaws. The obtained two-dimensional projections preserve the natural topology of targets in the reaching task and of the peripheral touch sites of the forepaws.
- Renyi's entropy of order 2 can be estimated using Parzen windows. Interestingly, this estimator can be efficiently approximated using a rank-deficient decomposition of the Gram matrix (a sketch of this approximation appears after this list). Our work in Ref utilizes this approximation for unsupervised learning. The algorithm poses the problem of unsupervised learning as a trade-off between information preservation and entropy minimization. Furthermore, in Ref we propose a reproducing kernel Hilbert space (RKHS) formulation of this problem. In this case, we obtain a pseudo-convex optimization problem that can be solved with techniques such as sequential minimal optimization (initially proposed for support vector machines). The algorithm can be considered a support vector type algorithm in the sense that the solution is expressed with a subset of the initial set of data points Report. This approach compares favourably against other kernel-based feature extractors such as kernel PCA and kernel entropy component analysis, with the advantage of expressing the solution as a reduced data set.
- For non-rigid image registration, we proposed information theoretic matching functions that are robust to noise and outliers Ref_A Ref_B. Sets of points are represented by densities and aligned via a non-rigid transformation such that a measure of divergence is minimized. Namely, we use the Cauchy-Schwarz divergence, which computes the angle between densities in L2 (see the sketch after this list). The proposed matching algorithm performs well across various degrees of transformation and levels of noise.
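To make the first item above more concrete, here is a minimal sketch of the matrix-based entropy functional and the mutual-information-like quantity built from it. It assumes a Gaussian kernel, trace normalization of the Gram matrix, and a Hadamard product of two Gram matrices for the joint term; all function names, the kernel widths, and the choice of alpha are illustrative, not taken from the papers.

```python
import numpy as np

def gram_matrix(x, sigma):
    """Normalized Gaussian Gram matrix (kappa(x, x) = 1)."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def matrix_entropy(k, alpha=1.01):
    """Matrix-based analogue of Renyi's alpha-entropy of a positive definite matrix."""
    a = k / np.trace(k)                      # trace-normalized PSD matrix
    lam = np.linalg.eigvalsh(a)
    lam = lam[lam > 1e-12]                   # drop numerically zero eigenvalues
    return (1.0 / (1.0 - alpha)) * np.log2(np.sum(lam ** alpha))

def matrix_mutual_information(x, y, sigma_x=1.0, sigma_y=1.0, alpha=1.01):
    """Mutual-information-like quantity: S(A) + S(B) - S(A o B)."""
    a, b = gram_matrix(x, sigma_x), gram_matrix(y, sigma_y)
    ab = a * b                               # Hadamard product plays the role of the joint
    return matrix_entropy(a, alpha) + matrix_entropy(b, alpha) - matrix_entropy(ab, alpha)

# Example: two dependent variables should yield a clearly positive value
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x + 0.3 * rng.normal(size=(200, 1))
print(matrix_mutual_information(x, y))
```

Since everything is computed from eigenvalues of trace-normalized Gram matrices, the estimate never requires an explicit density model of the data, which is where the dimensionality-independent convergence behaviour mentioned above comes from.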
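The order-2 estimator and its rank-deficient approximation mentioned in the fourth item can be summarized as follows. This is a sketch assuming a Gaussian Parzen window and a low-rank factor G (for instance from an incomplete Cholesky decomposition) with K approximately equal to G Gᵀ; the helper names are mine.

```python
import numpy as np

def renyi2_entropy(x, sigma):
    """Parzen-window estimate of Renyi's order-2 entropy from the full Gram matrix.
    The Gaussian normalization constant is omitted; it only shifts the entropy."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-d2 / (4.0 * sigma ** 2))     # kernel of width sigma * sqrt(2)
    n = x.shape[0]
    return -np.log(np.sum(k) / n ** 2)       # -log of the information potential 1^T K 1 / N^2

def renyi2_entropy_lowrank(g):
    """Same quantity from a factor G with K ~ G @ G.T (e.g. incomplete Cholesky)."""
    n = g.shape[0]
    v = g.T @ np.ones(n)                     # 1^T K 1  ~  ||G^T 1||^2
    return -np.log(v @ v / n ** 2)
```

Because the information potential reduces to 1ᵀK1 / N², the factorized form only needs O(Nd) operations once the rank-d factor G is available, instead of the O(N²) cost of the full Gram matrix.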
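For the point-set registration item, a plug-in estimate of the Cauchy-Schwarz divergence between two Parzen densities might look like the sketch below. The Gaussian normalization constants cancel in the ratio, and the kernel width sigma is a free parameter of this illustration.

```python
import numpy as np

def cross_ip(x, y, sigma):
    """Cross information potential: Parzen estimate of the integral of p*q
    (normalization constant dropped; it cancels in the divergence below)."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.mean(np.exp(-d2 / (4.0 * sigma ** 2)))

def cauchy_schwarz_divergence(x, y, sigma=1.0):
    """D_CS(p, q) = -log( <p, q>^2 / (<p, p> <q, q>) ); zero when the densities coincide."""
    pq = cross_ip(x, y, sigma)
    return -np.log(pq ** 2 / (cross_ip(x, x, sigma) * cross_ip(y, y, sigma)))
```

In registration, y would be the transformed point set, and the non-rigid transformation parameters are adjusted to drive this divergence down.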
Online Kernel-Based Learning
- Along with support vector machines, kernel principal component analysis (kernel PCA) is among the most representative algorithms that exploit the implicit RKHS representation. The kernel principal components are obtained by solving an eigenvalue problem involving the matrix of pairwise kernel evaluations of the input data samples, called the Gram matrix. In signal processing applications, data often arrive in an online fashion. We proposed an online kernel PCA algorithm using a fixed-point update rule based on a Rayleigh quotient in the RKHS Ref (a sketch of this type of update appears after this list). The fixed-point update converges faster than the generalized Hebbian learning algorithm and it does not require setting a step size parameter.
- I have also taken part in the development of a kernel-based temporal difference learning algorithm Ref (sketched after this list). This method has been applied in reinforcement learning to obtain nonlinear approximations of state-action value functions from which optimal policies can be derived. A practical advantage of our approach is that the relation between step size and eligibility traces is well understood when normalized kernel functions are utilized, allowing stable learning rates to be set easily. Moreover, there is no requirement for parameter initialization, and we have empirically observed that it outperforms other conventional nonlinear methods based on temporal differences. This algorithm has been applied to neural decoding in reinforcement learning-based brain machine interfaces.
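A batch, power-iteration style sketch of the fixed-point Rayleigh-quotient idea behind the online kernel PCA item is given below. The published algorithm operates on streaming samples, so this is only meant to convey the form of the update and the unit-norm constraint in the RKHS; the function name and stopping rule are mine.

```python
import numpy as np

def first_kernel_pc(k, n_iter=50, tol=1e-8):
    """Fixed-point estimate of the leading kernel principal component, expressed
    through its expansion coefficients alpha, where w = sum_i alpha_i phi(x_i).
    k is the (centered) Gram matrix of the available samples."""
    n = k.shape[0]
    alpha = np.random.default_rng(0).normal(size=n)
    for _ in range(n_iter):
        new = k @ alpha                       # apply the empirical covariance operator
        new /= np.sqrt(new @ k @ new)         # enforce unit RKHS norm: alpha^T K alpha = 1
        if np.linalg.norm(new - alpha) < tol:
            alpha = new
            break
        alpha = new
    return alpha
```

The normalization replaces the explicit step size of Hebbian-style rules, which is one reason a fixed-point formulation is attractive in the online setting.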
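The kernel temporal difference item can be illustrated with the following rough sketch of a kernel expansion for Q-values updated by TD errors. Eligibility traces, kernel normalization, and the dictionary-growth control used in the actual work are omitted, and the class and parameter names are mine.

```python
import numpy as np

class KernelQ:
    """Sketch of a kernel-based temporal-difference learner for state-action values.
    Q(s, a) is a kernel expansion over previously visited states, one expansion per action."""

    def __init__(self, n_actions, sigma=1.0, eta=0.5, gamma=0.9):
        self.centers = [[] for _ in range(n_actions)]   # stored states per action
        self.coeffs = [[] for _ in range(n_actions)]    # expansion coefficients per action
        self.sigma, self.eta, self.gamma = sigma, eta, gamma

    def _k(self, s, c):
        return np.exp(-np.sum((np.asarray(s) - np.asarray(c)) ** 2) / (2 * self.sigma ** 2))

    def q(self, s, a):
        return sum(w * self._k(s, c) for c, w in zip(self.centers[a], self.coeffs[a]))

    def update(self, s, a, r, s_next, terminal=False):
        """One TD(0)-style update: store the current state as a new center for the
        taken action, weighted by the learning rate times the TD error."""
        target = r if terminal else r + self.gamma * max(
            self.q(s_next, b) for b in range(len(self.centers)))
        delta = target - self.q(s, a)
        self.centers[a].append(s)
        self.coeffs[a].append(self.eta * delta)
        return delta
```

Each update grows the expansion by one center, which is why the practical versions of this idea pair it with sparsification or normalization of the kernel functions.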
Last update: Feb 5, 2019