# Literature Translation for a Computer Science Graduation Project

## Chapter 6: Large Scale Problems

In this chapter we discuss methods for solving classification and function estimation problems with LS-SVMs in the large data set setting. We explain the Nyström method as proposed in the context of Gaussian processes, and incomplete Cholesky factorization for low rank approximation. Then a new technique of fixed size LS-SVM is presented. In this fixed size LS-SVM method one solves the primal problem instead of the dual, after estimating the map $\varphi$ to the feature space based upon the eigenfunctions obtained from kernel PCA, which is explained in more detail in the next chapter. This method gives explicit links between function estimation and density estimation, exploits the primal-dual formulations, and addresses the problem of how to actively select suitable support vectors instead of taking random points as in the Nyström method. Next we explain methods that aim at constructing a suitable basis in the feature space. Furthermore, approaches for combining submodels are discussed, such as committee networks and nonlinear and multilayer extensions of this approach.

### 6.1 Low rank approximation methods

#### 6.1.1 Nyström method

Suppose one takes a linear kernel. We already mentioned that one can then in fact equally well solve the primal problem as the dual problem. In fact, solving the primal problem is more advantageous for larger data sets, while solving the dual problem is more suitable for large dimensional input spaces, because the unknowns are $w \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}^N$, respectively, where $n$ denotes the dimension of the input space and $N$ the number of given training data points.

> Fig. 6.1: For linear support vector machines the dual problem is suitable for solving problems with large dimensional input spaces, while the primal problem is convenient towards large data sets.

However, for nonlinear SVMs one has no expression for $\varphi(x)$; as a result one can only solve the dual problem in terms of the related kernel function. In the method of fixed size LS-SVM the Nyström method is used to estimate eigenfunctions. After obtaining estimates for $\varphi(x_k)$ and linking primal-dual formulations, the computation of $\hat{w}, \hat{b}$ is done in the primal space.

For example, in the linear function estimation case one has

$$\min_{w,b} \; \frac{1}{2} w^T w + \gamma \, \frac{1}{2} \sum_{k=1}^{N} \left( y_k - w^T x_k - b \right)^2 \qquad (6.1)$$

by elimination of the error variables $e_k = y_k - w^T x_k - b$, which one can immediately solve. In this case the mapping $\varphi$ becomes $\varphi(x_k) = x_k$ and there is no need to solve the dual problem in the support values $\alpha$, certainly not for large data sets. For the nonlinear case, on the other hand, the situation is much more complicated. For many choices of the kernel, $\varphi(\cdot)$ may become infinite dimensional and hence also the $w$ vector. However, one may still try in this case to find meaningful estimates for $\varphi(x_k)$. A procedure to find such estimates is implicitly done by the Nyström method, which is well known in the area of integral equations [14; 63] and has been successfully applied in the context of Gaussian processes by Williams & Seeger in [294]. The method is related to finding a low rank approximation to the given kernel matrix by randomly choosing $M$ rows/columns of the kernel matrix. Let us denote the big kernel matrix by $\Omega_{(N,N)}$ and the small kernel matrix based on the random subsample by $\Omega_{(M,M)}$, with $M < N$ (in practice often $M \ll N$). Consider the eigenvalue decomposition of the small kernel matrix

$$\Omega_{(M,M)}\, U = U \Lambda \qquad (6.2)$$

where $\Lambda = \mathrm{diag}(\lambda_1^{(M)}, \ldots, \lambda_M^{(M)})$ contains the eigenvalues and $U = [u_1 \; \cdots \; u_M]$ the corresponding eigenvectors. This is related to the eigenfunctions $\phi_i$ and eigenvalues $\lambda_i$ of the integral equation

$$\int K(x, x')\, \phi_i(x)\, p(x)\, dx = \lambda_i\, \phi_i(x') \qquad (6.3)$$

as follows:

$$\hat{\lambda}_i = \frac{1}{M}\, \lambda_i^{(M)}, \qquad \hat{\phi}_i(x_k) = \sqrt{M}\, u_{ki} \qquad (6.4)$$

where $\hat{\lambda}_i$ and $\hat{\phi}_i$ are estimates to $\lambda_i$ and $\phi_i$, respectively, for the integral equation, and $u_{ki}$ denotes the $ki$-th entry of the matrix $U$. This can be understood from sampling the integral by $M$ points $x_1, x_2, \ldots, x_M$. For the big kernel matrix one has the eigenvalue decomposition

$$\Omega_{(N,N)}\, \tilde{U} = \tilde{U} \tilde{\Lambda}. \qquad (6.5)$$

Furthermore, as explained in [294], one has

$$\tilde{\lambda}_i \simeq \frac{N}{M}\, \lambda_i^{(M)}, \qquad \tilde{u}_i \simeq \sqrt{\frac{M}{N}}\; \frac{1}{\lambda_i^{(M)}}\, \Omega_{(N,M)}\, u_i. \qquad (6.6)$$

One can then show that

$$\hat{\Omega}_{(N,N)} = \Omega_{(N,M)}\, \Omega_{(M,M)}^{-1}\, \Omega_{(M,N)} \qquad (6.7)$$

where $\Omega_{(N,M)}$ is the $N \times M$ block matrix taken from $\Omega_{(N,N)}$.
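As a concrete illustration of the estimates (6.2) and (6.6), the following is a minimal numerical sketch, not taken from the original text, of the Nyström approximation of the top eigenpairs of the big kernel matrix. The RBF kernel choice and the function names are assumptions made for illustration only:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """RBF kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom(X, M, k, sigma=1.0, seed=0):
    """Estimate the top-k eigenvalues/eigenvectors of the N x N kernel
    matrix from a random M-point subsample, following (6.2) and (6.6)."""
    N = X.shape[0]
    idx = np.random.default_rng(seed).choice(N, size=M, replace=False)
    Omega_MM = rbf_kernel(X[idx], X[idx], sigma)   # small M x M kernel matrix
    Omega_NM = rbf_kernel(X, X[idx], sigma)        # N x M block of the big matrix
    lam, U = np.linalg.eigh(Omega_MM)              # (6.2), ascending eigenvalues
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]      # keep the k largest
    lam_big = (N / M) * lam                        # eigenvalue estimates, (6.6)
    U_big = np.sqrt(M / N) * (Omega_NM @ U) / lam  # eigenvector estimates, (6.6)
    return lam_big, U_big, idx
```

Note that by (6.6), $\tilde{U}\tilde{\Lambda}\tilde{U}^T$ reproduces exactly the low rank approximation (6.7), so the cost of an $N \times N$ eigendecomposition is replaced by an $M \times M$ one plus matrix products.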
These insights are then used for solving, in an approximate sense, the linear system

$$\left( \Omega + \frac{1}{\gamma} I \right) \alpha = y \qquad (6.8)$$

without bias term in the model, as considered in Gaussian process regression problems. By applying the Sherman-Morrison-Woodbury formula [98] one obtains [294]:

$$\hat{\alpha} = \gamma \left( y - \tilde{U} \left( \frac{1}{\gamma} I + \tilde{\Lambda}\, \tilde{U}^T \tilde{U} \right)^{-1} \tilde{\Lambda}\, \tilde{U}^T y \right) \qquad (6.9)$$

where $\tilde{\Lambda}, \tilde{U}$ are calculated from (6.6) based upon the eigenvalue decomposition of the small matrix. In LS-SVM classification and regression one usually considers a bias term, which leads to centering of the kernel matrix. For application of the Nyström method the eigenvalue decomposition of the centered kernel matrix is then taken. Finally, further characterizations of the error for approximations to a kernel matrix have been investigated in [206; 226].

The Nyström method approach has been applied to the Bayesian LS-SVM framework at the second level of inference, while solving the level 1 problems without the Nyström approximation by the conjugate gradient method [273]. In Table 6.1 this is illustrated on three data sets, cra (Leptograpsus crab), rsy (Ripley synthetic data) and hea (heart disease), according to [273]. In [273] it has also been illustrated that for larger data sets, such as the UCI adult data set, a successful approximation by the Nyström method can be made at the second level of inference by taking a subsample of 100 data points in order to determine the hyperparameters, instead of using the whole training data set of size 33000.

#### 6.1.2 Incomplete Cholesky factorization

The Sherman-Morrison-Woodbury formula which is used within the Nyström method has also been widely used in the context of interior point algorithms for linear programming. However, as pointed out by Fine & Scheinberg in [79], it may lead to numerical difficulties. In [79] it has been illustrated with a simple example how the computed solutions may not even be close to the correct solution when applying the Woodbury formula (when the limited machine precision is taken into account).
Small numerical perturbations due to limited machine precision may cause numerical instabilities, resulting in computed solutions that are far from the true solution.

> Table 6.1: Application of the Nyström approximation in the Bayesian LS-SVM framework at the second level of inference. Shown are performances and inferred hyperparameters for different proportions 10% ... 100% of the training data N. The standard deviation on 10 randomizations of the Nyström sample is given between parentheses.

A numerically stable method for kernel based methods has been studied in [79], which is based on a low rank update of the Cholesky factorization [98] of the kernel matrix for solving kernel based learning problems involving the kernel matrix $\Omega$. It is often the case that an exact low rank representation $\Omega = G G^T$ of the kernel matrix is not given, or does not even exist when the rank of the kernel matrix is large. The best that one may hope for then is to find a good approximation while keeping the rank low. Therefore, one can compute an incomplete Cholesky factorization $\Omega \simeq \tilde{G} \tilde{G}^T$, with $\tilde{G}$ a lower triangular matrix where some columns of $\tilde{G}$ are zero. This procedure can also be applied to almost singular matrices by skipping pivots that are below a certain threshold [98]. In [97] this method has been successfully applied to SVMs.

> Fig. 6.2: Fixed size LS-SVM: the number of support vectors is pre-fixed beforehand and the support vectors are actively selected from the pool of training data. After estimating eigenfunctions, the model is computed in the primal space with calculation of $\hat{w}, \hat{b}$.
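The incomplete Cholesky idea described above can be sketched as follows. This is a simplified pivoted variant for positive semidefinite kernel matrices, written for illustration only ([79; 98] describe the actual algorithms); for simplicity the factor is returned in compact $N \times k$ form rather than as a permuted lower triangular matrix. Columns are added greedily, pivoting on the largest remaining residual diagonal entry, and the factorization stops once that pivot drops below a threshold:

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-8):
    """Pivoted incomplete Cholesky of a PSD matrix K: returns G (N x k)
    with K ~= G @ G.T. Stops when the largest residual diagonal entry
    (the next pivot) falls below tol, keeping the rank k low."""
    N = K.shape[0]
    d = np.diag(K).astype(float).copy()   # diagonal of the residual K - G G^T
    G = np.zeros((N, 0))
    pivots = []
    while d.max() > tol and G.shape[1] < N:
        j = int(np.argmax(d))                         # largest residual pivot
        g = (K[:, j] - G @ G[j, :]) / np.sqrt(d[j])   # new factor column
        G = np.column_stack([G, g])
        d = np.maximum(d - g ** 2, 0.0)               # update residual diagonal
        pivots.append(j)
    return G, pivots
```

Skipping pivots below the threshold `tol` is what makes the procedure applicable to almost singular kernel matrices: the factorization simply terminates with a low rank factor instead of dividing by a near-zero pivot.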
In the working set of support vectors a point is randomly selected and replaced by a randomly selected point from the training data if this new point improves an entropy criterion, which is related to density estimation and kernel PCA.

### 6.2 Fixed Size LS-SVMs

In the Nyström method an approximate solution to the linear system is computed based upon a random subsample of the given training data set. However, an important open question at this point is whether the random points can be selected in another way than just trying out several random subsamples and comparing these solutions on a separate validation set. We explain now how active selection of such support vectors can be done in relation to the Nyström method for a fixed size LS-SVM.

> Fig. 6.3: In the method of fixed size LS-SVM, the Nyström method, kernel PCA and density estimation are linked to the estimation of eigenfunctions and active selection of support vectors. Regression in the primal space is done in a next step.

#### 6.2.1 Estimation in primal weight space

As we discussed in (6.1), for large data sets it would be advantageous if one could solve the problem in the primal space (Fig. 6.1). However, in the nonlinear case we would then need a