Download PDF by Joydeep Ghosh (Editor), Diane Lambert (Editor), David: Proceedings of the 6th SIAM International Conference on Data


Each digit is represented as a vector in 16-dimensional space. The dataset is divided into a training set (7494 digits) and a test set (3498 digits). For the purpose of visualization as in [16], we select the 0's, 6's and 9's. We apply LDA, SAVE, HDA, and CPM to this 3-class subproblem (which contains 2219 training digits and 1035 test digits) and extract the 2 leading discriminant directions. We then project the test set onto these 2 dimensions.

1 ftp://ftp.ics.uci.edu/pub/machine-learning-databas
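The final projection step, mapping each test digit onto two discriminant directions, can be sketched as follows. This is a minimal illustration only: the function name `project`, the toy 4-dimensional vectors, and the two direction vectors are all hypothetical stand-ins (the excerpt uses 16-dimensional digit vectors and directions computed by LDA, SAVE, HDA, or CPM, none of which are computed here).

```python
def project(points, directions):
    """Return the coordinates of each point along each direction (dot products)."""
    return [
        [sum(x * w for x, w in zip(point, direction)) for direction in directions]
        for point in points
    ]

# Hypothetical toy data: 4-dimensional points instead of 16-dimensional digits.
directions = [
    [1.0, 0.0, 0.0, 0.0],  # assumed first discriminant direction
    [0.0, 0.5, 0.5, 0.0],  # assumed second discriminant direction
]
test_points = [
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 0.0, 2.0, 3.0],
]

# Each test point becomes a 2-dimensional point suitable for plotting.
projected = project(test_points, directions)
```

Each row of `projected` holds the two coordinates that would be plotted for the corresponding test digit.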

3 SAVE versus CPM

SAVE can also be considered as an approximation of HDA, based on the following result.

1. Let be defined above and let with d < n. Then

4 Experiments

In this section, we evaluate CPM on both synthetic and real-world datasets. The 1-Nearest-Neighbor algorithm is applied for classification. 2 is used for CPM in all experiments. However, the best value of a may be estimated through cross-validation. The first experiment is on a synthetic dataset, where the class centroids of three classes do not coincide, but they have different covariance matrices.
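The 1-Nearest-Neighbor rule used for classification can be sketched as below. This is a generic illustration under stated assumptions, not the authors' code: the function name `nearest_neighbor_predict`, the Euclidean distance, and the toy 2-dimensional points (imagined as already-projected digits with labels 0, 6, and 9) are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to the query (1-NN)."""
    best_index = min(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )
    return train_labels[best_index]

# Hypothetical toy training set in a 2-dimensional projected space.
train_points = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
train_labels = [0, 6, 9]

prediction_a = nearest_neighbor_predict(train_points, train_labels, [4.0, 4.5])
prediction_b = nearest_neighbor_predict(train_points, train_labels, [9.0, 1.0])
```

Squared distance is used because it yields the same nearest neighbor as the true Euclidean distance while avoiding a square root.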


