MIC<i>k</i>NN: Multi-Instance Covering <i>k</i>NN Algorithm

Shu Zhao; Chen Rui; Yanping Zhang

doi:10.1109/TST.2013.6574674

Tsinghua Science and Technology 2013, 18(4): 360-368 https://doi.org/10.1109/TST.2013.6574674

Open Access | Issue | Published: 05 August 2013

MICkNN: Multi-Instance Covering kNN Algorithm

Show Author's Information Hide Author's Information Shu Zhao, Chen Rui, Yanping Zhang(

)

Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China

Keywords:

classification, constructive covering algorithm, mining ambiguous data, multi-instance, kNN algorithm

Cite this article:

Zhao S, Rui C, Zhang Y. MICkNN: Multi-Instance Covering kNN Algorithm. Tsinghua Science and Technology, 2013, 18(4): 360-368. https://doi.org/10.1109/TST.2013.6574674

Download citation

EndNote(RIS)

BibTeX

351

Views

Downloads

Citations

Crossref

N/A

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks for mining from ambiguous data known as multi-instance problem. In multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in the positive bag are not labeled, each positive bag is an ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find true positive instances in positive bags and convert the multi-instance problem to the supervised problem, and get the labels of test bags according to predict the labels of unknown instances. In this paper, we aim at mining the multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, constructive covering algorithm is utilized to restructure the structure of the original multi-instance data at first. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and sphere neighbors obtained from last two steps. Experimental results demonstrate the proposed algorithm is competitive with most of the state-of-the-art multi-instance methods both in classification accuracy and running time.

Full text

Abstract

Full text

Outline

About this article

MICkNN: Multi-Instance Covering kNN Algorithm

Show Author's information Hide Author's Information Shu Zhao, Chen Rui, Yanping Zhang(

)

Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China

Abstract

Keywords: classification, constructive covering algorithm, mining ambiguous data, multi-instance, kNN algorithm

References(40)

[1]

T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol. 89, no. 1, pp. 31-71, 1997.

DOI Google Scholar

[2]

B. B. Ni, Z. Song, and S. C. Yan, Web image mining towards universal age estimator, in Proceedings of the 17th ACM International Conference on MultimediaInt, ACM, 2009, pp. 85-94.

DOI

[3]

A. Zafra, C. Romero, S. Ventura, and E. Herrera-Viedma, Multi-instance genetic programming for web index recommendation, Expert Systems with Applications, vol. 36, no. 9, pp. 11470-11479, 2009.

DOI Google Scholar

[4]

Z. H. Zhou, K. Jiang, and M. Li, Multi-instance learning based web mining, Applied Intelligence, vol. 22, no. 2, pp. 135-147, 2005.

DOI Google Scholar

[5]

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems, vol. 15, pp. 561-568, 2002.

Google Scholar

[6]

Z. H. Zhou, Y. Y. Sun, and Y. F. Li, Multi-instance learning by treating instances as non-I.I.D samples, in Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 1249-1256.

DOI

[7]

Z. Jorgensen, Y. Zhou, and M. Inge, A multiple instance learning strategy for combating good word attacks on spam filters, The Journal of Machine Learning Research, vol. 9, no. 6, pp. 1115-1146, 2008.

Google Scholar

[8]

J. B. Bi, Y. X. Chen, and J. Z. Wang, A sparse support vector machine approach to region-based image categorization, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 1121-1128, 2005.

Google Scholar

[9]

Z. Y. Fu, A. Robles-Kelly, and J. Zhou, MILIS: Multiple instance learning with instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 958-977, 2011.

DOI Google Scholar

[10]

S. H. Feng and D. Xu, Transductive multi-instance multi-label learning algorithm with application to automatic image annotation, Expert Systems with Applications, vol. 37, no. 1, pp. 661-670, 2010.

DOI Google Scholar

[11]

D. T. Nguyen, C. D. Nguyen, R. Hargraves, L. A. Kurgan, and K. J. Cios, mi-DS: Multiple-instance learning algorithm, IEEE Transactions on Systems, Man, and Cybernetics Society. Part B, Cybernetics, vol. 43, no. 1, pp. 143-154, Feb. 2013.

DOI Google Scholar

[12]

B. Babenko, M. H. Yang, and S. Belongie, Visual tracking with online multiple instance learning, in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 983-990.

DOI

[13]

B. Babenko, M. H. Yang, and S. Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.

DOI Google Scholar

[14]

C. Leistner, A. Saffari, and H. Bischof, MIForests: Multiple-instance learning with randomized trees, in Proceedings of the 11th European Conference on Computer Vision: Part VI, Springer, 2010, pp. 29-42.

DOI

[15]

Y. Xie, Y. Y. Qu, C. H. Li, and W. S Zhang, Online multiple instance gradient feature selection for robust visual tracking, Pattern Recognition Letters, vol. 33, no. 9, pp. 1075-1082, 2012.

DOI Google Scholar

[16]

B. Zeisl, C. Leistner, A. Saffari, and H. Bischof, On-line semi-supervised multiple-instance boosting, in IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2010, pp. 1879-1886.

DOI

[17]

Z. Q. Qi, Y. T. Xu, L. S. Wang, and Y. Song, Online multiple instance boosting for object detection, Neurocomputing, vol. 74, no. 10, pp. 1769-1775, 2011.

DOI Google Scholar

[18]

M. Bellare, T. Ristenpart, and S. Tessaro, Multi-instance security and its application to password-based cryptography, Advances in Cryptology-CRYPTO 2012, Springer, pp. 312-329, 2012.

DOI

[19]

O. Maron and T. Lozano-Pérez, A framework for multiple-instance learning, Advances in Neural Information Processing Systems, vol. 10, pp. 570-576, 1998.

Google Scholar

[20]

Q. Zhang and S. A. Goldman, EM-DD: An improved multiple-instance learning technique, Advances in Neural Information Processing Systems, vol. 14, no. 2022, pp. 1073-1080, 2001.

Google Scholar

[21]

Y. J. Han, Q. Tao, and J. Wang, Avoiding false positive in multi-instance learning, Advances in Neural Information Processing Systems, vol. 23, pp. 1-8, 2010.

Google Scholar

[22]

M. Y. Kim and F. Torre, Gaussian processes multiple instance learning, in Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 535-542.

[23]

F. X. Li and C. Sminchisescu, Convex multiple-instance learning by estimating likelihood ratio, Advances in Neural Information Processing Systems, vol. 23, pp. 1360-1368, 2010.

Google Scholar

[24]

M. L. Zhang and Z. H. Zhou, Adapting RBF neural networks to multi-instance learning, Neural Processing Letters, vol. 23, no. 1, pp. 1-26, 2006.

DOI Google Scholar

[25]

J. T. Kwok and P. M. Cheung, Marginalized multi-instance kernels, in Proceedings of the 20th International Joint Conference on Artifical Intelligence, Morgan Kaufmann Publishers Inc, 2007, pp. 901-906.

[26]

D. Zhang, F. Wang, L. Si, and T. Li, M3IC: Maximum margin multiple instance clustering, in Proceedings of the 21th International Joint Conference on Artifical intelligence, Morgan Kaufmann Publishers Inc, 2009, pp. 1339-1344.

[27]

C. H. Li, I. Gondra, and L. Liu, An efficient parallel neural network-based multi-instance learning algorithm, The Journal of Supercomputing, vol. 62, no. 2, pp. 724-740, 2012.

DOI Google Scholar

[28]

H. Y. Wang, Q. Yang, and H. B. Zha, Adaptive p-posterior mixture-model kernels for multiple instance learning, in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1136-1143.

DOI

[29]

J. Wang and J. D. Zucker, Solving the multiple-instance problem: A lazy learning approach, in Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2000, pp. 1119-1125.

[30]

T. Deselaers and V. Ferrari, A conditional random field for multiple-instance learning, in Proceedings of the 27th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2010, pp. 1119-1125.

[31]

B. Babenko, N. Verma, P. Dollár, and S. J. Belongie, Multiple instance learning with manifold bags, in Proceedings of the 28th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2011, pp. 81-88.

[32]

D. Zhang, Y. Liu, L. Si, J. Zhang, and R. D. Lawrence, Multiple instance learning on structured data, Advances in Neural Information Processing Systems, vol. 24, pp. 145-153, 2011.

Google Scholar

[33]

L. X. Jiang, Z. H. Cai, D. H. Wang, and H. Zhang, Bayesian citation-KNN with distance weighting, International Journal of Machine Learning and Cybernetics, pp. 1-7, 2013.

Google Scholar

[34]

O. Maron, Learning from ambiguity, Ph.D. dissertation, Massachusetts Institute of Technology, USA, 1998.

[35]

L. Zhang and B. Zhang, A geometrical representation of mcculloch-pitts neural model and its applications, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 925-929, 1999.

DOI Google Scholar

[36]

W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biology, vol. 5, no. 4, pp. 115-133, 1943.

DOI Google Scholar

[37]

P. N. Tan, Introduction to Data Mining, Pearson Education India, 2007.

[38]

G. A. Edgar, Measure, Topology, and Fractal Geometry, Springer-Verlag, New York, 2008.

DOI

[39]

Y. Lia, D. M. J. Taxa, R. P. W. Duina, and M. Looga, Multiple-instance learning as a classifier combining problem, Pattern Recognition, vol. 46, no. 3, pp. 865-874, 2013.

DOI Google Scholar

[40]

Z. H. Zhou, M. L. Zhang, S. J. Huang, and Y. F. Li, Multi-instance multi-label learning, Artificial Intelligence, vol. 176, no. 1, pp. 2291-2320, 2012.

DOI Google Scholar

About this article

Publication history

Acknowledgements

Rights and permissions

Publication history

Received: 15 June 2013

Accepted: 25 June 2013

Published: 05 August 2013

Issue date: August 2013

Copyright

Acknowledgements

The authors would like to thank all the reviewers for their insightful and constructive comments. This work was partly supported by the National Natural Science Foundation of China (Nos. 61073117 and 61175046), the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016), the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183), and the 211 Project of Anhui University.