Journal Home > Volume 18 , Issue 4

Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks for mining from ambiguous data known as multi-instance problem. In multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in the positive bag are not labeled, each positive bag is an ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find true positive instances in positive bags and convert the multi-instance problem to the supervised problem, and get the labels of test bags according to predict the labels of unknown instances. In this paper, we aim at mining the multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, constructive covering algorithm is utilized to restructure the structure of the original multi-instance data at first. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and sphere neighbors obtained from last two steps. Experimental results demonstrate the proposed algorithm is competitive with most of the state-of-the-art multi-instance methods both in classification accuracy and running time.


menu
Abstract
Full text
Outline
About this article

MICkNN: Multi-Instance Covering kNN Algorithm

Show Author's information Shu ZhaoChen RuiYanping Zhang( )
Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China

Abstract

Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks for mining from ambiguous data known as multi-instance problem. In multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in the positive bag are not labeled, each positive bag is an ambiguous. The mining aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find true positive instances in positive bags and convert the multi-instance problem to the supervised problem, and get the labels of test bags according to predict the labels of unknown instances. In this paper, we aim at mining the multi-instance data from another point of view, i.e., excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, constructive covering algorithm is utilized to restructure the structure of the original multi-instance data at first. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and sphere neighbors obtained from last two steps. Experimental results demonstrate the proposed algorithm is competitive with most of the state-of-the-art multi-instance methods both in classification accuracy and running time.

Keywords: classification, constructive covering algorithm, mining ambiguous data, multi-instance, kNN algorithm

References(40)

[1]
T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol. 89, no. 1, pp. 31-71, 1997.
[2]
B. B. Ni, Z. Song, and S. C. Yan, Web image mining towards universal age estimator, in Proceedings of the 17th ACM International Conference on MultimediaInt, ACM, 2009, pp. 85-94.
DOI
[3]
A. Zafra, C. Romero, S. Ventura, and E. Herrera-Viedma, Multi-instance genetic programming for web index recommendation, Expert Systems with Applications, vol. 36, no. 9, pp. 11470-11479, 2009.
[4]
Z. H. Zhou, K. Jiang, and M. Li, Multi-instance learning based web mining, Applied Intelligence, vol. 22, no. 2, pp. 135-147, 2005.
[5]
S. Andrews, I. Tsochantaridis, and T. Hofmann, Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems, vol. 15, pp. 561-568, 2002.
[6]
Z. H. Zhou, Y. Y. Sun, and Y. F. Li, Multi-instance learning by treating instances as non-I.I.D samples, in Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 1249-1256.
DOI
[7]
Z. Jorgensen, Y. Zhou, and M. Inge, A multiple instance learning strategy for combating good word attacks on spam filters, The Journal of Machine Learning Research, vol. 9, no. 6, pp. 1115-1146, 2008.
[8]
J. B. Bi, Y. X. Chen, and J. Z. Wang, A sparse support vector machine approach to region-based image categorization, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 1121-1128, 2005.
[9]
Z. Y. Fu, A. Robles-Kelly, and J. Zhou, MILIS: Multiple instance learning with instance selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 958-977, 2011.
[10]
S. H. Feng and D. Xu, Transductive multi-instance multi-label learning algorithm with application to automatic image annotation, Expert Systems with Applications, vol. 37, no. 1, pp. 661-670, 2010.
[11]
D. T. Nguyen, C. D. Nguyen, R. Hargraves, L. A. Kurgan, and K. J. Cios, mi-DS: Multiple-instance learning algorithm, IEEE Transactions on Systems, Man, and Cybernetics Society. Part B, Cybernetics, vol. 43, no. 1, pp. 143-154, Feb. 2013.
[12]
B. Babenko, M. H. Yang, and S. Belongie, Visual tracking with online multiple instance learning, in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 983-990.
DOI
[13]
B. Babenko, M. H. Yang, and S. Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619-1632, 2011.
[14]
C. Leistner, A. Saffari, and H. Bischof, MIForests: Multiple-instance learning with randomized trees, in Proceedings of the 11th European Conference on Computer Vision: Part VI, Springer, 2010, pp. 29-42.
DOI
[15]
Y. Xie, Y. Y. Qu, C. H. Li, and W. S Zhang, Online multiple instance gradient feature selection for robust visual tracking, Pattern Recognition Letters, vol. 33, no. 9, pp. 1075-1082, 2012.
[16]
B. Zeisl, C. Leistner, A. Saffari, and H. Bischof, On-line semi-supervised multiple-instance boosting, in IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2010, pp. 1879-1886.
DOI
[17]
Z. Q. Qi, Y. T. Xu, L. S. Wang, and Y. Song, Online multiple instance boosting for object detection, Neurocomputing, vol. 74, no. 10, pp. 1769-1775, 2011.
[18]
M. Bellare, T. Ristenpart, and S. Tessaro, Multi-instance security and its application to password-based cryptography, Advances in Cryptology-CRYPTO 2012, Springer, pp. 312-329, 2012.
DOI
[19]
O. Maron and T. Lozano-Pérez, A framework for multiple-instance learning, Advances in Neural Information Processing Systems, vol. 10, pp. 570-576, 1998.
[20]
Q. Zhang and S. A. Goldman, EM-DD: An improved multiple-instance learning technique, Advances in Neural Information Processing Systems, vol. 14, no. 2022, pp. 1073-1080, 2001.
[21]
Y. J. Han, Q. Tao, and J. Wang, Avoiding false positive in multi-instance learning, Advances in Neural Information Processing Systems, vol. 23, pp. 1-8, 2010.
[22]
M. Y. Kim and F. Torre, Gaussian processes multiple instance learning, in Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 535-542.
[23]
F. X. Li and C. Sminchisescu, Convex multiple-instance learning by estimating likelihood ratio, Advances in Neural Information Processing Systems, vol. 23, pp. 1360-1368, 2010.
[24]
M. L. Zhang and Z. H. Zhou, Adapting RBF neural networks to multi-instance learning, Neural Processing Letters, vol. 23, no. 1, pp. 1-26, 2006.
[25]
J. T. Kwok and P. M. Cheung, Marginalized multi-instance kernels, in Proceedings of the 20th International Joint Conference on Artifical Intelligence, Morgan Kaufmann Publishers Inc, 2007, pp. 901-906.
[26]
D. Zhang, F. Wang, L. Si, and T. Li, M3IC: Maximum margin multiple instance clustering, in Proceedings of the 21th International Joint Conference on Artifical intelligence, Morgan Kaufmann Publishers Inc, 2009, pp. 1339-1344.
[27]
C. H. Li, I. Gondra, and L. Liu, An efficient parallel neural network-based multi-instance learning algorithm, The Journal of Supercomputing, vol. 62, no. 2, pp. 724-740, 2012.
[28]
H. Y. Wang, Q. Yang, and H. B. Zha, Adaptive p-posterior mixture-model kernels for multiple instance learning, in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1136-1143.
DOI
[29]
J. Wang and J. D. Zucker, Solving the multiple-instance problem: A lazy learning approach, in Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2000, pp. 1119-1125.
[30]
T. Deselaers and V. Ferrari, A conditional random field for multiple-instance learning, in Proceedings of the 27th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2010, pp. 1119-1125.
[31]
B. Babenko, N. Verma, P. Dollár, and S. J. Belongie, Multiple instance learning with manifold bags, in Proceedings of the 28th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc, 2011, pp. 81-88.
[32]
D. Zhang, Y. Liu, L. Si, J. Zhang, and R. D. Lawrence, Multiple instance learning on structured data, Advances in Neural Information Processing Systems, vol. 24, pp. 145-153, 2011.
[33]
L. X. Jiang, Z. H. Cai, D. H. Wang, and H. Zhang, Bayesian citation-KNN with distance weighting, International Journal of Machine Learning and Cybernetics, pp. 1-7, 2013.
[34]
O. Maron, Learning from ambiguity, Ph.D. dissertation, Massachusetts Institute of Technology, USA, 1998.
[35]
L. Zhang and B. Zhang, A geometrical representation of mcculloch-pitts neural model and its applications, IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 925-929, 1999.
[36]
W. S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biology, vol. 5, no. 4, pp. 115-133, 1943.
[37]
P. N. Tan, Introduction to Data Mining, Pearson Education India, 2007.
[38]
G. A. Edgar, Measure, Topology, and Fractal Geometry, Springer-Verlag, New York, 2008.
DOI
[39]
Y. Lia, D. M. J. Taxa, R. P. W. Duina, and M. Looga, Multiple-instance learning as a classifier combining problem, Pattern Recognition, vol. 46, no. 3, pp. 865-874, 2013.
[40]
Z. H. Zhou, M. L. Zhang, S. J. Huang, and Y. F. Li, Multi-instance multi-label learning, Artificial Intelligence, vol. 176, no. 1, pp. 2291-2320, 2012.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 15 June 2013
Accepted: 25 June 2013
Published: 05 August 2013
Issue date: August 2013

Copyright

© The author(s) 2013

Acknowledgements

The authors would like to thank all the reviewers for their insightful and constructive comments. This work was partly supported by the National Natural Science Foundation of China (Nos. 61073117 and 61175046), the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016), the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183), and the 211 Project of Anhui University.

Rights and permissions

Return