References
[1]
He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.
[2]
Ren, S. Q.; He, K. M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137-1149, 2017.
[3]
Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6517-6525, 2017.
[4]
Borji, A.; Cheng, M. M.; Hou, Q. B.; Jiang, H. Z.; Li, J. Salient object detection: A survey. Computational Visual Media Vol. 5, No. 2, 117-150, 2019.
[5]
Xu, D. F.; Zhu, Y. K.; Choy, C. B.; Fei-Fei, L. Scene graph generation by iterative message passing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3097-3106, 2017.
[6]
Peyre, J.; Laptev, I.; Schmid, C.; Sivic, J. Detecting unseen visual relations using analogies. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 1981-1990, 2019.
[7]
Chao, Y. W.; Liu, Y. F.; Liu, X. Y.; Zeng, H. Y.; Deng, J. Learning to detect human-object interactions. arXiv preprint arXiv:1702.05448, 2017.
[8]
Gkioxari, G.; Girshick, R.; Dollár, P.; He, K. M. Detecting and recognizing human-object interactions. arXiv preprint arXiv:1704.07333, 2017.
[9]
Ma, C. Y.; Kadav, A.; Melvin, I.; Kira, Z.; AlRegib, G.; Graf, H. P. Attend and interact: Higher-order object interactions for video understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6790-6800, 2018.
[10]
Mallya, A.; Lazebnik, S. Learning models for actions and person-object interactions with transfer to question answering. In: Computer Vision—ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 414-428, 2016.
[11]
Gao, C.; Zou, Y. L.; Huang, J. B. iCAN: Instance-centric attention network for human-object interaction detection. arXiv preprint arXiv:1808.10437, 2018.
[12]
Li, Y. L.; Zhou, S. Y.; Huang, X. J.; Xu, L.; Ma, Z.; Fang, H. S.; Wang, Y. F.; Lu, C. W. Transferable interactiveness knowledge for human-object interaction detection. arXiv preprint arXiv:1811.08264, 2019.
[13]
Wang, T. C.; Anwer, R. M.; Khan, M. H.; Khan, F. S.; Pang, Y. W.; Shao, L. et al. Deep contextual attention for human-object interaction detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 5693-5701, 2019.
[14]
Gupta, T.; Schwing, A. G.; Hoiem, D. No-frills human-object interaction detection: Factorization, layout encodings, and training techniques. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 9676-9684, 2019.
[15]
Wan, B.; Zhou, D. S.; Liu, Y. F.; Li, R. J.; He, X. M. Pose-aware multi-level feature network for human object interaction detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 9468-9477, 2019.
[16]
Zhou, P.; Chi, M. Relation parsing neural network for human-object interaction detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 843-851, 2019.
[17]
Gupta, S.; Malik, J. Visual semantic role labeling. arXiv preprint arXiv:1505.04474, 2015.
[18]
Zhao, Z. C.; Ma, H. M.; You, S. D. Single image action recognition using semantic body part actions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3411-3419, 2017.
[19]
Luvizon, D. C.; Picard, D.; Tabia, H. 2D/3D pose estimation and action recognition using multitask deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5137-5146, 2018.
[20]
Abdulmunem, A.; Lai, Y. K.; Sun, X. F. Saliency guided local and global descriptors for effective action recognition. Computational Visual Media Vol. 2, No. 1, 97-106, 2016.
[21]
Girdhar, R.; Ramanan, D. Attentional pooling for action recognition. arXiv preprint arXiv:1711.01467, 2017.
[22]
Ulutan, O.; Iftekhar, A. S. M.; Manjunath, B. S. VSGNet: Spatial attention network for detecting human object interactions using graph convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 13617-13626, 2020.
[23]
Qi, S. Y.; Wang, W. G.; Jia, B. X.; Shen, J. B.; Zhu, S. C. Learning human-object interactions by graph parsing neural networks. In: Computer Vision—ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 407-423, 2018.
[24]
Xu, B.; Wong, Y.; Li, J.; Zhao, Q.; Kankanhalli, M. S. Learning to detect human-object interactions with knowledge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019-2028, 2019.
[25]
Kato, K.; Li, Y.; Gupta, A. Compositional learning for human object interaction. In: Computer Vision—ECCV 2018. Lecture Notes in Computer Science, Vol. 11218. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 247-264, 2018.
[26]
Bansal, A.; Rambhatla, S. S.; Shrivastava, A.; Chellappa, R. Detecting human-object interactions via functional generalization. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 7, 10460-10469, 2020.
[27]
Wang, T. C.; Yang, T.; Danelljan, M.; Khan, F. S.; Zhang, X. Y.; Sun, J. Learning human-object interaction detection using interaction points. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4115-4124, 2020.
[28]
Liao, Y.; Liu, S.; Wang, F.; Chen, Y. J.; Qian, C.; Feng, J. S. PPDM: Parallel point detection and matching for real-time human-object interaction detection. arXiv preprint arXiv:1912.12898, 2020.
[29]
He, K. M.; Gkioxari, G.; Dollár, P.; Girshick, R. B. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 42, No. 2, 386-397, 2020.
[30]
Fang, H. S.; Xie, S. Q.; Tai, Y. W.; Lu, C. W. RMPE: Regional multi-person pose estimation. arXiv preprint arXiv:1612.00137, 2016.
[31]
Fang, H. S.; Cao, J. K.; Tai, Y. W.; Lu, C. W. Pairwise body-part attention for recognizing human-object interactions. In: Computer Vision—ECCV 2018. Lecture Notes in Computer Science, Vol. 11214. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 52-68, 2018.
[32]
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Vol. 2, 3111-3119, 2013.
[33]
Lin, T. Y.; Goyal, P.; Girshick, R.; He, K. M.; Dollár, P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2999-3007, 2017.
[34]
Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision—ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740-755, 2014.
[36]
Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[37]
Zhou, T. F.; Wang, W. G.; Qi, S. Y.; Ling, H. B.; Shen, J. B. Cascaded human-object interaction recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4262-4271, 2020.
[38]
Shen, L.; Yeung, S.; Hoffman, J.; Mori, G.; Fei-Fei, L. Scaling human-object interaction recognition through zero-shot learning. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 1568-1576, 2018.
[39]
Li, Y. L.; Liu, X. P.; Lu, H.; Wang, S. Y.; Liu, J. Q.; Li, J. F.; Lu, C. W. Detailed 2D-3D joint representation for human-object interaction. arXiv preprint arXiv:2004.08154, 2020.
[40]
Li, Y. L.; Xu, L.; Liu, X. P.; Huang, X. J.; Xu, Y.; Wang, S. Y.; Fang, H. S.; Ma, Z.; Chen, M. Y.; Lu, C. W. PaStaNet: Toward human activity knowledge engine. arXiv preprint arXiv:2004.00945, 2020.