Journal Home > Volume 9 , Issue 1

Focusing on your subject: Deep subject-aware image composition recommendation networks

Guo-Ye Yang1, Wen-Yang Zhou1, Yun Cai1, Song-Hai Zhang1, Fang-Lue Zhang2
1 BNRist, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2 School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6012, New Zealand

Abstract

Photo composition is one of the most important factors in the aesthetics of photographs. Although it is a popular application, composition recommendation for a photo focusing on a specific subject has been ignored by recent deep-learning-based composition recommendation approaches. In this paper, we propose a subject-aware image composition recommendation method, SAC-Net, which takes an RGB image and a binary subject window mask as input, and returns good compositions as crops containing the subject. Our model first determines candidate scores for all possible coarse cropping windows. The crops with high candidate scores are selected and further refined by regressing their corner points to generate the recommended cropping windows. The final scores of the refined crops are predicted by a final score regression module. Unlike existing methods that need to preset several cropping windows, our network is able to automatically regress cropping windows with arbitrary aspect ratios and sizes. We propose novel stability losses that keep the cropping windows changing smoothly as the view changes. Experimental results show that our method outperforms state-of-the-art methods not only on the subject-aware image composition recommendation task, but also on general-purpose composition recommendation. We have also designed a multi-stage labeling scheme so that a large number of ranked pairs can be produced economically. We use this scheme to build the first subject-aware composition dataset, SACD, which contains 2777 images and more than 5 million composition ranked pairs. The SACD dataset is publicly available at https://cg.cs.tsinghua.edu.cn/SACD/.

Keywords: deep learning, recommendation, subject-aware image composition, image cropping
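
The first stage described in the abstract (score coarse cropping windows that contain the subject, then keep the high-scoring ones) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the uniform grid, the function names `coarse_candidates`, `placeholder_score`, and `top_k`, and above all the scoring rule, which is a hand-written stand-in and not the learned candidate score of SAC-Net. The corner-point refinement and the final score regression are learned modules and are not sketched here.

```python
from itertools import combinations


def coarse_candidates(img_w, img_h, subject, grid=4):
    """Enumerate coarse crops whose corners lie on a uniform grid (an assumed
    discretisation) and that fully contain the subject window.

    subject is (x0, y0, x1, y1) in pixels; each crop uses the same layout.
    """
    xs = [round(i * img_w / grid) for i in range(grid + 1)]
    ys = [round(j * img_h / grid) for j in range(grid + 1)]
    sx0, sy0, sx1, sy1 = subject
    # combinations() on an ascending list guarantees x0 < x1 and y0 < y1.
    return [
        (x0, y0, x1, y1)
        for x0, x1 in combinations(xs, 2)
        for y0, y1 in combinations(ys, 2)
        if x0 <= sx0 and y0 <= sy0 and x1 >= sx1 and y1 >= sy1
    ]


def placeholder_score(crop, subject):
    """Hypothetical stand-in for a learned candidate score: the fraction of
    the crop area covered by the subject window."""
    cx0, cy0, cx1, cy1 = crop
    sx0, sy0, sx1, sy1 = subject
    return ((sx1 - sx0) * (sy1 - sy0)) / ((cx1 - cx0) * (cy1 - cy0))


def top_k(crops, subject, k=3):
    """Keep the k highest-scoring coarse crops for later refinement."""
    ranked = sorted(crops, key=lambda c: placeholder_score(c, subject), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    subject = (150, 150, 250, 250)
    crops = coarse_candidates(400, 400, subject)
    for crop in top_k(crops, subject):
        print(crop, round(placeholder_score(crop, subject), 3))
```

Restricting the enumeration to windows that contain the subject mirrors the requirement that recommended crops contain the subject; in a learned system the placeholder score would be replaced by the network's predicted candidate score.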

Video
41095_0263_ESM2.mp4
41095_0263_ESM3.mp4
File
41095_0263_ESM1.pdf (17.2 MB)

Publication history

Received: 17 August 2021
Accepted: 01 November 2021
Published: 18 October 2022
Issue date: March 2023

Copyright

© The Author(s) 2022.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61521002, 62132012) and the Marsden Fund Council managed by the Royal Society of New Zealand (MFP-20-VUW-180).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
