Volume 2, Issue 1

Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China

Shichang Ding, Xin Gao, Yufan Dong, Yiwei Tong, and Xiaoming Fu
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 276800, China
Department of Sociology, Tsinghua University, Beijing 100085, China
Institute of Computer Science, University of Göttingen, Göttingen 37077, Germany
Shanghai Hejin Information Technology Company, Shanghai 200100, China

Abstract

Inferring people’s Socioeconomic Attributes (SEAs), including income, occupation, and education level, is an important problem both for the social sciences and for many networked applications such as targeted advertising and personalized recommendation. Previous works mainly estimate SEAs from people’s cyberspace behaviors and relationships, such as the content of their tweets or the social networks among online users. Beyond cyberspace data, alternative data sources describing users’ physical behavior, such as their home location, may offer new insights. In this paper, we study how to predict a person’s income level, family income level, occupation type, and education level from his/her home location. As a case study, we collect people’s home locations and socioeconomic attributes through a survey covering 9 provinces and 85 cities in China. We further enrich each home location with knowledge from real estate websites, government statistics websites, online map services, etc. To learn a shared representation from the input features as well as attribute-specific representations for the different SEAs, we propose H2SEA, a factorization machine-based multi-task learning method with an attention mechanism. Extensive experimental results show that: (1) Home location clearly improves estimation accuracy for all SEA prediction tasks (e.g., an 80.2% improvement in F1-score for estimating personal income level); (2) The proposed H2SEA model outperforms alternative models for SEA inference on various evaluation metrics, such as Area Under the Curve (AUC), F-measure, and specificity; (3) The performance of specific SEA prediction tasks (e.g., personal income) can be further improved if H2SEA focuses only on cities or only on villages, owing to the urban-rural gap in China; (4) Compared with housing price data crawled online, area-level average income and Points Of Interest (POI) are more important features for SEA inference in China.
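The abstract describes H2SEA only at a high level. As a rough illustration of what "factorization machine-based multi-task learning with an attention mechanism" can look like, the minimal pure-Python sketch below assumes an attentional-factorization-machine (AFM) style design, which is a swapped-in technique and not the authors' H2SEA architecture. All tasks share the feature embeddings `V` and linear weights `w`; each SEA task (e.g., personal income, education) has its own hypothetical `query`, `proj`, and `bias` parameters that attend over the shared pairwise feature interactions. Every function name, shape, and value here is an assumption for illustration. `fm_pairwise_naive` and `fm_pairwise_fast` also show Rendle's O(kn) rewriting of the FM pairwise term.

```python
import math
from itertools import combinations


def fm_pairwise_naive(x, V):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j, in O(k n^2)."""
    return sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def fm_pairwise_fast(x, V):
    """Same quantity via the O(k n) identity
    0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    total = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def task_logit(x, V, w, query, proj, bias=0.0):
    """Hypothetical per-task head (AFM-style, not the authors' H2SEA):
    shared embeddings V and linear weights w, task-specific query/proj/bias.
    Assumes at least two features so that at least one pair exists."""
    k = len(V[0])
    first_order = sum(wi * xi for wi, xi in zip(w, x))
    # Element-wise interaction vectors (v_i * v_j) * x_i * x_j for every pair i<j.
    pairs = [
        [V[i][f] * V[j][f] * x[i] * x[j] for f in range(k)]
        for i, j in combinations(range(len(x)), 2)
    ]
    # Each task scores the same shared pairs with its own query vector.
    weights = softmax([sum(q * p for q, p in zip(query, pair)) for pair in pairs])
    pooled = [sum(a * pair[f] for a, pair in zip(weights, pairs)) for f in range(k)]
    return bias + first_order + sum(h * p for h, p in zip(proj, pooled))


def predict_all(x, V, w, heads):
    """Apply every task head, given as (query, proj, bias), to the shared features."""
    return {name: task_logit(x, V, w, *params) for name, params in heads.items()}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the survey.
    x = [1.0, 0.5, 2.0]
    V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]
    w = [0.05, -0.02, 0.01]
    heads = {
        "personal_income": ([0.5, -0.2], [1.0, 0.3], 0.0),
        "education": ([-0.1, 0.4], [0.2, 1.0], 0.1),
    }
    print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
    print(predict_all(x, V, w, heads))
```

In this sketch, sharing `V` and `w` across heads is the multi-task component, while the per-task attention lets each head weight the same feature pairs differently. Training (for example, summing per-task classification losses) is omitted.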

Keywords: personal income, family income, occupation, education, multi-task learning


Publication history

Received: 18 December 2020
Accepted: 18 January 2021
Published: 16 February 2021
Issue date: March 2021

Copyright

© The author(s) 2021

Acknowledgements

The research work was partly funded by the European Union’s Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant (No. 824019) and by the Tsinghua-Göttingen Student Exchange Project (No. IDS-SSP-2017001).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
