W. P. Wang, Z. R. Wang, Z. F. Zhou, H. X. Deng, W. L. Zhao, C. Y. Wang, and Y. Z. Guo, Anomaly detection of industrial control systems based on transfer learning, Tsinghua Science and Technology, vol. 26, no. 6, pp. 821–832, 2021.
Z. N. Mohammad, F. Farha, A. O. M. Abuassba, S. K. Yang, and F. Zhou, Access control and authorization in smart homes: A survey, Tsinghua Science and Technology, vol. 26, no. 6, pp. 906–917, 2021.
X. L. Xu, H. Y. Li, W. J. Xu, Z. J. Liu, L. Yao, and F. Dai, Artificial intelligence for edge service optimization in Internet of Vehicles: A survey, Tsinghua Science and Technology, vol. 27, no. 2, pp. 270–287, 2022.
M. S. Ali, M. Vecchio, M. Pincheira, K. Dolui, F. Antonelli, and M. H. Rehmani, Applications of blockchains in the Internet of Things: A comprehensive survey, IEEE Commun. Surv. Tutorials, vol. 21, no. 2, pp. 1676–1717, 2019.
K. Biswas and V. Muthukkumarasamy, Securing smart cities using blockchain technology, in Proc. 18th Int. Conf. on High Performance Computing and Communications; IEEE 14th Int. Conf. on Smart City; IEEE 2nd Int. Conf. on Data Science and Systems, Sydney, Australia, 2016, pp. 1392–1393.
P. T. S. Liu, Medical record system using blockchain, big data and tokenization, in Proc. 18th Int. Conf. on Information and Communications Security, Singapore, 2016, pp. 254–261.
X. Yue, H. J. Wang, D. W. Jin, M. Q. Li, and W. Jiang, Healthcare data gateways: Found healthcare intelligence on blockchain with novel privacy risk control, J. Med. Syst., vol. 40, no. 10, p. 218, 2016.
B. Fisch, J. Bonneau, N. Greco, and J. Benet, Scaling proof-of-replication for filecoin mining, Technical report, Stanford University, Stanford, CA, USA, https://research.protocol.ai/publications/scaling-proof-of-replication-for-filecoin-mining/, 2018.
I. Bentov, C. Lee, A. Mizrahi, and M. Rosenfeld, Proof of activity: Extending bitcoin’s proof of work via proof of stake, SIGMETRICS Perform. Eval. Rev., vol. 42, no. 3, pp. 34–37, 2014.
M. Castro and B. Liskov, Practical Byzantine fault tolerance, in Proc. 3rd Symp. on Operating Systems Design and Implementation, New Orleans, LA, USA, 1999, pp. 173–186.
M. F. Yin, D. Malkhi, M. K. Reiter, G. G. Gueta, and I. Abraham, HotStuff: BFT consensus with linearity and responsiveness, in Proc. 2019 ACM Symp. on Principles of Distributed Computing, Toronto, Canada, 2019, pp. 347–356.
Y. F. Zou, M. H. Xu, J. G. Yu, F. Zhao, and X. Z. Cheng, A fast consensus for permissioned wireless blockchains, IEEE Internet Things J.
M. H. Xu, C. C. Liu, Y. F. Zou, F. Zhao, J. G. Yu, and X. Z. Cheng, wChain: A fast fault-tolerant blockchain protocol for multihop wireless networks, IEEE Trans. Wirel. Commun., vol. 20, no. 10, pp. 6915–6926, 2021.
L. Yang, Y. F. Zou, M. H. Xu, Y. C. Xu, D. X. Yu, and X. Z. Cheng, Distributed consensus for blockchains in internet-of-things networks, Tsinghua Science and Technology, vol. 27, no. 5, pp. 817–831, 2022.
M. H. Xu, F. Zhao, Y. F. Zou, C. C. Liu, X. Z. Cheng, and F. Dressler, BLOWN: A blockchain protocol for single-hop wireless networks under adversarial SINR, IEEE Trans. Mob. Comput.
R. S. Sutton, Temporal credit assignment in reinforcement learning, PhD dissertation, University of Massachusetts Amherst, Amherst, MA, USA, 1984.
R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Proc. 12th Int. Conf. on Neural Information Processing Systems, Denver, CO, USA, 2000, pp. 1057–1063.
A. G. Barto, R. S. Sutton, and C. W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern., vol. SMC-13, no. 5, pp. 834–846, 1983.
S. Dziembowski, S. Faust, V. Kolmogorov, and K. Pietrzak, Proofs of space, in Proc. 35th Annu. Cryptology Conf., Santa Barbara, CA, USA, 2015, pp. 585–605.
A. Miller, A. Juels, E. Shi, B. Parno, and J. Katz, Permacoin: Repurposing bitcoin work for data preservation, in Proc. 2014 IEEE Symp. on Security and Privacy, Berkeley, CA, USA, 2014, pp. 475–490.
M. Tan, Multi-agent reinforcement learning: Independent vs. cooperative agents, in Machine Learning Proceedings 1993. Amsterdam, the Netherlands: Elsevier, 1993, pp. 330–337.
P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, et al., Value-decomposition networks for cooperative multi-agent learning based on team reward, in Proc. 17th Int. Conf. on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden, 2018, pp. 2085–2087.
T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. N. Foerster, and S. Whiteson, QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning, in Proc. 35th Int. Conf. on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 2018, pp. 4292–4301.
K. Q. Zhang, Z. R. Yang, and T. Başar, Multi-agent reinforcement learning: A selective overview of theories and algorithms, in Handbook of Reinforcement Learning and Control, K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, eds. Cham, Switzerland: Springer, 2021, pp. 321–384.
J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, Counterfactual multi-agent policy gradients, in Proc. 32nd AAAI Conf. on Artificial Intelligence, Palo Alto, CA, USA, 2018, pp. 2974–2982.
R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, in Proc. 31st Int. Conf. on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6382–6393.
J. Foerster, N. Nardelli, G. Farquhar, T. Afouras, P. H. S. Torr, P. Kohli, and S. Whiteson, Stabilising experience replay for deep multi-agent reinforcement learning, in Proc. 34th Int. Conf. on Machine Learning, Sydney, Australia, 2017, pp. 1146–1155.
A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, Multiagent cooperation and competition with deep reinforcement learning, PLoS One, vol. 12, no. 4, p. e0172395, 2017.
A. Lazaridou, A. Peysakhovich, and M. Baroni, Multi-agent cooperation and the emergence of (natural) language, arXiv preprint arXiv: 1612.07182, 2017.
I. Mordatch and P. Abbeel, Emergence of grounded compositional language in multi-agent populations, in Proc. 32nd AAAI Conf. on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conf. and 8th AAAI Symp. on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2018, p. 183.
T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, and I. Mordatch, Emergent complexity via multi-agent competition, presented at Proc. 6th Int. Conf. on Learning Representations, Vancouver, Canada, 2018.
M. Raghu, A. Irpan, J. Andreas, R. Kleinberg, Q. V. Le, and J. M. Kleinberg, Can deep reinforcement learning solve Erdos-Selfridge-Spencer games? in Proc. 35th Int. Conf. on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 2018, pp. 4235–4243.
J. Z. Leibo, V. Zambaldi, M. Lanctot, and J. Marecki, Multi-agent reinforcement learning in sequential social dilemmas, in Proc. 16th Conf. on Autonomous Agents and MultiAgent Systems, São Paulo, Brazil, 2017, pp. 464–473.
A. Lerer and A. Peysakhovich, Maintaining cooperation in complex social dilemmas using deep reinforcement learning, arXiv preprint arXiv: 1707.01068, 2018.
J. Z. Leibo, J. Perolat, E. Hughes, S. Wheelwright, A. H. Marblestone, E. Duéñez-Guzmán, P. Sunehag, I. Dunning, and T. Graepel, Malthusian reinforcement learning, in Proc. 18th Int. Conf. on Autonomous Agents and MultiAgent Systems, Montreal, Canada, 2019, pp. 1099–1107.
Y. C. Ho, Team decision theory and information structures, Proc. IEEE, vol. 68, no. 6, pp. 644–654, 1980.
X. F. Wang and T. Sandholm, Reinforcement learning to play an optimal Nash equilibrium in team Markov games, in Proc. 15th Int. Conf. on Neural Information Processing Systems, Cambridge, MA, USA, 2002, pp. 1603–1610.
T. Yoshikawa, Decomposition of dynamic team decision problems, IEEE Trans. Autom. Control, vol. 23, no. 4, pp. 627–632, 1978.
M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, in Machine Learning Proceedings 1994, W. W. Cohen and H. Hirsh, eds. Amsterdam, the Netherlands: Elsevier, 1994, pp. 157–163.
J. L. Hu and M. P. Wellman, Nash Q-learning for general-sum stochastic games, J. Mach. Learn. Res., vol. 4, pp. 1039–1069, 2003.
M. G. Lagoudakis and R. Parr, Learning in zero-sum team Markov games using factored value functions, in Proc. 15th Int. Conf. on Neural Information Processing Systems, Cambridge, MA, USA, 2002, pp. 1659–1666.
M. L. Littman, Friend-or-foe Q-learning in general-sum games, in Proc. 18th Int. Conf. on Machine Learning, San Francisco, CA, USA, 2001, pp. 322–328.
C. Dwork, N. Lynch, and L. Stockmeyer, Consensus in the presence of partial synchrony (Preliminary version), in Proc. 3rd Annu. ACM Symp. on Principles of Distributed Computing, Vancouver, Canada, 1984, pp. 103–118.
P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes: Implementation issues, in Proc. 38th IEEE Conf. on Decision and Control, Phoenix, AZ, USA, 1999, pp. 1769–1774.
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn., vol. 8, nos. 3&4, pp. 229–256, 1992.
R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, in Proc. 8th Int. Conf. on Neural Information Processing Systems, Denver, CO, USA, 1995, pp. 1038–1044.