Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China
Faculty of Management, University of New Brunswick, Fredericton E3B 5A3, Canada
School of Mathematics and Statistics, Shandong Normal University, Jinan 250014, China
Abstract
We investigate the problem of maximizing the sum of submodular and supermodular functions under a fairness constraint. This sum function is non-submodular in general. For the offline model, we introduce two approximation algorithms: a greedy algorithm and a threshold greedy algorithm. For the streaming model, we propose a one-pass streaming algorithm. We also analyze the approximation ratios of these algorithms, which all depend on the total curvature of the supermodular function. The total curvature is computable in polynomial time and widely utilized in the literature.
Greedy Algorithm
First, we use the greedy algorithm to investigate Formula (1). In every step, the algorithm enumerates all the elements in the remaining ground set and selects the one with the largest marginal increment. Then, the subset to which the selected element belongs is checked. If the constraint of the subset has not yet reached its capacity, the element is added to the solution set and removed from the ground set. If the constraint of the subset has already reached its capacity, the element cannot be added to the solution set, and we delete all the elements of this subset from the ground set. This procedure is repeated until the remaining ground set is empty, at which point all the constraints are tight. Algorithm 1 gives the pseudo code of this idea.
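To make the procedure concrete, the following is a minimal Python sketch of the greedy idea described above. The names (`f`, `groups`, `caps`) and the value-oracle interface are our own illustration, not the paper's notation.

```python
def greedy_fair(ground, groups, caps, f):
    """Greedy selection under per-subset capacities (illustrative sketch).

    ground : iterable of elements (the ground set)
    groups : dict mapping each element to the index of its subset
    caps   : dict mapping each subset index to its capacity
    f      : value oracle, called on a frozenset of elements
    """
    S, remaining = set(), set(ground)
    counts = {i: 0 for i in caps}
    base = f(frozenset(S))
    while remaining:
        # pick the element with the largest marginal increment w.r.t. S
        e = max(remaining, key=lambda x: f(frozenset(S | {x})) - base)
        i = groups[e]
        if counts[i] < caps[i]:
            S.add(e)
            counts[i] += 1
            remaining.remove(e)
            base = f(frozenset(S))
        else:
            # the subset is full: discard all of its remaining elements
            remaining = {x for x in remaining if groups[x] != i}
    return S
```

Each round queries the oracle once per remaining element, and at most n rounds are executed; this is the source of the O(n^2) inquiry complexity analyzed below.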
To analyze the approximation ratio of Algorithm 1, we introduce some notation. Fix an optimal solution set of Formula (1), and let its intersection with each subset of the partition be the optimal solution within that subset.
Theorem 1 Algorithm 1 achieves a performance guarantee that depends on the total curvature of the supermodular function. The inquiry complexity of Algorithm 1 is O(n^2).
Proof The final solution obtained by Algorithm 1 is divided into parts according to the subsets of the partition. We construct bijections from the optimal solution to the final solution. On the elements common to both, each bijection maps the element to itself. The elements of each subset are labeled in the order in which they are added by Algorithm 1, and the remaining optimal elements are ordered arbitrarily, which determines the bijections on the rest. Record the initial solution of the iteration in which each element is added. According to Algorithm 1, the corresponding greedy relation holds. Thus, we have
By rearrangement, the above inequality can be written as follows:
The algorithm needs to query the marginal increments of at most n elements in each round and runs for at most n rounds, which yields the O(n^2) inquiry complexity. Thus, we obtain the conclusions in Theorem 1.
■
Remark 1 If the total curvature equals zero, then the supermodular part is a modular function. In this case, Algorithm 1 is guaranteed by 1/2.
Threshold Greedy Algorithm
In this section, to reduce the inquiry complexity, we propose the threshold greedy algorithm for Formula (1). The main idea is as follows: when the marginal increment of an element with respect to the current solution is larger than some threshold, the element is added to the solution. Intuitively, the threshold is related to the OPTimal value (OPT); it can be viewed as a fixed fraction of OPT. The OPT can be estimated by using Formula (3), where the estimate involves the maximum value of any singleton element. Thus, this fraction of OPT falls in a bounded interval, whose width is controlled by a small positive constant. We decrease the threshold from the upper end of this interval along a geometric progression and consider all these values as thresholds. For each threshold, we enumerate all elements in the ground set. If the marginal increment of an element is larger than the threshold and the corresponding constraint has not reached its upper bound, then the element is added to the solution. More details are shown in Algorithm 2.
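For illustration, the following sketch implements the threshold schedule in Python. The geometric factor (1 - eps) and the stopping point eps * m / n are assumptions in the spirit of standard threshold greedy designs; the paper's exact thresholds in Algorithm 2 also involve the curvature.

```python
def threshold_greedy_fair(ground, groups, caps, f, eps=0.1):
    """Threshold greedy under per-subset capacities (illustrative sketch)."""
    ground = list(ground)
    n = len(ground)
    m = max(f(frozenset({x})) for x in ground)  # maximum singleton value
    S, counts = set(), {i: 0 for i in caps}
    tau = m
    while tau >= eps * m / n:                   # assumed stopping rule
        base = f(frozenset(S))
        for e in ground:
            if e in S:
                continue
            i = groups[e]
            gain = f(frozenset(S | {e})) - base
            # accept when the gain clears the threshold and the
            # subset constraint has not reached its upper bound
            if counts[i] < caps[i] and gain >= tau:
                S.add(e)
                counts[i] += 1
                base = f(frozenset(S))
        tau *= 1 - eps                          # geometric decrease
    return S
```

Under this assumed schedule, each threshold sweep costs at most n oracle queries and the number of thresholds grows only logarithmically, which is how the inquiry complexity is reduced relative to Algorithm 1.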
To analyze the approximation ratio of the algorithm, we consider the upper bound of the marginal increment of an element that is in the optimal solution but not in the current solution. Suppose that Algorithm 2 terminates when the threshold drops below the lower end of the interval. Thus, the solution is updated at most a bounded number of times; fix the final step of the iteration. We then have the following lemma.
Lemma 2 Suppose an element is processed at some iteration of Algorithm 2, with a given initial solution for that iteration, and consider the solution after that iteration. For any element of the optimal solution, the relationship between its marginal increment to the current solution and the gain at that iteration is given as follows:
where the optimal element and the processed element are in the same partition.
Proof First, suppose that the element is accepted at the initial threshold. Thus, the increment is bounded as follows:
Second, suppose that the last threshold has been reached and consider the last feasible solution with respect to that threshold. Then, all the elements in the ground set have been read at least once. For any remaining optimal element, consider the initial solution of the iteration in which the element is examined. As the element has not been added, we therefore have
From Formula (3), we have
Meanwhile, from Algorithm 2, at the current iteration, we have
Theorem 2 Algorithm 2 achieves a performance guarantee that depends on the total curvature of the supermodular function, and its inquiry complexity is lower than that of Algorithm 1.
Proof From the algorithm, the final output solution satisfies one of the following two cases. Partition the output solution based on the subsets.
(1) For the case in which every constraint is tight, we first label the elements in the set according to the order in which they were added. We construct bijections from the optimal solution to the output solution, similar to the bijections in the proof of Theorem 1. As a result, we obtain the following:
Record the initial solution of the iteration in which each element is added. Thus, we have
The first inequality holds by monotonicity, and the second and the third inequalities hold via Formula (3) and Lemma 2, respectively. By rearrangement, we obtain
(2) In the second case, some constraints are tight while others are not. We label the two corresponding index sets accordingly.
According to Algorithm 2, we have
for any optimal element whose corresponding constraint is not tight. Thus, we get
For the optimal elements whose corresponding constraints are tight, we construct bijections similar to Eq. (20). Thus, we have
Combining Formulas (24) and (25), we obtain the following:
By rearrangement, we obtain
Combining Formulas (22) and (26), we achieve the conclusion in Theorem 2.
■
Remark 2 When the total curvature of the supermodular part equals zero, the objective degenerates into a submodular function. Thus, Algorithm 2 reduces to a threshold greedy algorithm for submodular maximization, with the corresponding approximation guarantee.
Streaming Algorithm
Finally, we investigate Formula (1) when the ground set arises from big data. We cannot store all the data and process them simultaneously; we need a streaming algorithm. A streaming algorithm considers the elements one by one and must decide on the current element before the next one is revealed. Four metrics are used to measure the performance of a streaming algorithm: the approximation ratio and the time complexity are the standard ones; the other two are the memory complexity and the number of passes over the data. We aim to design a one-pass streaming algorithm with a reasonable approximation ratio, time complexity, and memory complexity.
Similar to the threshold greedy algorithm, if the marginal increment of the current element is larger than the threshold, then it is added to the solution. We subdivide the estimation interval of the OPT by a geometric series and consider all these values as thresholds. By Formula (15), the bound on the OPT is as follows,
where the bound involves the maximum value of a singleton element.
We subdivide this interval and obtain a set of candidate values as follows:
Under this subdivision, the candidate set contains a parameter that approximates the OPT within the desired factor; this parameter can be considered as an estimation of the OPT.
As the elements are revealed one by one in a streaming algorithm, we cannot obtain the maximum value of a singleton element in advance. Meanwhile, the feasible solutions are updated with each newly arriving element; thus, the estimation interval and the parameter set are updated as well. Finally, the desired set of parameters is exactly the one maintained at the end of the pass.
Another difficulty is that the total curvature of the supermodular function requires the entire data set to compute; thus, we cannot obtain it in advance. Given that it lies in [0, 1), we refine the interval [0, 1) into a number of copies to simulate its value. The finer the subdivision, the higher the accuracy of the solution. Finally, the algorithm returns the solution with the largest value among all those maintained for the candidate parameters. The pseudo code of the one-pass streaming algorithm is presented in Algorithm 3.
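A heavily simplified one-pass sketch is given below for orientation. The acceptance rule gain >= v / (2 * B) (with B the total capacity), the buffering rule, and the lazy maintenance of the guess grid are assumptions modeled on standard streaming submodular maximization; the actual Algorithm 3 additionally enumerates guesses of the curvature, which we omit here.

```python
import math

def one_pass_fair(stream, groups, caps, f, eps=0.1):
    """One-pass streaming sketch with geometric guesses of OPT (illustrative).

    For each guess v = (1 + eps) ** j we keep one candidate solution;
    rejected elements with non-trivial gain go to a buffer `buf`
    for post-processing.
    """
    B = sum(caps.values())
    m = 0.0                                  # running max singleton value
    sols, counts, buf = {}, {}, set()
    for e in stream:
        m = max(m, f(frozenset({e})))
        if m <= 0:
            continue
        # lazily maintain guesses v in [m, 2 * B * m]
        lo = math.ceil(math.log(m, 1 + eps))
        hi = math.floor(math.log(2 * B * m, 1 + eps))
        for j in range(lo, hi + 1):
            v = (1 + eps) ** j
            S = sols.setdefault(j, set())
            c = counts.setdefault(j, {i: 0 for i in caps})
            gain = f(frozenset(S | {e})) - f(frozenset(S))
            i = groups[e]
            if c[i] < caps[i] and gain >= v / (2 * B):  # assumed rule
                S.add(e)
                c[i] += 1
            elif gain >= eps * m / B:        # assumed buffering rule
                buf.add(e)
    best = max(sols.values(), key=lambda T: f(frozenset(T)), default=set())
    return best, buf
```

In a usage sketch, best, buf = one_pass_fair(stream, groups, caps, f); the buffer buf then feeds the post-processing described below.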
In the following lemma, we consider the properties of the solution corresponding to each candidate parameter, where the collection under consideration consists of all final solutions returned by the streaming processing for these parameters. Regarding the constraints of such a solution, (1) all of them may be tight, (2) none of them may be tight, or (3) some of them may be tight while the others are not. Exactly one of these three cases occurs.
Lemma 3 For a candidate parameter, if the corresponding solution has full cardinality, or if none of its constraints is tight, we have
Proof By mathematical induction, we get
For the case of full cardinality, it is obvious that
For the case in which none of the constraints is tight, we have
where the third inequality holds by Algorithm 3.
By rearrangement, we get
From the above, we obtain the conclusions in Lemma 3.
■
However, if some constraints of the solution are tight while others are not, we cannot directly bound the quality of the solution. Thus, Algorithm 3 employs a post-processing step to improve the solution quality and to establish the approximation ratio of the algorithm. We keep some elements during the streaming processing and store them in a buffer set: the marginal increments of these elements are less than the acceptance threshold but no less than a smaller threshold. This buffer set is taken as the ground set in the post-processing. The solutions based on two special parameter choices are considered: in the first case, the parameter is assigned the minimum value in the candidate set, provided that none of the corresponding constraints is tight; in the second case, the parameter is assigned the maximum value in the candidate set. We consider the properties of the final solution whose initial solution comes from one of these cases in the post-processing.
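As an illustration, the post-processing can be sketched as a greedy pass, in the spirit of Algorithm 1, over the buffered set, starting from a stored streaming solution. The names follow the sketches above and are hypothetical.

```python
def post_process(S0, buf, groups, caps, f):
    """Greedy completion of a streaming solution over the buffer (sketch).

    S0  : a stored solution from the streaming pass
    buf : the buffered elements kept during the pass
    """
    S = set(S0)
    counts = {i: 0 for i in caps}
    for e in S:
        counts[groups[e]] += 1
    pool = {e for e in buf if e not in S}
    while pool:
        base = f(frozenset(S))
        # greedily add the buffered element with the largest gain
        e = max(pool, key=lambda x: f(frozenset(S | {x})) - base)
        i = groups[e]
        if counts[i] < caps[i]:
            S.add(e)
            counts[i] += 1
            pool.remove(e)
        else:
            # subset i is already tight: drop its buffered elements
            pool = {x for x in pool if groups[x] != i}
    return S
```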
Theorem 3 Algorithm 3 achieves a performance guarantee that depends on the total curvature of the supermodular function; bounds on the update time per element in the streaming processing and on the running time of the post-processing are established in the proof.
Proof We first consider the case in which none of the constraints of the stored solution is tight.
In this case, we divide the optimal solution into two sets: the first, in which the optimal elements are in the buffer set but not in the final solution; and the second, in which the optimal elements are deleted directly during the streaming processing.
For any optimal element in the first set, we have
where the initial solution at the moment the element is added is used, and the element and its image are in the same partition. Formula (29) holds both in the streaming processing and in the post-processing (by the greedy selection idea).
Meanwhile, as each element of the second set is deleted during the streaming processing, its marginal increment to the corresponding solution is less than the threshold. Thus, for any such element, we achieve
Combining Formulas (29) and (30), we have
where the initial solution at the moment each element is added is as above.
By rearrangement, we obtain
Second, we investigate the case in which the parameter is the maximum value in the candidate set. Notably, in this situation, for any stored solution, there is at least one element in the buffer set.
In this case, the optimal solution is divided into three sets: the first, in which the optimal elements are deleted in the streaming process since their corresponding partitions have already reached their capacities; the second, in which the optimal elements are in the buffer set but not in the final solution; and the third, in which the optimal elements are deleted directly during the streaming processing even though the constraints of their corresponding partitions are not tight, because their marginal increments are less than the threshold.
For the first set, we have
where each optimal element is in the same partition as a selected element, and the initial solution at the moment that element is added is used. We collect all these selected elements in a dedicated set.
For the second and third sets, the relations are similar to Formulas (29) and (30).
Thus, we obtain the following:
By rearrangement, we have
Combining Formulas (32) and (33), we obtain the conclusion in Theorem 3.
The cardinality of the candidate parameter set is bounded by the geometric subdivision. Therefore, in the streaming processing, the memory complexity is bounded accordingly for each curvature guess, and a bounded number of elements are stored for the post-processing. As we have to check every curvature guess, the total memory of Algorithm 3 is the product of these two factors. In the streaming process, the inquiry complexity for each element is determined by the number of parameters in the candidate set. In the post-processing, the algorithm enumerates a bounded number of iterations over the buffered elements for each initial solution; thus, the post-processing needs a corresponding number of oracle queries for each curvature guess. ■