{"title": "Smoothing Structured Decomposable Circuits", "book": "Advances in Neural Information Processing Systems", "page_first": 11416, "page_last": 11426, "abstract": "We study the task of smoothing a circuit, i.e., ensuring that all children of a plus-gate mention the same variables. Circuits serve as the building blocks of state-of-the-art inference algorithms on discrete probabilistic graphical models and probabilistic programs. They are also important for discrete density estimation algorithms. Many of these tasks require the input circuit to be smooth. However, smoothing has not been studied in its own right yet, and only a trivial quadratic algorithm is known. This paper studies efficient smoothing for structured decomposable circuits. We propose a near-linear time algorithm for this task and explore lower bounds for smoothing decomposable circuits, using existing results on range-sum queries. Further, for the important case of All-Marginals, we show a more efficient linear-time algorithm. We validate experimentally the performance of our methods.", "full_text": "Smoothing Structured Decomposable Circuits\n\nAndy Shih\n\nUniversity of California, Los Angeles\n\nandyshih@cs.ucla.edu\n\nGuy Van den Broeck\n\nUniversity of California, Los Angeles\n\nguyvdb@cs.ucla.edu\n\nPaul Beame\n\nUniversity of Washington\n\nbeame@cs.washington.edu\n\nAntoine Amarilli\n\nLTCI, T\u00e9l\u00e9com Paris, IP Paris\n\nantoine.amarilli@telecom-paris.fr\n\nAbstract\n\nWe study the task of smoothing a circuit, i.e., ensuring that all children of a -gate\nmention the same variables. Circuits serve as the building blocks of state-of-the-art\ninference algorithms on discrete probabilistic graphical models and probabilistic\nprograms. They are also important for discrete density estimation algorithms.\nMany of these tasks require the input circuit to be smooth. However, smoothing\nhas not been studied in its own right yet, and only a trivial quadratic algorithm\nis known. 
This paper studies ef\ufb01cient smoothing for structured decomposable\ncircuits. We propose a near-linear time algorithm for this task and explore lower\nbounds for smoothing decomposable circuits, using existing results on range-sum\nqueries. Further, for the important case of All-Marginals, we show a more ef\ufb01cient\nlinear-time algorithm. We validate experimentally the performance of our methods.\n\n1\n\nIntroduction\n\nCircuits are directed acyclic graphs that are used for many logical and probabilistic inference tasks.\nTheir structure captures the computation of reasoning algorithms. In the context of machine learning,\nstate-of-the-art algorithms for exact and approximate inference in discrete probabilistic graphical\nmodels [Chavira and Darwiche, 2008; Kisa et al., 2014; Friedman and Van den Broeck, 2018]\nand probabilistic programs [Fierens et al., 2015; Bellodi and Riguzzi, 2013] are built on circuit\ncompilation. In addition, learning tractable circuits is the current method of choice for discrete\ndensity estimation [Gens and Domingos, 2013; Rooshenas and Lowd, 2014; Vergari et al., 2015;\nLiang et al., 2017]. Circuits are also used to enforce logical constraints on deep neural networks [Xu\net al., 2018].\nMost of the probabilistic inference algorithms on circuits actually require the input circuit to be\nsmooth (also referred to as complete) [Sang et al., 2005; Poon and Domingos, 2011]. The notion of\nsmoothness was \ufb01rst introduced by Darwiche [2001] to ensure ef\ufb01cient model counting and cardinality\nminimization and has since been identi\ufb01ed as essential to probabilistic inference algorithms. 
Yet, to the best of our knowledge, no efficient algorithm to smooth a circuit has been proposed beyond the original quadratic algorithm by Darwiche [2001].\n\nThe quadratic complexity can be a major bottleneck, since circuits in practice often have hundreds of thousands of edges when learned, and millions of edges when compiled from graphical models. As such, in the latest Dagstuhl Seminar on \u201cRecent Trends in Knowledge Compilation\u201d, this task of smoothing a circuit was identified as a major research challenge [Darwiche et al., 2017]. Therefore, a more efficient smoothing algorithm will increase the scalability of circuit-based inference algorithms.\n\nIntuitively, smoothing a circuit amounts to filling in the missing variables under its ⊕-gates. In Figure 1a we see that the ⊕-gate does not mention the same variables on its left side and right side,\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n(a) A circuit.\n\n(b) A smooth circuit.\n\nFigure 1: Two equivalent circuits computing (x0 ⊗ x1) ⊕ x2. The left one is not smooth and the right one is smooth.\n\nso we fill in the missing variables by adding tautological gates of the form xi ∨ ¬xi, resulting in the smooth circuit in Figure 1b. Filling in these missing variables is necessary for probabilistic inference tasks such as computing marginals, computing probability of evidence, sampling, and approximating Maximum A Posteriori inference [Sang et al., 2005; Chavira and Darwiche, 2008; Friesen and Domingos, 2016; Friedman and Van den Broeck, 2018; Mei et al., 2018]. The task of smoothing was also explored by Peharz et al. 
[2017], where they look into preserving smoothness when augmenting Sum-Product Networks for computing Most Probable Explanations.\n\nIn this paper we propose a more efficient smoothing algorithm. We focus on the commonly used class of structured decomposable circuits, which include structured decomposable Negation Normal Form, Sentential Decision Diagrams, and more [Pipatsrisawat and Darwiche, 2008; Darwiche, 2011]. Intuitively, structuredness requires that circuits always consider their variables in a certain way, which is formalized as a tree structure on the variables called a vtree.\n\nOur first contribution (Section 4) is to show a near-linear time algorithm for smoothing such circuits, which is a clear improvement on the naive quadratic algorithm. Specifically, our algorithm runs in time proportional to the circuit size multiplied by the inverse Ackermann function α of the circuit size and number of variables1 (Theorem 4.7).\n\nOur second contribution (Section 5) is to show a lower bound of the same complexity, on smoothing decomposable circuits for the restricted class of smoothing algorithms that we call smoothing-gate algorithms (Theorem 5.2). Intuitively, smoothing-gate algorithms are those that retain the structure of the original circuit and can only make them smooth by adding new gates to cover the missing variables. This natural class corresponds to the example in Figure 1, and our near-linear time smoothing algorithm also falls in this class. 
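To make the quadratic baseline concrete, the following sketch implements the naive strategy on a toy circuit representation (the Node/leaf/AND/OR encoding is our own illustration, not code from the paper): every child of a ⊕-gate is conjoined with one tautology x ∨ ¬x per missing variable, so a single ⊕-gate can cost work linear in the number of variables per child.

```python
# Naive quadratic smoothing in the style of Darwiche [2001], sketched on a
# toy circuit representation (our own encoding, for illustration only).

class Node:
    def __init__(self, kind, children=(), var=None, neg=False):
        self.kind = kind            # "leaf", "and" (⊗), or "or" (⊕)
        self.children = list(children)
        self.var = var              # variable index, for leaves
        self.neg = neg              # polarity, for leaves

def leaf(v, neg=False):
    return Node("leaf", var=v, neg=neg)

def AND(*cs):
    return Node("and", cs)

def OR(*cs):
    return Node("or", cs)

def variables(node):
    """Set of variables mentioned at or below a gate (vars_p in the paper)."""
    if node.kind == "leaf":
        return {node.var}
    return set().union(set(), *(variables(c) for c in node.children))

def smooth(node):
    """Conjoin each ⊕-child with (x ∨ ¬x) for every missing variable x.
    Each ⊕-gate may pay O(n) work per child, hence the quadratic bound."""
    if node.kind == "leaf":
        return node
    node.children = [smooth(c) for c in node.children]
    if node.kind == "or":
        allvars = variables(node)
        fixed = []
        for c in node.children:
            for v in sorted(allvars - variables(c)):  # one tautology per gap
                c = AND(c, OR(leaf(v), leaf(v, neg=True)))
            fixed.append(c)
        node.children = fixed
    return node
```

For the circuit of Figure 1, `smooth(OR(AND(leaf(0), leaf(1)), leaf(2)))` attaches x2 ∨ ¬x2 under the left child and x0 ∨ ¬x0, x1 ∨ ¬x1 under the right child, after which both children mention {x0, x1, x2}.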
We match its complexity and show a lower bound on the performance of any smoothing-gate algorithm, relying on known results in the field of range-sum queries.\n\nOur third contribution (Section 6) is to focus on the probabilistic inference task of All-Marginals and to propose a novel linear time algorithm for this task which bypasses the need for smoothing, assuming that the weight function is always positive and supports all four elementary operations of ⊕, ⊖, ⊗, ⊘ (Theorem 6.1). These results are summarized in Table 1.\n\nOur fourth contribution (Section 7) is to study how to make a circuit smooth while preserving structuredness. We show that we cannot achieve a sub-quadratic smoothing algorithm if we impose the same vtree structure on the output circuit unless the vtree has low height (Prop. 7.1).\n\nOur final contribution (Section 8) is to experiment on smoothing and probabilistic inference tasks. We evaluate the performance of our smoothing and of our linear time All-Marginals algorithm.\n\nThe rest of the paper is structured as follows. In Section 2 we review the necessary definitions, and in Section 3 we motivate the task of smoothing in more detail. We then present each of our five contributions in order in Sections 4, 5, 6, 7 and 8. We conclude in Section 9.\n\n2 Background\n\nLet us now define the model of circuits that we study (refer again to Figure 1 for an example):\n\n1The inverse Ackermann function α is defined in Tarjan [1972]. As the Ackermann function grows faster than any primitive recursive function, the function α grows slower than the inverse of any primitive recursive function, e.g., slower than any number of iterated logarithms of n.\n\nTable 1: Summary of results on structured decomposable circuits. 
We let n be the number of variables and m be the size of the circuit.\n\nTask | Operations | Complexity\nSmoothing | ⊕, ⊗ | O(m · α(m, n))\nSmoothing* | ⊕, ⊗ | Ω(m · α(m, n))*\nAll-Marginals | ⊕, ⊖, ⊗, ⊘ | Θ(m)\n* For smoothing-gate algorithms on decomposable circuits.\n\nDefinition 2.1. A logical circuit is a rooted directed acyclic graph where leaves are literals, and internal gates perform disjunction (⊕-gates) or conjunction (⊗-gates). An arithmetic circuit is one where leaves are numeric constants or variables, and internal gates perform addition (⊕-gates) or multiplication (⊗-gates). The children of an internal gate are the gates that feed into it.\n\nWe focus on circuits that are decomposable and more precisely that are structured. We first define decomposability:\n\nDefinition 2.2. For any gate p, we call vars_p the set of variables that appear at or below gate p. A circuit is decomposable if these sets of variables are disjoint between the two children of every ⊗-gate. Formally, for every ⊗-gate p with children c1 and c2, we have vars_c1 ∩ vars_c2 = ∅.\n\nWe then define structuredness, by introducing the notion of a vtree on a set of variables:\n\nDefinition 2.3. A vtree on a set of variables S is a full binary tree whose leaves have a one-to-one correspondence with the variables in S. We denote the set of variables under a vtree node p as u_p.\n\nDefinition 2.4. 
A circuit respects a vtree V if each of its ⊗-gates has 0 or 2 inputs, and there is a mapping ρ from its gates to V such that:\n\n• For every variable c, the node ρ(c) is the leaf of V corresponding to c.\n• For every ⊕-gate c and child c' of c, the node ρ(c') is ρ(c) or a descendant of ρ(c) in V.\n• For every ⊗-gate c with children c1, c2, letting vl and vr be the left and right children of ρ(c), the node ρ(c1) is vl or a descendant of vl and ρ(c2) is vr or a descendant of vr.\n\nA circuit is structured decomposable if it respects some vtree V. The circuit is then decomposable. Recall that a circuit can be preprocessed in linear time to ensure that each ⊗-gate has 0 or 2 inputs.\n\nStructured decomposability was introduced in the context of logical circuits, and it is also enforced in Sentential Decision Diagrams, a widely used tractable representation of Boolean functions [Darwiche, 2011]. This property allows for a polytime conjoin operation and symmetric/group queries on logical circuits [Pipatsrisawat and Darwiche, 2008; Bekker et al., 2015]. For circuits that represent distributions, structured decomposability allows multiplication of these distributions [Shen et al., 2016], efficient computation of the KL-divergence between two distributions [Liang and Van den Broeck, 2017], and more. Structured decomposable circuits are also used when one wants to induce distributions over arbitrary logical formulae [Kisa et al., 2014] or compile a logical formula bottom-up [Oztok and Darwiche, 2015].\n\nNext, we review another property of logical circuits that is relevant for probabilistic inference tasks [Darwiche, 2001; Choi and Darwiche, 2017].\n\nDefinition 2.5. 
A logical circuit on variables X is deterministic if under any input x, at most one child of each ⊕-gate evaluates to true.\n\nIn the rest of this paper, we will let n denote the number of variables in a circuit and let m ≥ n denote the size of a circuit, measured by the number of edges in the circuit.\n\n3 Smoothing\n\nWe focus on the probabilistic inference tasks of weighted model counting and computing All-Marginals [Sang et al., 2005; Chavira and Darwiche, 2008]. We will study weighted model counting in the more general form of Algebraic Model Counting (AMC) [Kimmig et al., 2016]. To describe these tasks, we define instantiations, knowledge bases and models.\n\nDefinition 3.1. Given a set of variables X, a full assignment of all the variables in X is called an instantiation. A set f of instantiations is called a knowledge base, and each instantiation in f is called a model.\n\nThe AMC task on a knowledge base f and a weight function w (a mapping from the literals to the reals) is to compute s from Equation 1. The task of All-Marginals is to compute the partial derivative of s with respect to the weight of each literal as in Equation 2.\n\ns = ⊕_{x ∈ f} ⊗_{x ∈ x} w(x)   (AMC) (1)\n\n{ ∂s/∂w(x), ∂s/∂w(¬x) : X ∈ X }   (All-Marginals) (2)\n\nOn probabilistic models, s is often the partition function or the probability of evidence, where the partial derivatives of these quantities correspond to all (conditional) marginals in the distribution. Computing All-Marginals efficiently significantly speeds up probabilistic inference, and is used as a subroutine in the collapsed sampling algorithm in our later experiments.\n\nThese tasks are difficult in general, unless we have a tractable representation of the knowledge base f. Moreover, it is important to have a smooth representation. Indeed, suppose f is represented as a logical circuit that is only deterministic and decomposable but not smooth. 
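As a sanity check on the semantics of the AMC task, the following brute-force sketch enumerates the models of f directly, ⊕-summing the ⊗-product of the literal weights of each model; with all weights set to 1 it degenerates to model counting. The encoding (instantiations as Boolean tuples, literals as (variable, polarity) pairs) is our own illustration, not the paper's:

```python
# Brute-force Algebraic Model Counting: s = ⊕ over models x of f of the
# ⊗-product of the weights of the literals in x. Exponential in n; shown
# only to make the definition concrete. Toy encoding, not from the paper.
from itertools import product

def amc_brute_force(models, weight,
                    oplus=lambda a, b: a + b,
                    otimes=lambda a, b: a * b):
    """`models` is the knowledge base f as a list of instantiations
    (tuples of bools); `weight` maps a literal (var, polarity) to a number."""
    s = None
    for x in models:
        term = None
        for i, val in enumerate(x):
            w = weight[(i, val)]
            term = w if term is None else otimes(term, w)
        s = term if s is None else oplus(s, term)
    return s

# Example: f = models of (x0 AND x1) OR x2 over three variables, with
# weight 1 on every literal, so s is the model count.
f = [x for x in product([False, True], repeat=3) if (x[0] and x[1]) or x[2]]
w = {(i, b): 1 for i in range(3) for b in (False, True)}
```

With uniform literal weight 1/2 instead, s becomes the probability that a uniformly random instantiation is a model.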
Then, there is in general no known technique to perform the AMC and All-Marginals tasks in linear time (although there is a special case where AMC can be performed in linear time, explained below). By contrast, if f is represented as a logical circuit that is deterministic, decomposable and smooth, then the AMC and All-Marginals tasks can be performed in time O(m). For example, the AMC task is done by converting the deterministic, decomposable and smooth logical circuit into an arithmetic circuit, attaching the weights of the variables as numeric constants in the circuit, and then evaluating the circuit. Furthermore, when a decomposable arithmetic circuit computes a factor (a mapping from instantiations to the reals), enforcing smoothness allows it to compute factor marginals in linear time [Choi and Darwiche, 2017].\n\nAs smoothing is necessary to efficiently solve these inference tasks, we are interested in studying the complexity of smoothing a circuit. To do so, we formally define the task of smoothing.\n\nDefinition 3.2. Two logical circuits on variables X are equivalent if they evaluate to the same output on any input x.\n\nDefinition 3.3. A circuit is smooth if for every pair of children c1 and c2 of a ⊕-gate, vars_c1 = vars_c2.\n\nDefinition 3.4. The task of smoothing a decomposable logical circuit is to output a smooth and decomposable logical circuit that is equivalent to the input circuit. Similarly, the task of smoothing a deterministic and decomposable logical circuit is to output a smooth, deterministic, and decomposable circuit that is equivalent to the input circuit.\n\nWe only define the smoothing task over logical circuits. This is because the probabilistic inference tasks are performed by smoothing a logical circuit and then converting it into an arithmetic circuit, so it is easier for the reader to only consider smoothing on logical circuits. 
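The smoothness property of Definition 3.3 can be verified in time linear in the circuit size. A minimal sketch, assuming a toy dictionary encoding of the DAG (our own encoding, not the paper's): each gate is ("or" | "and", [children]) or ("lit", variable).

```python
# Linear-time smoothness check: every ⊕-gate's children must mention
# identical variable sets. Toy DAG encoding, for illustration only.

def is_smooth(gates, root):
    vars_of = {}  # memoized vars_g, as in Definition 2.2

    def vars_at(g):
        if g not in vars_of:
            kind, arg = gates[g]
            if kind == "lit":
                vars_of[g] = {arg}
            else:
                vars_of[g] = set().union(set(), *(vars_at(c) for c in arg))
        return vars_of[g]

    def check(g):
        kind, arg = gates[g]
        if kind == "lit":
            return True
        if kind == "or" and len({frozenset(vars_at(c)) for c in arg}) > 1:
            return False
        return all(check(c) for c in arg)

    return check(root)
```

On the two circuits of Figure 1, the checker rejects the left one (the ⊕-children mention {x0, x1} and {x2}) and accepts the right one.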
For the rest of the paper, we will refer to logical circuits simply as circuits. Note that we require the output smooth circuit to preserve the same properties (decomposability/determinism) as the input circuit. Indeed, there is a trivial linear time algorithm for smoothing that breaks decomposability (i.e., simply conjoin all gates with a tautological gate that mentions all variables), but then the resulting circuit may not be useful for probabilistic inference. Again, we need decomposability to compute factor marginals, and we need decomposability along with determinism to compute AMC and All-Marginals. By contrast, we do not require the output smooth circuit to be structured, because structuredness is not required to solve our tasks of AMC or All-Marginals (nevertheless, we do study structuredness in Section 7).\n\nSometimes, when the weight function allows division, there exists a renormalization technique that can solve AMC in linear time without smoothing the initial circuit [Kimmig et al., 2016]. However, this restriction is limiting, since even if the weight function is defined over a field, division by zero may be unavoidable [Van den Broeck et al., 2014]. Also, the weight function may only be defined over a semiring (⊕, ⊗) [Friesen and Domingos, 2016]. In these cases, there is no known technique to bypass smoothing. Therefore, developing an efficient smoothing algorithm is an important problem, which we address next in Sections 4 & 5.\n\nOn the other hand, one may still be interested in settings where all four elementary operations of ⊕, ⊖, ⊗, ⊘ on the weight function are allowed. 
To this end, we also propose in Section 6 a novel technique that solves the All-Marginals task in linear time when the weight function is positive, and when subtraction and division are allowed.\n\n4 Smoothing Algorithm\n\nWe present our algorithm for smoothing structured decomposable circuits, based on the semigroup range-sum literature. First, we define a class of common strategies to smooth a circuit, which encompasses both the previously-known algorithm and our new algorithm.\n\nThe existing quadratic algorithm for smoothing a circuit goes to each ⊕-gate and inserts missing variables one by one [Darwiche, 2001]. This algorithm retains the original gates of the circuit, and adds additional gates to fill in missing variables. We will define smoothing-gate algorithms as the family of smoothing algorithms that retain the original gates of the circuit.\n\nDefinition 4.1. Edge contraction is the process of removing each ⊕-gate or ⊗-gate with a single child, and feeding the child as input to each parent of the removed gate.\n\nDefinition 4.2. A subcircuit of a circuit is another circuit formed by taking a subset of the gates and edges of the circuit, and picking a new root. The gate subset must include the new root and all endpoints of the edge subset.\n\nDefinition 4.3. Two circuits g and h with gate sets G and H are isomorphic if there exists a bijection B : G → H between their gates such that the following conditions hold.\n\n1. For any gate p ∈ G, B(p) is the same type of gate as p.\n2. For any gate p1 ∈ G and child p2 ∈ G of p1, the gate B(p2) is a child of B(p1) in h.\n3. For any gate p'1 ∈ H and child p'2 ∈ H of p'1, the gate B⁻¹(p'2) is a child of B⁻¹(p'1) in g.\n4. 
The root of g maps to the root of h.\n\nAn algorithm is a smoothing-gate algorithm if for any edge-contracted (deterministic and) decomposable input circuit g, the output circuit is smooth and (deterministic and) decomposable, is equivalent to g, and has a subcircuit that is isomorphic to g after edge contraction.\n\nSmoothing-gate algorithms are intuitive, since the structure of the original circuit is preserved. This includes the quadratic algorithm, as well as algorithms which identify missing variables under each gate and attach tautological gates to fill in those missing variables, as was done in Figure 1. Formally:\n\nDefinition 4.4. A gate g is called a smoothing gate for a set of variables X if vars_g = X and the circuit rooted at g is tautological and decomposable. We denote such a gate by SG(X).\n\nThe structure of a smoothing gate SG(X) is not specified. The only requirement is that it mentions all variables in X and is tautological and decomposable. For example, the quadratic algorithm constructs each SG(X) by naively conjoining x ∨ ¬x for each variable in X one at a time, leading to a linear amount of work per gate. In the case of structured decomposable circuits, we can do much better.\n\nLemma 4.5. Consider a structured decomposable circuit, and let π be the sequence of its variables written following the in-order traversal of its vtree. For any two vtree nodes (ρ(p), ρ(c)), we have that u_ρ(p) \ u_ρ(c) can be written as the union of at most two intervals in π.\n\nProof. Since the vtree is a binary tree, its in-order traversal visits the variables of u_ρ(p) consecutively, and the variables of u_ρ(c) consecutively. Hence, u_ρ(p) and u_ρ(c) can each be written as one interval, and u_ρ(p) \ u_ρ(c) can be written as the union of at most two intervals.\n\nWe then smooth a circuit in one bottom-up pass. If p is a leaf ⊗-gate, replace it with SG(u_ρ(p)). If p
If p\nis an internal \u2326-gate, letting vl, vr and c1, c2 be the children of \u21e2(p) and p respectively, replace c1\nwith c1 \u2326 SG(uvl\\u\u21e2(c1)) and c2 with c2 \u2326 SG(uvr\\u\u21e2(c2)). If p is a -gate, replace each child c\nwith c \u2326 SG(u\u21e2(p)\\u\u21e2(c)). By Lemma 4.5, each smoothing gate can be built by multiplying together\ntwo gates of the formNX(x x), where X forms an interval in \u21e1. Thus, we can appeal to results\nfrom semigroup range-sums, by treating each x x as an element in a semigroup, and treating the\ncomputation ofNX(x x) as a \u201csummation\u201d in the semigroup over an interval (range).\n\n5\n\n\f%\n\n(cid:2)\n\n&\n(cid:1)!$ -!$\n\n(cid:2)\n\n'\n(cid:1)!( -!(\n\n(cid:2)\n\n(cid:1)!# -!#\n\n(cid:1)!) -!)\n\na = w(x_2) + w(x_3)\nb = a + w(x_1)\nc = a + w(x_4)\noutput b, c\n\n(a) Sequence of additions to compute intervals\n\n(b) Tracing the additions into a circuit\n\nFigure 2: We construct smoothing gates for {x1, x2, x3} and {x2, x3, x4} by \ufb01rst passing the intervals\n[1, 3] and [2, 4] to the range-sum algorithm, and then tracing the sequence of additions. The trace is\ndone by replacing w(xi) with xi xi and replacing each addition with a \u2326-gate.\n\nSemigroup Range-Sum. The semigroup range-sum problem considers a sequence of n variables\nx1, . . . , xn, a sequence of m n intervals [a1, b1], . . . , [am, bm] of these variables, and a weight\nfunction w from the variables to a semigroup. The task is to compute the sum of weights of the\nvariables in each interval, i.e. sj =\u2303 i2[aj ,bj ]w(xi) for all j 2 [1, m] [Yao, 1982; Chazelle and\nRosenberg, 1989]. Since w is only de\ufb01ned over a semigroup, subtraction is not supported. That\nis, we cannot follow the ef\ufb01cient strategy of precomputing all pk =\u2303 i2[1,k]w(xi) and outputting\nsj = pbj paj1. 
Still, there is an efficient algorithm to compute all the required sums in time O(m · α(m, n)), where α is the inverse Ackermann function. We restate their result here.\n\nTheorem 4.6. Given n variables defined over a semigroup and m intervals, the sum of all intervals can be computed using O(m · α(m, n)) additions [Chazelle and Rosenberg, 1989].\n\nOur smoothing task can be reduced to the semigroup range-sum problem as follows. Smoothing a structured decomposable circuit of size m reduces to constructing smoothing gates for O(m) intervals. We pass these intervals as input to the range-sums algorithm, which will then generate a sequence of additions that computes the sum of each interval: each addition adds two numbers that are either individual variable weights or a sum that was previously computed.\n\nWe then trace this sequence of additions (see Figure 2). For the base case of w(xi), let g(w(xi)) be the gate xi ∨ ¬xi. Then for each addition s = t + u, we construct a corresponding ⊗-gate g(s) = g(t) ⊗ g(u). In particular, when an addition in the sequence has computed the sum of an interval, then the corresponding gate is a smoothing gate for that interval. This process preserves determinism, so it converts a (deterministic and) structured decomposable circuit into a smooth and (deterministic and) decomposable circuit. The output circuit is generally no longer structured.\n\nTheorem 4.7. The task of smoothing a (deterministic and) structured decomposable circuit has time complexity O(m · α(m, n)), where n is the number of variables and m is the size of the circuit.\n\nAlthough Chazelle and Rosenberg [1989] do not formally assert a time complexity on determining the sequence of additions to perform, we show that there is no overhead to this step. 
That is, Chazelle and Rosenberg [1989] show that there exists a sequence of O(m · α(m, n)) additions, and we additionally prove that this sequence can be computed in time O(m · α(m, n)). The proof is in the Appendix.\n\n5 Lower Bound\n\nIn this section we show a lower bound on the task of smoothing a decomposable circuit, for the family of smoothing-gate algorithms. First we state an existing lower bound on semigroup range-sums:\n\nTheorem 5.1. Given n variables defined over a semigroup, for all algorithms there exists a set of m = n intervals such that computing the sum of the weights of the variables for each interval takes Ω(m · α(m, n)) additions [Chazelle and Rosenberg, 1989].\n\nWe cannot immediately assert the same lower bound for the problem of smoothing decomposable circuits, for two reasons. First, we must reduce the computation of the m interval sums to a smoothing problem, and express this reduction in a circuit taking no more than O(m) space. Second, we must show that no smoothing algorithm is more efficient than smoothing-gate algorithms. We address the first issue but leave the second open, leading to the following theorem with the proof in the Appendix.\n\nTheorem 5.2. For smoothing-gate algorithms, the task of smoothing a decomposable circuit has space complexity Ω(m · α(m, n)), where n is the number of variables and m is the size of the circuit.\n\n6 Computing All-Marginals\n\nIn this section, we focus on the specific task of computing All-Marginals on a knowledge base represented as a deterministic and structured decomposable circuit. Remember that the goal is to compute the partial derivative of the circuit with respect to the weight of each literal (Equation 2 in Section 3). If the input circuit is smooth, then we can solve the task in time linear in the size of the circuit. 
Therefore, with the techniques in Section 4, given an input deterministic and structured decomposable circuit, we can smooth it and then convert it into an arithmetic circuit to compute All-Marginals, all in time O(m · α(m, n)). In this section, we show a more efficient algorithm that bypasses smoothing altogether, when we assume that the weight function also supports division and subtraction and is always positive (so that we never divide by zero). The method that we propose takes time O(m), which is optimal and saves us the effort of modifying the input circuit.\n\nAlgorithm 1 all-marginals(g, w)\nWe compute partial derivatives of positive literals. The negative literals are handled similarly.\ninput: A deterministic and structured decomposable circuit g on n variables and a weight function w that is always positive and supports ⊕, ⊖, ⊗, ⊘.\noutput: Partial derivatives d_j for 1 ≤ j ≤ n.\nmain(g, w):\n1: s ← bottom-up(g, w)\n2: return top-down(g, w, s)\ntop-down(g, w, s):\n1: D ← {root of g : s[root of g]} // cache\n2: for gates p in g, parents before children do\n3:   if p is a leaf then d_p ← D[p]\n4:   if p is a ⊗-gate with children C then\n5:     m ← (⊗_{k ∈ C} s[k]) ⊗ D[p]\n6:     for each child k in C do\n7:       D[k] ← D[k] ⊕ (m ⊘ s[k])\n8:   if p is a ⊕-gate with children C then\n9:     for each child k in C do\n10:      l1, r1, l2, r2 ← getinterval(p, k)\n11:      δ_l1 ← δ_l1 ⊕ D[p]\n12:      δ_r1+1 ← δ_r1+1 ⊖ D[p]\n13:      δ_l2 ← δ_l2 ⊕ D[p]\n14:      δ_r2+1 ← δ_r2+1 ⊖ D[p]\n15:      D[k] ← D[k] ⊕ D[p]\n16: d_1 ← d_1 ⊕ δ_1\n17: for j ∈ [2, n] do d_j ← d_{j−1} ⊕ δ_j\n18: return d\n\nThe algorithm is a form of backpropagation, and goes as follows (Algorithm 1). First, we compute the circuit output using a linear bottom-up pass over the circuit in the bottom-up subroutine, the details of which are omitted. During this process, we keep track of the contribution of each internal gate using the dictionary s. 
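The ⊕-gate case of Algorithm 1 relies on range increments over the variable order: adding a quantity to every position of an interval [l, r] touches only entries l and r + 1 of a difference array, and one prefix pass at the end recovers all per-variable totals. A minimal numeric sketch of this trick, with ordinary + and − standing in for ⊕ and ⊖ (our own illustration, not the paper's code):

```python
def range_increments(n, updates):
    """Apply (lo, hi, value) additions to all positions lo..hi (inclusive,
    0-indexed) in O(n + len(updates)) time via a difference array.
    Requires subtraction, which is why Algorithm 1 assumes ⊖ is available."""
    delta = [0] * (n + 1)
    for lo, hi, val in updates:
        delta[lo] += val       # start contributing at position lo
        delta[hi + 1] -= val   # stop contributing after position hi
    out, run = [], 0
    for j in range(n):
        run += delta[j]        # prefix pass, as in lines 16-17 of Algorithm 1
        out.append(run)
    return out
```

Each ⊕-gate child thus costs O(1) per missing-variable interval (two difference-array updates), and the single prefix pass is shared across the whole circuit, which is what keeps the overall running time linear.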
Next, we traverse the circuit top-down in order to compute the partial derivative of each gate. At a ⊗-gate or ⊕-gate, we propagate the partial derivative down to the children as needed. However, since the circuit is not smooth, there may be missing variables in the children of ⊕-gates, in which case the propagation is incomplete. The challenge is then to efficiently complete the propagation to the missing variables. We optimize this propagation step using range increments, which gives us the next theorem with a proof in the Appendix.\n\nTheorem 6.1. The All-Marginals task on a deterministic and structured decomposable circuit g and a weight function w that is always positive and supports ⊕, ⊖, ⊗, ⊘ has time complexity Θ(m), where n is the number of variables and m is the size of g.\n\n7 On Retaining Structuredness\n\nRecall that our smoothing algorithm in Theorem 4.7 does not preserve structuredness of the input circuit, because it constructs smoothing gates in a way that is efficient but not structured. While structuredness is not required to solve problems such as AMC or All-Marginals, it is still useful because it allows for a polytime conjoin operation, multiplication of distributions, and more (see Section 2). In this section, we show that we cannot match the performance of Theorem 4.7 while retaining structuredness, because any smoothing algorithm that maintains the same vtree structure must run in quadratic time. We leave open the question of whether there would be an efficient smoothing algorithm producing a circuit structured with a different vtree.\n\nProposition 7.1. The task of smoothing a (deterministic and) structured decomposable circuit g while enforcing the same vtree has space complexity Θ(hm), where h is the height of the vtree and m is the size of g.\n\nProof. 
Upper bound: We construct smoothing gates following the structure of the vtree: for each vtree node p with children pl and pr, we build in constant time a structured smoothing gate for the variables that are descendants of p, using the smoothing gate for the variables that are descendants of pl and the one for the variables that are descendants of pr. Now, we can use these gates to smooth the circuit: any interval of variables in the in-order traversal of the vtree can be written as h intervals corresponding to vtree nodes, so smoothing g has time complexity O(hm). As with Theorem 4.6, the process of attaching smoothing gates preserves determinism.\n\nLower bound: Consider a right-linear vtree v with height h = n and variables X1, . . . , Xn, in that order. For simplicity, let n be a multiple of 3, and consider the following functions for y ∈ [0, 2^(n/3)):\n\nJ_y = ⊗_{i=1}^{n/3} σ(i, y) x_i    K_y = ⊗_{i=2n/3+1}^{n} σ(i, y) x_i\n\nwhere σ(i, y) = 1 if the i-th bit of the binary representation of y is set, and −1 otherwise, so that σ(i, y) x_i denotes the literal x_i when the bit is set and ¬x_i when it is not.\n\nNext, consider f = (⊗_{i=1}^{n} ¬x_i) ⊕ (⊕_{y=1}^{2^(n/3)−1} (J_y ⊗ K_y)). An instantiation satisfies f if all its literals are negative, or if the signs of its literals from X1, . . . , X_{n/3} (in order) equal those from X_{2n/3+1}, . . . , X_n, and are not all negative. The all-negative case is included so that f mentions all n variables, as otherwise f would already be smooth. We can build a circuit g with size O(2^(n/3)) that respects v and computes f using an Ordered Binary Decision Diagram representation [Bryant, 1986]. Yet, any smooth circuit G that respects v and computes f has size Ω(n · 2^(n/3)), as we see next.\n\nFirst, we use a standard notion on circuits which we refer to as a certificate, following the terminology by Bova et al. [2014]. A certificate is formed by keeping exactly one child of each ⊕-gate, and keeping all children of each ⊗-gate. 
Since G is smooth and decomposable, every certificate of G must have exactly n literals, and corresponds to an instantiation of the n variables. Let C_y be a certificate of G whose corresponding instantiation satisfies J_y ⊗ K_y, and let C_z be a certificate of G whose corresponding instantiation satisfies J_z ⊗ K_z, with y ≠ z and y, z ∈ [1, 2^{n/3}).

Next, let p_i denote the parent of the vtree node corresponding to variable X_i. We will show that C_y and C_z cannot share a gate w which maps to a vtree node p_k for k ∈ [n/3 + 1, 2n/3]. Suppose that such a gate w exists. Then we can form a new certificate by swapping out the subtree of certificate C_y rooted at w with the subtree of certificate C_z rooted at w. This new certificate satisfies J_y ⊗ K_z and is a valid certificate of the circuit G, which contradicts the fact that G computes f: since y ≠ z, the corresponding instantiation falsifies f.

To finish, we consider 2^{n/3} − 1 different certificates satisfying J_y ⊗ K_y for y ∈ [1, 2^{n/3}). None of these certificates can share any gates that map to vtree nodes p_k for k ∈ [n/3 + 1, 2n/3]. It follows that the output circuit G has size Ω(n · 2^{n/3}). Since the input circuit g is deterministic (because it is an OBDD) and the output circuit G need not be deterministic, the lower bound applies to both smoothing tasks (with and without determinism).

8 Experiments

We experiment on our smoothing algorithm from Section 4 and our All-Marginals algorithm from Section 6.² Experiments were run on a single Intel(R) Core(TM) i7-3770 CPU with 16GB of RAM.

Smoothing Circuits. We first study the smoothing task on structured decomposable circuits using our new smoothing algorithm (Section 4), which we compare to the naive quadratic smoothing algorithm. We construct hand-crafted circuits for which many smoothing gates are required, each of which covers a large interval.
In particular, we pick m large intervals I_1, . . . , I_m, and for each interval we construct the structured gate G_i = ⊗_{j ∉ I_i} x_j for a balanced vtree. Then we feed each G_i into one top-level ⊕-gate. This triggers the worst-case quadratic behavior of the naive smoothing algorithm, while our new algorithm has near-linear behavior.

² The code for our experiments can be found at https://github.com/AndyShih12/SSDC. There are some differences in our implementation, which we explain in the repository.

Table 2: Experiments on smoothing hand-crafted circuits and experiments on computing All-Marginals as part of the collapsed sampling algorithm. Sizes are reported in thousands (k).

(a) Time (in seconds) taken to smooth circuits.

    Size    | Naive       | Ours        | Speedup ×
    40k     | 0.82 ± 0.01 | 0.04 ± 0.01 | 21 ± 1
    416k    | 50 ± 0.3    | 0.31 ± 0.01 | 161 ± 6
    1,620k  | 293 ± 2     | 0.74 ± 0.04 | 390 ± 30
    8,500k  | 6050 ± 20   | 4.13 ± 0.09 | 1470 ± 40

(b) Number of ⊕, ⊖, ⊗, ⊘ operations to compute All-Marginals when sampling the Segmentation-11 network.

    Size | Naive          | Ours           | Impr %
    100k | 28,494 ± 598   | 20,207 ± 411   | 29 ± 3
    200k | 55,875 ± 1,198 | 36,101 ± 1,522 | 35 ± 5
    400k | 86,886 ± 6,330 | 56,094 ± 817   | 35 ± 6

(c) Number of ⊕, ⊖, ⊗, ⊘ operations to compute All-Marginals when sampling the DBN-11 network.

    Size | Naive           | Ours         | Impr %
    100k | 172,610 ± 1,821 | 26,807 ± 644 | 84 ± 1
    200k | 344,748 ± 3,881 | 51,864 ± 851 | 85 ± 1
    400k | 626,235 ± 9,985 | 99,567 ± 697 | 84 ± 1

(d) Number of ⊕, ⊖, ⊗, ⊘ operations to compute All-Marginals when sampling the CSP-13 network.

    Size | Naive           | Ours           | Impr %
    100k | 36,531 ± 1,484  | 20,814 ± 619   | 43 ± 4
    200k | 90,352 ± 3,593  | 38,670 ± 1,438 | 57 ± 3
    400k | 122,208 ± 9,971 | 55,269 ± 1,819 | 54 ± 6

The speedup of our
smoothing algorithm is captured in Table 2a. The Size column reports the size of the circuit, the Naive column reports the time taken by the quadratic smoothing algorithm, the Ours column reports the same value using our near-linear algorithm, and the Speedup column reports the relative decrease in time. The values are averaged over 4 runs.

Collapsed Sampling. We next benchmark our method for computing All-Marginals from Section 6 on the task of collapsed sampling, a technique for probabilistic inference on factor graphs. The collapsed sampling algorithm performs approximate inference on factor graphs by alternating between knowledge compilation phases and sampling phases [Friedman and Van den Broeck, 2018]. In the sampling phase, the algorithm computes All-Marginals as a subroutine.

We replace the original quadratic All-Marginals subroutine with our linear-time algorithm (Algorithm 1). The requirements of Algorithm 1 are satisfied, since the weight function w is defined over the reals and is always positive in the experiments of Friedman and Van den Broeck [2018]. In Table 2b we report the results on the Segmentation-11 network, a network from the 2006-2014 UAI Probabilistic Inference competitions. This particular network is a factor graph that was used for image segmentation/classification (determining what type of object each pixel corresponds to) [Forouzan, 2015]. Experiments were also performed on other networks from the inference competitions, such as DBN-11 and CSP-13 (Tables 2c and 2d). For all three networks we see a decrease in the number of ⊕, ⊖, ⊗, ⊘ operations needed for each All-Marginals computation. The Size column reports the size threshold during the knowledge compilation phase.
The Naive column reports the number of ⊕, ⊖, ⊗, ⊘ operations using the original All-Marginals subroutine, the Ours column reports the same value using Algorithm 1, and the Impr column reports the relative decrease in operations. The values are averaged over 4 runs.

9 Conclusion

In this paper we considered the task of smoothing a circuit. Circuits are widely used in inference algorithms for discrete probabilistic graphical models, and in discrete density estimation. The input circuits are required to be smooth for many of these probabilistic inference tasks, such as Algebraic Model Counting and All-Marginals. We provided a near-linear time smoothing algorithm for structured decomposable circuits and proved a matching lower bound within the class of smoothing-gate algorithms for decomposable circuits. We introduced a technique to compute All-Marginals in linear time without smoothing the circuit, when the weight function supports division and subtraction and is always positive. We additionally showed that smoothing a circuit while maintaining the same vtree structure cannot be sub-quadratic, unless the vtree has low height. Finally, we empirically evaluated our algorithms and showed a speedup over both the existing smoothing algorithm and the existing All-Marginals algorithm.

Acknowledgments. This work is partially supported by NSF grants #IIS-1657613, #IIS-1633857, #CCF-1837129, DARPA XAI grant #N66001-17-2-4032, NEC Research, and gifts from Intel and Facebook Research. We thank Louis Jachiet for the helpful discussion of Theorem 4.7.

References

Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, and Guy Van den Broeck. Tractable learning for complex probability queries. In NIPS, 2015.

Elena Bellodi and Fabrizio Riguzzi. Expectation maximization over binary decision diagrams for probabilistic logic programs. Intell.
Data Anal., 17:343–363, 2013.

Simone Bova, Florent Capelli, Stefan Mengel, and Friedrich Slivovsky. Expander CNFs have exponential DNNF size. CoRR, abs/1411.1995, 2014.

Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35:677–691, 1986.

Mark Chavira and Adnan Darwiche. On probabilistic inference by weighted model counting. Artif. Intell., 172:772–799, 2008.

Bernard Chazelle and Burton Rosenberg. Computing partial sums in multidimensional arrays. In Symposium on Computational Geometry, 1989.

Arthur Choi and Adnan Darwiche. On relaxing determinism in arithmetic circuits. In ICML, 2017.

Adnan Darwiche, Pierre Marquis, Dan Suciu, and Stefan Szeider. Recent trends in knowledge compilation (Dagstuhl Seminar 17381). Dagstuhl Reports, 7:62–85, 2017.

Adnan Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11:11–34, 2001.

Adnan Darwiche. SDD: A new canonical representation of propositional knowledge bases. In IJCAI, 2011.

Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Sht. Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, and Luc De Raedt. Inference and learning in probabilistic logic programs using weighted boolean formulas. TPLP, 15:358–401, 2015.

Sholeh Forouzan. Approximate inference in graphical models. PhD thesis, UC Irvine, 2015.

Tal Friedman and Guy Van den Broeck. Approximate knowledge compilation by online collapsed importance sampling. In NeurIPS, December 2018.

Abram L. Friesen and Pedro M. Domingos. The sum-product theorem: A foundation for learning tractable models. In ICML, 2016.

Akshat Garg. Difference array. https://www.geeksforgeeks.org/difference-array-range-update-query-o1/. Online; accessed 2019-08-30.

Robert Gens and Pedro M. Domingos. Learning the structure of sum-product networks.
In ICML, 2013.

Angelika Kimmig, Guy Van den Broeck, and Luc De Raedt. Algebraic model counting. International Journal of Applied Logic, November 2016.

Doga Kisa, Guy Van den Broeck, Arthur Choi, and Adnan Darwiche. Probabilistic sentential decision diagrams. In KR, 2014.

Yitao Liang and Guy Van den Broeck. Towards compact interpretable models: Shrinking of learned probabilistic sentential decision diagrams. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), August 2017.

Yitao Liang, Jessa Bekker, and Guy Van den Broeck. Learning the structure of probabilistic sentential decision diagrams. In UAI, 2017.

Jun Mei, Yong Jiang, and Kewei Tu. Maximum a posteriori inference in sum-product networks. In AAAI, 2018.

Umut Oztok and Adnan Darwiche. A top-down compiler for sentential decision diagrams. In IJCAI, 2015.

Robert Peharz, Robert Gens, Franz Pernkopf, and Pedro M. Domingos. On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2030–2044, 2017.

Knot Pipatsrisawat and Adnan Darwiche. New compilation languages based on structured decomposability. In AAAI, 2008.

Hoifung Poon and Pedro M. Domingos. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops, pages 689–690, 2011.

Amirmohammad Rooshenas and Daniel Lowd. Learning sum-product networks with direct and indirect variable interactions. In ICML, 2014.

Tian Sang, Paul Beame, and Henry A. Kautz. Performing bayesian inference by weighted model counting. In AAAI, 2005.

Yujia Shen, Arthur Choi, and Adnan Darwiche. Tractable operations for arithmetic circuits of probabilistic models. In NIPS, 2016.

Robert E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22:215–225, 1975.

Guy Van den Broeck, Wannes Meert, and Adnan Darwiche.
Skolemization for weighted first-order model counting. In KR, 2014.

Antonio Vergari, Nicola Di Mauro, and Floriana Esposito. Simplifying, regularizing and strengthening sum-product network structure learning. In ECML/PKDD, 2015.

Jingyi Xu, Zilu Zhang, Tal Friedman, Yitao Liang, and Guy Van den Broeck. A semantic loss function for deep learning with symbolic knowledge. In ICML, 2018.

Andrew Chi-Chih Yao. Space-time tradeoff for answering range queries (extended abstract). In STOC, 1982.