<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.8">
<title>CommonArg</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">


<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css"/>
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20130715</a>
</div>
</div>
<div id="header">
<h1>CommonArg</h1>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p><a href="CommonArg">CommonArg</a> is an optimization pass for the <a href="SSA">SSA</a>
<a href="IntermediateLanguage">IntermediateLanguage</a>, invoked from <a href="SSASimplify">SSASimplify</a>.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_description">Description</h2>
<div class="sectionbody">
<div class="paragraph"><p>It optimizes instances of <span class="monospaced">Goto</span> transfers that pass the same
arguments to the same label; e.g.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>L_1 ()
  ...
  z1 = ?
  ...
  L_3 (x, y, z1)
L_2 ()
  ...
  z2 = ?
  ...
  L_3 (x, y, z2)
L_3 (a, b, c)
  ...</pre>
</div></div>
<div class="paragraph"><p>This code can be simplified to:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>L_1 ()
  ...
  z1 = ?
  ...
  L_3 (z1)
L_2 ()
  ...
  z2 = ?
  ...
  L_3 (z2)
L_3 (c)
  a = x
  b = y</pre>
</div></div>
<div class="paragraph"><p>which saves a number of resources: time of setting up the arguments
for the jump to <span class="monospaced">L_3</span>, space (either stack or pseudo-registers) for
the arguments of <span class="monospaced">L_3</span>, etc.  It may also expose some other
optimizations, if more information is known about <span class="monospaced">x</span> or <span class="monospaced">y</span>.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_implementation">Implementation</h2>
<div class="sectionbody">
<div class="ulist"><ul>
<li>
<p>
<a href="https://github.com/MLton/mlton/blob/master/mlton/ssa/common-arg.fun"><span class="monospaced">common-arg.fun</span></a>
</p>
</li>
</ul></div>
</div>
</div>
<div class="sect1">
<h2 id="_details_and_notes">Details and Notes</h2>
<div class="sectionbody">
<div class="paragraph"><p>Three analyses were originally proposed to drive the optimization
transformation.  Only the <em>Dominator Analysis</em> is currently
implemented.  (Implementations of the other analyses are available in
the <a href="Sources">repository history</a>.)</p></div>
<div class="sect2">
<h3 id="_syntactic_analysis">Syntactic Analysis</h3>
<div class="paragraph"><p>The simplest analysis I could think of maintains</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>varInfo: Var.t -&gt; Var.t option list ref</pre>
</div></div>
<div class="paragraph"><p>initialized to <span class="monospaced">[]</span>.</p></div>
<div class="ulist"><ul>
<li>
<p>
For each variable <span class="monospaced">v</span> bound in a <span class="monospaced">Statement.t</span> or in the
<span class="monospaced">Function.t</span> args, then <span class="monospaced">List.push(varInfo v, NONE)</span>.
</p>
</li>
<li>
<p>
For each <span class="monospaced">L (x1, ..., xn)</span> transfer where <span class="monospaced">(a1, ..., an)</span> are the
formals of <span class="monospaced">L</span>, then <span class="monospaced">List.push(varInfo ai, SOME xi)</span>.
</p>
</li>
<li>
<p>
For each block argument a used in an unknown context (e.g.,
arguments of blocks used as continuations, handlers, arith success,
runtime return, or case switch labels), then
<span class="monospaced">List.push(varInfo a, NONE)</span>.
</p>
</li>
</ul></div>
<div class="paragraph"><p>Now, any block argument <span class="monospaced">a</span> such that <span class="monospaced">varInfo a = xs</span>, where all of
the elements of <span class="monospaced">xs</span> are equal to <span class="monospaced">SOME x</span>, can be optimized by
setting <span class="monospaced">a = x</span> at the beginning of the block and dropping the
argument from <span class="monospaced">Goto</span> transfers.</p></div>
<div class="paragraph"><p>That takes care of the example above.  We can clearly do slightly
better, by changing the transformation criteria to the following: any
block argument a such that <span class="monospaced">varInfo a = xs</span>, where all of the elements
of <span class="monospaced">xs</span> are equal to <span class="monospaced">SOME x</span> <em>or</em> are equal to <span class="monospaced">SOME a</span>, can be
optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
dropping the argument from <span class="monospaced">Goto</span> transfers.  This optimizes a case
like:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>L_1 ()
  ... z1 = ? ...
  L_3 (x, y, z1)
L_2 ()
  ... z2 = ? ...
  L_3(x, y, z2)
L_3 (a, b, c)
  ... w = ? ...
  case w of
    true =&gt; L_4 | false =&gt; L_5
L_4 ()
   ...
   L_3 (a, b, w)
L_5 ()
   ...</pre>
</div></div>
<div class="paragraph"><p>where a common argument is passed to a loop (and is invariant through
the loop).  Of course, the <a href="LoopInvariant">LoopInvariant</a> optimization pass would
normally introduce a local loop and essentially reduce this to the
first example, but I have seen this in practice, which suggests that
some optimizations after <a href="LoopInvariant">LoopInvariant</a> do enough simplifications
to introduce (new) loop invariant arguments.</p></div>
</div>
<div class="sect2">
<h3 id="_fixpoint_analysis">Fixpoint Analysis</h3>
<div class="paragraph"><p>However, the above analysis and transformation doesn&#8217;t cover the cases
where eliminating one common argument exposes the opportunity to
eliminate other common arguments.  For example:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>L_1 ()
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a)
  ...
  L_5 (a)
L_4 ()
  ...
  L_5 (x)
L_5 (b)
  ...</pre>
</div></div>
<div class="paragraph"><p>One pass of analysis and transformation would eliminate the argument
to <span class="monospaced">L_3</span> and rewrite the <span class="monospaced">L_5(a)</span> transfer to <span class="monospaced">L_5 (x)</span>, thereby
exposing the opportunity to eliminate the common argument to <span class="monospaced">L_5</span>.</p></div>
<div class="paragraph"><p>The interdependency the arguments to <span class="monospaced">L_3</span> and <span class="monospaced">L_5</span> suggest
performing some sort of fixed-point analysis.  This analysis is
relatively simple; maintain</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>varInfo: Var.t -&gt; VarLattice.t</pre>
</div></div>
<div class="paragraph"><p>where</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>VarLattice.t ~=~ Bot | Point of Var.t | Top</pre>
</div></div>
<div class="paragraph"><p>(but is implemented by the <a href="FlatLattice">FlatLattice</a> functor with a <span class="monospaced">lessThan</span>
list and <span class="monospaced">value ref</span> under the hood), initialized to <span class="monospaced">Bot</span>.</p></div>
<div class="ulist"><ul>
<li>
<p>
For each variable <span class="monospaced">v</span> bound in a <span class="monospaced">Statement.t</span> or in the
<span class="monospaced">Function.t</span> args, then <span class="monospaced">VarLattice.&lt;= (Point v, varInfo v)</span>
</p>
</li>
<li>
<p>
For each <span class="monospaced">L (x1, ..., xn)</span> transfer where <span class="monospaced">(a1, ..., an)</span> are the
formals of <span class="monospaced">L</span>}, then <span class="monospaced">VarLattice.&lt;= (varInfo xi, varInfo ai)</span>.
</p>
</li>
<li>
<p>
For each block argument a used in an unknown context, then
<span class="monospaced">VarLattice.&lt;= (Point a, varInfo a)</span>.
</p>
</li>
</ul></div>
<div class="paragraph"><p>Now, any block argument a such that <span class="monospaced">varInfo a = Point x</span> can be
optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
dropping the argument from <span class="monospaced">Goto</span> transfers.</p></div>
<div class="paragraph"><p>Now, with the last example, we introduce the ordering constraints:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>varInfo x &lt;= varInfo a
varInfo a &lt;= varInfo b
varInfo x &lt;= varInfo b</pre>
</div></div>
<div class="paragraph"><p>Assuming that <span class="monospaced">varInfo x = Point x</span>, then we get <span class="monospaced">varInfo a = Point x</span>
and <span class="monospaced">varInfo b = Point x</span>, and we optimize the example as desired.</p></div>
<div class="paragraph"><p>But, that is a rather weak assumption.  It&#8217;s quite possible for
<span class="monospaced">varInfo x = Top</span>.  For example, consider:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>G_1 ()
  ... n = 1 ...
  L_0 (n)
G_2 ()
  ... m = 2 ...
  L_0 (m)
L_0 (x)
  ...
L_1 ()
  ...
  L_3 (x)
L_2 ()
  ...
  L_3 (x)
L_3 (a)
  ...
  L_5(a)
L_4 ()
  ...
  L_5(x)
L_5 (b)
   ...</pre>
</div></div>
<div class="paragraph"><p>Now <span class="monospaced">varInfo x = varInfo a = varInfo b = Top</span>.  What went wrong here?
When <span class="monospaced">varInfo x</span> went to <span class="monospaced">Top</span>, it got propagated all the way through
to <span class="monospaced">a</span> and <span class="monospaced">b</span>, and prevented the elimination of any common arguments.
What we&#8217;d like to do instead is when <span class="monospaced">varInfo x</span> goes to <span class="monospaced">Top</span>,
propagate on <span class="monospaced">Point x</span>&#8201;&#8212;&#8201;we have no hope of eliminating <span class="monospaced">x</span>, but if
we hold <span class="monospaced">x</span> constant, then we have a chance of eliminating arguments
for which <span class="monospaced">x</span> is passed as an actual.</p></div>
</div>
<div class="sect2">
<h3 id="_dominator_analysis">Dominator Analysis</h3>
<div class="paragraph"><p>Does anyone see where this is going yet?  Pausing for a little
thought, <a href="MatthewFluet">MatthewFluet</a> realized that he had once before tried
proposing this kind of "fix" to a fixed-point analysis&#8201;&#8212;&#8201;when we were
first investigating the <a href="Contify">Contify</a> optimization in light of John
Reppy&#8217;s CWS paper.  Of course, that "fix" failed because it defined a
non-monotonic function and one couldn&#8217;t take the fixed point.  But,
<a href="StephenWeeks">StephenWeeks</a> suggested a dominator based approach, and we were
able to show that, indeed, the dominator analysis subsumed both the
previous call based analysis and the cont based analysis.  And, a
moment&#8217;s reflection reveals further parallels: when
<span class="monospaced">varInfo: Var.t -&gt; Var.t option list ref</span>, we have something analogous
to the call analysis, and when <span class="monospaced">varInfo: Var.t -&gt; VarLattice.t</span>, we
have something analogous to the cont analysis.  Maybe there is
something analogous to the dominator approach (and therefore superior
to the previous analyses).</p></div>
<div class="paragraph"><p>And this turns out to be the case.  Construct the graph <span class="monospaced">G</span> as follows:</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>nodes(G) = {Root} U Var.t
edges(G) = {Root -&gt; v | v bound in a Statement.t or
                                in the Function.t args} U
           {xi -&gt; ai | L(x1, ..., xn) transfer where (a1, ..., an)
                                      are the formals of L} U
           {Root -&gt; a | a is a block argument used in an unknown context}</pre>
</div></div>
<div class="paragraph"><p>Let <span class="monospaced">idom(x)</span> be the immediate dominator of <span class="monospaced">x</span> in <span class="monospaced">G</span> with root
<span class="monospaced">Root</span>.  Now, any block argument a such that <span class="monospaced">idom(a) = x &lt;&gt; Root</span> can
be optimized by setting <span class="monospaced">a = x</span> at the beginning of the block and
dropping the argument from <span class="monospaced">Goto</span> transfers.</p></div>
<div class="paragraph"><p>Furthermore, experimental evidence suggests (and we are confident that
a formal presentation could prove) that the dominator analysis
subsumes the "syntactic" and "fixpoint" based analyses in this context
as well and that the dominator analysis gets "everything" in one go.</p></div>
</div>
<div class="sect2">
<h3 id="_final_thoughts">Final Thoughts</h3>
<div class="paragraph"><p>I must admit, I was rather surprised at this progression and final
result.  At the outset, I never would have thought of a connection
between <a href="Contify">Contify</a> and <a href="CommonArg">CommonArg</a> optimizations.  They would seem
to be two completely different optimizations.  Although, this may not
really be the case.  As one of the reviewers of the ICFP paper said:</p></div>
<div class="quoteblock">
<div class="content">
<div class="paragraph"><p>I understand that such a form of CPS might be convenient in some
cases, but when we&#8217;re talking about analyzing code to detect that some
continuation is constant, I think it makes a lot more sense to make
all the continuation arguments completely explicit.</p></div>
<div class="paragraph"><p>I believe that making all the continuation arguments explicit will
show that the optimization can be generalized to eliminating constant
arguments, whether continuations or not.</p></div>
</div>
<div class="attribution">
</div></div>
<div class="paragraph"><p>What I think the common argument optimization shows is that the
dominator analysis does slightly better than the reviewer puts it: we
find more than just constant continuations, we find common
continuations.  And I think this is further justified by the fact that
I have observed common argument eliminate some <span class="monospaced">env_X</span> arguments which
would appear to correspond to determining that while the closure being
executed isn&#8217;t constant it is at least the same as the closure being
passed elsewhere.</p></div>
<div class="paragraph"><p>At first, I was curious whether or not we had missed a bigger picture
with the dominator analysis.  When we wrote the contification paper, I
assumed that the dominator analysis was a specialized solution to a
specialized problem; we never suggested that it was a technique suited
to a larger class of analyses.  After initially finding a connection
between <a href="Contify">Contify</a> and <a href="CommonArg">CommonArg</a> (and thinking that the only
connection was the technique), I wondered if the dominator technique
really was applicable to a larger class of analyses.  That is still a
question, but after writing up the above, I&#8217;m suspecting that the
"real story" is that the dominator analysis is a solution to the
common argument optimization, and that the <a href="Contify">Contify</a> optimization is
specializing <a href="CommonArg">CommonArg</a> to the case of continuation arguments (with
a different transformation at the end).  (Note, a whole-program,
inter-procedural common argument analysis doesn&#8217;t really make sense
(in our <a href="SSA">SSA</a> <a href="IntermediateLanguage">IntermediateLanguage</a>), because the only way of
passing values between functions is as arguments.  (Unless of course
in the case that the common argument is also a constant argument, in
which case <a href="ConstantPropagation">ConstantPropagation</a> could lift it to a global.)  The
inter-procedural <a href="Contify">Contify</a> optimization works out because there we
move the function to the argument.)</p></div>
<div class="paragraph"><p>Anyways, it&#8217;s still unclear to me whether or not the dominator based
approach solves other kinds of problems.</p></div>
</div>
<div class="sect2">
<h3 id="_phase_ordering">Phase Ordering</h3>
<div class="paragraph"><p>On the downside, the optimization doesn&#8217;t have a huge impact on
runtime, although it does predictably saved some code size.  I stuck
it in the optimization sequence after <a href="Flatten">Flatten</a> and (the third round
of) <a href="LocalFlatten">LocalFlatten</a>, since it seems to me that we could have cases
where some components of a tuple used as an argument are common, but
the whole tuple isn&#8217;t.  I think it makes sense to add it after
<a href="IntroduceLoops">IntroduceLoops</a> and <a href="LoopInvariant">LoopInvariant</a> (even though <a href="CommonArg">CommonArg</a>
get some things that <a href="LoopInvariant">LoopInvariant</a> gets, it doesn&#8217;t get all of
them).  I also think that it makes sense to add it before
<a href="CommonSubexp">CommonSubexp</a>, since identifying variables could expose more common
subexpressions.  I would think a similar thought applies to
<a href="RedundantTests">RedundantTests</a>.</p></div>
</div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>
