codegen.qbk 3.96 KB
[section Object Code]

Let's look at some assembly.  All assembly here was produced with Clang 4.0
with `-O3`.  Given these definitions:

[arithmetic_perf_decls]

Here is a _yap_-based arithmetic function:

[arithmetic_perf_eval_as_yap_expr]

and the assembly it produces:

    arithmetic_perf[0x100001c00] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001c01] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001c04] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001c08] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001c10] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001c14] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001c1c] <+28>: popq   %rbp
    arithmetic_perf[0x100001c1d] <+29>: retq   

And for the equivalent function using builtin expressions:

[arithmetic_perf_eval_as_cpp_expr]

the assembly is:

    arithmetic_perf[0x100001e10] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001e11] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001e14] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001e18] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001e20] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001e24] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001e2c] <+28>: popq   %rbp
    arithmetic_perf[0x100001e2d] <+29>: retq   

If we increase the number of terminals by a factor of four:

[arithmetic_perf_eval_as_yap_expr_4x]

the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.

However, increasing the number of terminals by an additional factor of 2.5
(for a total of 90 terminals), the inliner can no longer do as well for _yap_
expressions as for builtin ones.

More complex nonarithmetic code produces more mixed results. For example, here
is a function using code from the Map Assign example:

    std::map<std::string, int> make_map_with_boost_yap ()
    {
        return map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

By contrast, here is the Boost.Assign version of the same function:

    std::map<std::string, int> make_map_with_boost_assign ()
    {
        return boost::assign::map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

Here is how you might do it "manually":

    std::map<std::string, int> make_map_manually ()
    {
        std::map<std::string, int> retval;
        retval.emplace("<", 1);
        retval.emplace("<=",2);
        retval.emplace(">", 3);
        retval.emplace(">=",4);
        retval.emplace("=", 5);
        retval.emplace("<>",6);
        return retval;
    }

Finally, here is the same map created from an initializer list:

    std::map<std::string, int> make_map_inializer_list ()
    {
        std::map<std::string, int> retval = {
            {"<", 1},
            {"<=",2},
            {">", 3},
            {">=",4},
            {"=", 5},
            {"<>",6}
        };
        return retval;
    }

All of these produce roughly the same amount of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:

[table Runtimes of Different Map Constructions
    [[Function] [Time (ns)]]

    [[make_map_with_boost_yap()]          [1285]]
    [[make_map_with_boost_assign()]       [1459]]
    [[make_map_manually()]                 [985]]
    [[make_map_inializer_list()]           [954]]
]

The _yap_-based implementation finishes in the middle of the pack.

In general, the expression trees produced by _yap_ get evaluated down to
something close to the hand-written equivalent.  There is an abstraction
penalty, but it is small for reasonably-sized expressions.


[endsect]