codegen.qbk
3.96 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
[section Object Code]
Let's look at some assembly. All assembly here was produced with Clang 4.0
with `-O3`. Given these definitions:
[arithmetic_perf_decls]
Here is a _yap_-based arithmetic function:
[arithmetic_perf_eval_as_yap_expr]
and the assembly it produces:
arithmetic_perf[0x100001c00] <+0>: pushq %rbp
arithmetic_perf[0x100001c01] <+1>: movq %rsp, %rbp
arithmetic_perf[0x100001c04] <+4>: mulsd %xmm1, %xmm0
arithmetic_perf[0x100001c08] <+8>: addsd %xmm2, %xmm0
arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
arithmetic_perf[0x100001c10] <+16>: mulsd %xmm1, %xmm1
arithmetic_perf[0x100001c14] <+20>: addsd %xmm0, %xmm1
arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
arithmetic_perf[0x100001c1c] <+28>: popq %rbp
arithmetic_perf[0x100001c1d] <+29>: retq
And for the equivalent function using builtin expressions:
[arithmetic_perf_eval_as_cpp_expr]
the assembly is:
arithmetic_perf[0x100001e10] <+0>: pushq %rbp
arithmetic_perf[0x100001e11] <+1>: movq %rsp, %rbp
arithmetic_perf[0x100001e14] <+4>: mulsd %xmm1, %xmm0
arithmetic_perf[0x100001e18] <+8>: addsd %xmm2, %xmm0
arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
arithmetic_perf[0x100001e20] <+16>: mulsd %xmm1, %xmm1
arithmetic_perf[0x100001e24] <+20>: addsd %xmm0, %xmm1
arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
arithmetic_perf[0x100001e2c] <+28>: popq %rbp
arithmetic_perf[0x100001e2d] <+29>: retq
If we increase the number of terminals by a factor of four:
[arithmetic_perf_eval_as_yap_expr_4x]
the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.
However, increasing the number of terminals by an additional factor of 2.5
(for a total of 90 terminals), the inliner can no longer do as well for _yap_
expressions as for builtin ones.
More complex nonarithmetic code produces more mixed results. For example, here
is a function using code from the Map Assign example:
std::map<std::string, int> make_map_with_boost_yap ()
{
return map_list_of
("<", 1)
("<=",2)
(">", 3)
(">=",4)
("=", 5)
("<>",6)
;
}
By contrast, here is the Boost.Assign version of the same function:
std::map<std::string, int> make_map_with_boost_assign ()
{
return boost::assign::map_list_of
("<", 1)
("<=",2)
(">", 3)
(">=",4)
("=", 5)
("<>",6)
;
}
Here is how you might do it "manually":
std::map<std::string, int> make_map_manually ()
{
std::map<std::string, int> retval;
retval.emplace("<", 1);
retval.emplace("<=",2);
retval.emplace(">", 3);
retval.emplace(">=",4);
retval.emplace("=", 5);
retval.emplace("<>",6);
return retval;
}
Finally, here is the same map created from an initializer list:
std::map<std::string, int> make_map_inializer_list ()
{
std::map<std::string, int> retval = {
{"<", 1},
{"<=",2},
{">", 3},
{">=",4},
{"=", 5},
{"<>",6}
};
return retval;
}
All of these produce roughly the same amount of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:
[table Runtimes of Different Map Constructions
[[Function] [Time (ns)]]
[[make_map_with_boost_yap()] [1285]]
[[make_map_with_boost_assign()] [1459]]
[[make_map_manually()] [985]]
[[make_map_inializer_list()] [954]]
]
The _yap_-based implementation finishes in the middle of the pack.
In general, the expression trees produced by _yap_ get evaluated down to
something close to the hand-written equivalent. There is an abstraction
penalty, but it is small for reasonably-sized expressions.
[endsect]