[section:collectives Collective operations]
[link mpi.tutorial.point_to_point Point-to-point operations] are the
core message passing primitives in Boost.MPI. However, many
message-passing applications also require higher-level communication
algorithms that combine or summarize the data stored on many different
processes. These algorithms support many common tasks such as
"broadcast this value to all processes", "compute the sum of the
values on all processors" or "find the global minimum."
[section:broadcast Broadcast]
The [funcref boost::mpi::broadcast `broadcast`] algorithm is
by far the simplest collective operation. It broadcasts a value from a
single process to all other processes within a [classref
boost::mpi::communicator communicator]. For instance, the
following program broadcasts "Hello, World!" from process 0 to every
other process. (`hello_world_broadcast.cpp`)
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string value;
    if (world.rank() == 0) {
      value = "Hello, World!";
    }

    broadcast(world, value, 0);

    std::cout << "Process #" << world.rank() << " says " << value
              << std::endl;
    return 0;
  }
Running this program with seven processes will produce a result such
as:
[pre
Process #0 says Hello, World!
Process #2 says Hello, World!
Process #1 says Hello, World!
Process #4 says Hello, World!
Process #3 says Hello, World!
Process #5 says Hello, World!
Process #6 says Hello, World!
]
[endsect:broadcast]
[section:gather Gather]
The [funcref boost::mpi::gather `gather`] collective gathers
the values produced by every process in a communicator into a vector
of values on the "root" process (specified by an argument to
`gather`). The /i/th element in the vector will correspond to the
value gathered from the /i/th process. For instance, in the following
program each process computes its own random number. All of these
random numbers are gathered at process 0 (the "root" in this case),
which prints out the values that correspond to each processor.
(`random_gather.cpp`)
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      std::vector<int> all_numbers;
      gather(world, my_number, all_numbers, 0);
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;
    } else {
      gather(world, my_number, 0);
    }
    return 0;
  }
Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because only process 0 writes to `std::cout`.
[pre
Process #0 thought of 332199874
Process #1 thought of 20145617
Process #2 thought of 1862420122
Process #3 thought of 480422940
Process #4 thought of 1253380219
Process #5 thought of 949458815
Process #6 thought of 650073868
]
The `gather` operation collects values from every process into a
vector at one process. If instead the values from every process need
to be collected into identical vectors on every process, use the
[funcref boost::mpi::all_gather `all_gather`] algorithm,
which is semantically equivalent to calling `gather` followed by a
`broadcast` of the resulting vector.
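For illustration, here is a minimal sketch (not one of the distributed
Boost.MPI examples) in which every process receives the complete vector
of random numbers via `all_gather`:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    // Every process passes an output vector and receives the values
    // gathered from all processes; no root argument is needed.
    std::vector<int> all_numbers;
    mpi::all_gather(world, my_number, all_numbers);

    std::cout << "Process #" << world.rank() << " sees "
              << all_numbers.size() << " values." << std::endl;
    return 0;
  }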
[endsect:gather]
[section:scatter Scatter]
The [funcref boost::mpi::scatter `scatter`] collective scatters
the values from a vector in the "root" process in a communicator into
values in all the processes of the communicator.
The /i/th element in the vector will correspond to the
value received by the /i/th process. For instance, in the following
program, the root process produces a vector of random numbers and sends
one value to each process, which then prints it. (`random_scatter.cpp`)
  #include <boost/mpi.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <iostream>
  #include <algorithm>
  #include <cstdlib>
  #include <ctime>
  #include <vector>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    std::vector<int> all;
    int mine = -1;
    if (world.rank() == 0) {
      all.resize(world.size());
      std::generate(all.begin(), all.end(), std::rand);
    }
    mpi::scatter(world, all, mine, 0);
    for (int r = 0; r < world.size(); ++r) {
      world.barrier();
      if (r == world.rank()) {
        std::cout << "Rank " << r << " got " << mine << '\n';
      }
    }
    return 0;
  }
Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because of the barrier.
[pre
Rank 0 got 1409381269
Rank 1 got 17045268
Rank 2 got 440120016
Rank 3 got 936998224
Rank 4 got 1827129182
Rank 5 got 1951746047
Rank 6 got 2117359639
]
[endsect:scatter]
[section:reduce Reduce]
The [funcref boost::mpi::reduce `reduce`] collective
summarizes the values from each process into a single value at the
user-specified "root" process. The Boost.MPI `reduce` operation is
similar in spirit to the STL _accumulate_ operation, because it takes
a sequence of values (one per process) and combines them via a
function object. For instance, we can randomly generate values in each
process and then compute the minimum value over all processes via a
call to [funcref boost::mpi::reduce `reduce`]
(`random_min.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      int minimum;
      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
      std::cout << "The minimum value is " << minimum << std::endl;
    } else {
      reduce(world, my_number, mpi::minimum<int>(), 0);
    }
    return 0;
  }
The use of `mpi::minimum<int>` indicates that the minimum value
should be computed. `mpi::minimum<int>` is a binary function object
that compares its two parameters via `<` and returns the smaller
value. Any associative binary function or function object will
work provided it's stateless. For instance, to concatenate strings with `reduce` one could use
the function object `std::plus<std::string>` (`string_cat.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <functional>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string names[10] = { "zero ", "one ", "two ", "three ",
                              "four ", "five ", "six ", "seven ",
                              "eight ", "nine " };

    std::string result;
    reduce(world,
           world.rank() < 10? names[world.rank()]
                            : std::string("many "),
           result, std::plus<std::string>(), 0);

    if (world.rank() == 0)
      std::cout << "The result is " << result << std::endl;

    return 0;
  }
In this example, we compute a string for each process and then perform
a reduction that concatenates all of the strings together into one
long string. Executing this program with seven processes yields the
following output:
[pre
The result is zero one two three four five six
]
[h4 Binary operations for reduce]
Any kind of binary function object can be used with `reduce`. There
are many such function objects in the C++ standard
`<functional>` header and the Boost.MPI header
`<boost/mpi/operations.hpp>`, or you can create your own
function object. Function objects used with `reduce` must be
associative, i.e. `f(x, f(y, z))` must be equivalent to `f(f(x, y),
z)`. If they are also commutative (i.e., `f(x, y) == f(y, x)`),
Boost.MPI can use a more efficient implementation of `reduce`. To
state that a function object is commutative, you will need to
specialize the class [classref boost::mpi::is_commutative
`is_commutative`]. For instance, we could modify the previous example
by telling Boost.MPI that string concatenation is commutative:
  namespace boost { namespace mpi {

    template<>
    struct is_commutative<std::plus<std::string>, std::string>
      : mpl::true_ { };

  } } // end namespace boost::mpi
By adding this code prior to `main()`, Boost.MPI will assume that
string concatenation is commutative and employ a different parallel
algorithm for the `reduce` operation. Using this algorithm, the
program outputs the following when run with seven processes:
[pre
The result is zero one four five six two three
]
Note how the numbers in the resulting string are in a different order:
this is a direct result of Boost.MPI reordering operations. The result
in this case differed from the non-commutative result because string
concatenation is not commutative: `f("x", "y")` is not the same as
`f("y", "x")`, because argument order matters. For truly commutative
operations (e.g., integer addition), the more efficient commutative
algorithm will produce the same result as the non-commutative
algorithm. Boost.MPI also performs direct mappings from function
objects in `<functional>` to `MPI_Op` values predefined by MPI (e.g.,
`MPI_SUM`, `MPI_MAX`); if you have your own function objects that can
take advantage of this mapping, see the class template [classref
boost::mpi::is_mpi_op `is_mpi_op`].
[warning Due to underlying MPI limitations, the function object used with `reduce` must be stateless.]
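As a rough sketch of what such a specialization can look like, the
following follows the pattern of the built-in specializations in
`<boost/mpi/operations.hpp>`; the function object `my_plus` is
hypothetical, and the exact requirements should be checked against the
[classref boost::mpi::is_mpi_op `is_mpi_op`] reference documentation:

  // Hypothetical user-defined function object that behaves like integer addition.
  struct my_plus {
    int operator()(int x, int y) const { return x + y; }
  };

  namespace boost { namespace mpi {

    // Assumption: a user specialization derives from mpl::true_ and exposes a
    // static op() returning the corresponding predefined MPI_Op (here MPI_SUM).
    template<>
    struct is_mpi_op<my_plus, int> : mpl::true_
    {
      static MPI_Op op() { return MPI_SUM; }
    };

  } } // end namespace boost::mpi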
[h4 All process variant]
Like [link mpi.tutorial.collectives.gather `gather`], `reduce` has an "all"
variant called [funcref boost::mpi::all_reduce `all_reduce`]
that performs the reduction operation and broadcasts the result to all
processes. This variant is useful, for instance, in establishing
global minimum or maximum values.
The following code (`global_min.cpp`) shows a broadcasting version of
the `random_min.cpp` example:
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    int minimum;
    mpi::all_reduce(world, my_number, minimum, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << minimum << std::endl;
    }

    return 0;
  }
In that example we provide both input and output values, requiring
twice as much space, which can be a problem depending on the size
of the transmitted data.
If there is no need to preserve the input value, the output value
can be omitted. In that case the input value is overwritten with
the output value, and Boost.MPI can, in some situations, implement
the operation with a more space-efficient solution (using the `MPI_IN_PLACE`
flag of the underlying MPI C API), as in the following example (`in_place_global_min.cpp`):
  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    mpi::all_reduce(world, my_number, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << my_number << std::endl;
    }

    return 0;
  }
[endsect:reduce]
[endsect:collectives]