Table of Contents
Parallelization is evaluation of program parts in multiple threads, in parallel manner. As each Wafl program represents an expression, the parallelization assumes the segmentation of the program in many sub-expressions, which evaluate in parallel.
Only the pure functional expressions can be parallelized. If a non-pure expression were to be parallelized, its result or its side-effects could differ in different evaluations.
The current Wafl implementation does not check if the subject of the parallelization is a pure functional expression or not. So developers need to take care of that.
Wafl supports explicit and implicit parallelization.
Explicit parallelization is based on the explicit use of parallel versions of some Wafl library functions. These functions are implemented to split the job in parts and to use many threads to complete the evaluation.
Explicit parallel functions include:
map_par
- A parallel version of map
function.mapIdx_par
- A parallel version of mapIdx
function.filter_par
- A parallel version of filter
function.fold_par
- A parallel version of fold
function.newArrayFn_par
- A parallel version of
newArrayFn
function.Each of the parallel versions of the functions has the same type and returns the same result as the sequential version of the functions, but evaluates in many parallel threads.
Implicit parallelization assumes that some parts of the programs will be implicitly (automatically) parallelized by interpreter. An important module of Wafl interpreter is a parallelization engine. Its job is to analyze the available data on the expressions and to decide if an expression is to be evaluated in a sequential or in a parallel way. Moreover, it decides on many of the parallelization options.
In a simplified way, we may say that the parallelization engine tries to evaluate the expected benefits of parallelization, taking into account:
Based on the evaluated benefits, the engine compares it to a threshold and decides to use the parallelization or not. The threshold may be configured.
By default, the implicit parallelization is disabled. The main reason for this is to prevent the errors in the cases of implicit parallelization of non-pure expressions.
In the future, after interpreter is updated to check on the expression pureness, the implicit parallelization may be enabled by default.
The following functions may be implicitly parallelizable:
map
mapIdx
fold
filter
newArrayFn
.If any specific instance of these functions will be parallelized or not, will be decided by the parallelization engine. Currently, the parallelization engine decides based on the size of the data collection to be processed, and on the number of currently active parallel jobs.
In the future the engine will use the expression complexity as well.
Sometimes it is useful to have implicit parallelization, but to suggest to the parallelization engine that a part of the code should be evaluated in parallel, if possible. Such suggestion has the effect of lowering the threshold for the decision on parallelization.
To suggest that a program segment should be parallelized, there is a
function parallel
. It’s type is ('1 -> '1)
and its semantics is almost like identity function \x:x
.
The only difference is that the parallelization threshold is
lowered.
It can be applied to any expression, but has different effects if applied on functions.
In general case, when the bound expression is not a simple function
reference, like in parallel(map(lst,fn))
, the following
semantics is applied:
So, the main difference is in whether the arguments will evaluate with parallelization turned on or not.
On the other side, if the argument is a function, like in
parallel(map)(lst,fn)
, the parallelization suggestion is
related strictly to the bound function. The main difference is that
parallelization is not related to the arguments. We could describe the
semantics as:
If the specified function is a library function, like
map
, the result is almost the same as using a specialized
parallel map implementation instead. On the other side, if the specified
function is a user defined function, then it is the same as if the user
defined function is defined with parallel
applied to its
definition
For example, if we have:
f(x,y,z) = ...defexp...;
then we can suggest its parallelization in two ways:
if we want to parallelize a selected application only, then we
should use parallel(f)(a,b,c)
, and
if want to parallelize all function applications, then we should
use parallel
on the function definition expression:
f(x,y,z) = parallel(...defexp...);
The parallelization engine configuration can be set by command line options:
-pd
, -parallel-disable
-pa
, -parallel-auto
-ps
, -parallel
-pa
, but this mode is biased towards parallelization, by
using a lower threshold.-pf
, -parallel-force
If implicit parallelization is used, it is possible that the parallelization engine decides to parallelize a part of the program that must not be parallelized (usually because it is not purely functional). To prevent this, there are explicit sequential versions of functions that prevent implicit parallelization.
There are sequential versions of each of the implicitly parallelizable functions:
map_seq
- A sequential version of map
function.mapIdx_seq
- A sequential version of
mapIdx
function.filter_seq
- A sequential version of
filter
function.fold_seq
- A sequential version of fold
function.
foldl
and
foldr
are always sequential, so fold_seq
represents a special case of foldl
and foldr
,
where an associative function is used. If the applied function is
associative, then fold_seq
evaluates the same result as
foldl
and foldr
.newArrayFn_seq
- A sequential version of
newArrayFn
function.Not every sequence folding can be parallelized. For example, consider the following expression:
1..20).foldr(
(\x,l: if x%2=0 then x:l else l,
[]
)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
The expression folds an integer sequence by evaluating a filtered
list of even integers. (Of course, we can do than in a simpler way,
using filter
.) This is a folding example, which cannot be
easily implemented using foldl
with the same arguments,
because foldl
expects its first argument to be an integer
and the second argument to be a list of integers.
In a general case, to be able to use both foldl
and
foldr
, and with the same result, two important requirements
must hold:
('1 * '1 -> '1)
, i.e. to represent a binary operator on
a domain; anda
, b
and c
the following must
hold: f(a,f(b,c)) == f(f(a,b),c)
.If we want the folding to evaluate in parallel, then there is even one additional requirement:
zero
value must really be a neutral value
for the given operation, because it may be used multiple times.The following folding functions are provided to be used in such cases:
Function / Type and Description
fold
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Sequence folding by an associative function:
fold == foldl == foldr
fold_seq
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Non-parallel folding of sequence by an associative function:
fold_seq == foldl == foldr
fold_par
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Folding of sequence elements in parallel. [‘fn’ must be
associative]: [‘zero’ must be ZERO, because it can be used many times]:
fold_par(seq,fn,zero) == foldl(seq,fn,zero) == foldr(seq,fn,zero)
These functions assume that the folding is performed by a given
binary operator of type ('1 * '1 -> '1)
, which is an
associative operator, return the same result as foldl
.
Function fold_par
requires the zero
to be a
neutral value for the operator, also, and then returns the same
result.
However, the interpreter does not check if the operator is
associative or not, and if the specified zero
is really
the zero for the operator, so please use these functions with
care.
The main function here is fold
. It may be implicitly
parallelized. However, if the parallelization is required, then
fold_par
should be used, and if the parallelization is to
be prevented, then fold_seq
should be used.
In the following examples, the folding is performed using operator which returns the sum of the arguments, but outputs the arguments to the console:
1..50).fold_seq(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 =
1275
1..50).fold(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 =
1275
1..50).fold_par(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n') ).
0 + 1 0 + 11 + 2 + 3 + 4 + 5 + 6 + 7 0 + 31 + 8 + 9 + 10 0 + 21 + 22 + 23 + 24 + 25 + 26 0 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 27 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 28 + 29 + 30 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 0 + 55 + 155 + 255 + 355 + 455 =
1275
The result is always the same, but in the case of the parallel evaluation, the order of the operations can be different. Additionally, after each thread folds its part of the sequence, the threads’ results are folded. The zero is used one time more than the number of threads.
There is a special function parallel(exp)
. Its purpose
is to suggest the interpreter that the given expression should be
evaluated in parallel. It does not force the parallel evaluation, but
lowers down the threshold for the parallelization of the given
expression.
It has two versions:
In the following examples the same expression is evaluated three times. First example is as usual, with a default implicit parallelization:
1..100).fold(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 61 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 + 71 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 =
5050
In the next example, using parallel(fold)
, we suggest
that the function fold
should evaluate in parallel:
1..100).(parallel(fold))(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n') ).
0 + 1 0 + 11 + 2 + 3 + 4 + 5 0 + 21 + 12 + 13 + 14 + 15 + 16 + 17 0 + 31 0 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 18 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 19 + 20 0 + 61 + 6 + 62 0 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 63 0 + 91 + 64 + 65 + 66 + 67 + 68 + 69 + 70 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 92 0 + 41 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 0 + 71 + 42 + 72 + 43 + 44 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 7 + 45 + 8 + 9 + 10 + 46 + 47 + 48 + 49 + 50 0 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955 =
5050
And finally, by applying parallel
to an expression we
suggest that the complete expression should evaluate in parallel:
1..100).fold(
(\x,y: x + y.echoTxt(
if x=0 then x.asString() + ' ' else '')
(+ "+ " + y.asString() + ' '),
0
echoTxt(' =\n').parallel() ).
0 + 11 0 + 51 + 12 0 + 91 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 0 + 1 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 0 + 21 0 + 41 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 0 + 71 0 + 61 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 2 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 0 + 31 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 0 + 81 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 82 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 0 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955 =
5050
The implicit parallelization depends on many properties of the specific interpreter implementation and specific platform. Thus, it is not always possible to repeat the given examples with the same level of parallelization. Please try to change the list size to experience the different results.