Parallelization is the evaluation of program parts in several threads, i.e. in parallel. Since each Wafl program represents an expression, parallelization requires the program to be segmented into many sub-expressions that are evaluated in parallel.
Only pure functional expressions can be parallelized. If a non-pure expression is parallelized, its result or its side-effects could be different for different evaluations.
The current Wafl implementation does not check whether the object of parallelization is a pure functional expression or not. The developers must therefore take care of this.
Wafl supports explicit and implicit parallelization.
Explicit parallelization is based on the explicit use of parallel versions of some Wafl library functions. These functions are implemented to split the job into parts and to use many threads to complete the evaluation.
The explicit parallel functions include:
map_par
- A parallel version of the map
function.mapIdx_par
- A parallel version of the
mapIdx
function.filter_par
- A parallel version of the
filter
function.fold_par
- A parallel version of the fold
function.newArrayFn_par
- A parallel version of the
newArrayFn
function.Each of the parallel versions of the functions has the same type and returns the same result as the sequential version, but is evaluated in many parallel threads.
Implicit parallelization assumes that some parts of the programs are implicitly (automatically) parallelized by the interpreter. An important module of the Wafl interpreter is a parallelization engine. Its task is to analyze the available data about the expressions and decide whether an expression should be evaluated sequentially or in parallel. It also decides on many of the parallelization options.
Simplified, one can say that the parallelization engine tries to evaluate the expected benefits of parallelization, taking into account the following:
Based on the evaluated benefits, the engine compares them with a threshold and decides whether parallelization should be used or not. The threshold value can be configured.
By default, the implicit parallelization is disabled. The main reason for this is to avoid errors during the implicit parallelization of non-pure expressions.
In the future, after the interpreter is upgraded to check the purity of expressions, the implicit parallelization might be enabled by default.
The following functions can be implicitly parallelized:
map
mapIdx
fold
filter
newArrayFn
.Whether a particular instance of these functions is parallelized or not is decided by the parallelization engine. Currently, the parallelization engine decides based on the size of the data collection to be processed, and on the number of parallel jobs currently active.
Sometimes it is useful to have implicit parallelization, but to suggest to the parallelization engine that a part of the code should be evaluated in parallel, if possible. Such a suggestion has the effect of lowering the threshold for the parallelization decision.
To suggest that a program segment should be parallelized, there is a
function parallel
. Its type is ('1 -> '1)
and its semantics is almost like the identity function
\x:x
. The only difference is that the threshold for
parallelization is lowered.
It can be applied to any expression, but has different effects when applied to functions.
In the general case, if the bound expression is not a simple function
reference, as in parallel(map(lst,fn))
, the following
semantics are applied:
So the main difference is whether the arguments are evaluated with parallelization enabled or not.
On the other hand, if the argument is a function, as in
parallel(map)(lst,fn)
, the parallelization suggestion is
strictly bound to the given function. The main difference is that the
parallelization threshold for the function’s arguments is not lowered.
We could describe the semantics as follows:
If the specified function is a library function like
map
, the result is almost the same as using a specialized
parallel map implementation. On the other hand, if the specified
function is a user-defined function, then the result is the same as
defining the user-defined function with parallel
.
For example, if we have:
f(x,y,z) = ...defexp...;
then we can suggest its parallelization in two ways:
if we want to parallelize only a selected application, then we
should use parallel(f)(a,b,c)
, and
if we want to parallelize all function applications, we should
use parallel
for the entire expression of the function
definition:
f(x,y,z) = parallel(...defexp...);
An alternative is to define a new function parallel_f
as
follows:
parallel_f = parallel(f);
The configuration of the parallelization engine can be set via command line options:
-pd
, -parallel-disable
-pa
, -parallel-auto
-ps
, -parallel
-pa
, but this mode is biased towards parallelization by
using a lower threshold.-pf
, -parallel-force
When implicit parallelization is used, it is possible that the parallelization engine decides to parallelize a part of the program that is not allowed to be parallelized (usually because it is not purely functional). To prevent this, there are explicit sequential versions of functions that prevent implicit parallelization.
There are sequential versions of every implicitly parallelizable function:
map_seq
- A sequential version of the map
function.mapIdx_seq
- A sequential version of the
mapIdx
function.filter_seq
- A sequential version of the
filter
function.fold_seq
- A sequential version of the
fold
function.newArrayFn_seq
- A sequential version of the
newArrayFn
function.The functions foldl
and foldr
are always
sequential, therefore fold_seq
corresponds to the function
fold
. It is a special case of foldl
and
foldr
where an associative function is used.
The key word in parallelization is isolation. The prerequisite for parallel evaluation of some tasks is that they are well isolated from each other. In some concurrent environments, we expect the parallel tasks to communicate with each other to ensure a synchronization. However, in a functional context, especially for purely functional tasks, synchronization is usually not required. All we need is isolation of the tasks.
The main problem with task isolation is data sharing. It is very inconvenient if one task changes some data that the other task is processing at the same time. In a purely functional environment, we do not have such problems. Even in mixed environments, as is the case with Wafl, we usually do not have such problems. As long as we process the data in memory and do not change it, we can split the data into chunks and process the chunks in multiple threads quite safely and in full isolation.
In a few cases, problems can arise, and these are the cases we need to watch out for:
We have already discussed the functions that can work in parallel, as
well as theirs sequential versions and the versions that permit the
implicit parallelization. Most of them, with the exception of the
fold_par
function (and of course fold
),
normally perform some independent operations on independent data. Only
in the case of fold_par
the order of computation can be
important. In other cases, we can say that we have complete isolation
and safe parallelization as long as we do not create side effects. For
this reason, we pay additional attention to folding.
Not every sequence fold can be parallelized. For example, consider the following expression:
1..20).foldr(
(\x,l: if x%2=0 then x:l else l,
[]
)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
The expression folds an integer sequence by evaluating a filtered
list of even integers. (Of course, we can do this more simply with
filter
.) This is an example of fold that cannot be
implemented with foldl
with the same arguments, since
foldl
expects its first argument to be an integer and the
second argument to be a list of integers.
In order to be able to use both foldl
and
foldr
with the same result in the general case, two
important conditions must be met:
('1 * '1 -> '1)
, i.e. it must represent a binary
operator on a domain; anda
, b
and c
the following must
hold: f(a,f(b,c)) == f(f(a,b),c)
.If we want the fold to be evaluated in parallel, there is an additional condition:
zero
must really be a neutral value
for the given operation, as it can be used several times.The following functions are intended for such cases:
Function / Type and Description
fold
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Returns the sequence fold by an associative function:
fold == foldl == foldr
fold_seq
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Non-parallel fold of sequence by an associative function:
fold_seq == foldl == foldr
fold_par
(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Folds the sequence elements in parallel. [‘fn’ must be
associative]: [‘zero’ must be ZERO, because it can be used many times]:
fold_par(seq,fn,zero) == foldl(seq,fn,zero) == foldr(seq,fn,zero)
These functions assume that the fold is performed by a given binary
operator of type ('1 * '1 -> '1)
, which is an
associative operator, and return the same result as foldl
.
The function fold_par
assumes that the zero
value is a neutral value for the operator.
However, the interpreter does not check whether the operator is
associative or not and if the specified zero
value is
really the zero for the operator. So, please use these
functions with caution.
The base function here is fold
. It can be implicitly
parallelized. However, if parallelization is required, then
fold_par
should be used, and if parallelization is to be
prevented, then fold_seq
should be used.
In the following examples, the fold is performed with an operator that returns the sum of the arguments, but additionally outputs the arguments to the console:
1..50).fold_seq(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 =
1275
1..50).fold(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 =
1275
1..50).fold_par(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n') ).
0 + 10 + 11 + 2 + 12 + 3 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 200 + 21 + 22 + 23 + 240 + 41 + 42 + 43 + 44 + 45 + 25 + 26 + 27 + 28 + 29 + 30 + 46 + 4 + 47 + 48 + 49 + 500 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 5 + 6 + 7 + 8 + 9 + 100 + 55 + 155 + 255 + 355 + 455 =
1275
The result is always the same, but in the case of parallel evaluation, the order of operations may be different. After each thread has folded its part of the sequence, the results of the threads are also folded. The zero is used once more than the number of threads.
As already mentioned, we can use command line options to force
(-pf
) or prevent parallelization(-pd
). We can
also use the functions that always evaluate in parallel as
map_par
or sequentially as map_seq
. There are
two additional functions that control the implicit parallelization:
Function / Type and Description
parallel
('1 -> '1)
Marks an expression to be
parallelized, if possible.
sequential
('1 -> '1)
Force the expression to run
sequentially.
The function parallel(exp)
suggests to the interpreter
that the specified expression should be evaluated in parallel. It does
not enforce parallel evaluation, but lowers the threshold for
parallelization of the specified expression. There are two versions of
this function:
In the following examples, the same expression is evaluated three times. The first example is as usual, with standard implicit parallelization:
1..100).fold(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 61 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 + 71 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 =
5050
In the next example, we use parallel(fold)
to suggest
that the fold
function should be evaluated in parallel:
1..100).(parallel(fold))(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n') ).
0 + 1 + 2 + 3 + 40 + 31 + 5 + 6 + 70 + 21 + 8 + 9 + 10 + 320 + 610 + 51 + 52 + 53 + 54 + 55 + 22 + 56 + 57 + 58 + 59 + 600 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 500 + 110 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 900 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 + 12 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 300 + 71 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 33 + 62 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 63 + 64 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 65 + 66 + 67 + 68 + 69 + 700 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955 =
5050
And finally, by applying parallel
to an expression, we
suggest that the entire expression should be evaluated in parallel:
1..100).fold(
(\x,y: x + y
=> echoTxt( (if x=0 then '0 ' else ' ') + '+ ' + y$ ),
0
echoTxt(' =\n').parallel() ).
0 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 200 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 400 + 710 + 21 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 800 + 910 + 81 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 1000 + 61 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 300 + 510 + 1 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 600 + 41 + 2 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 100 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955 =
5050
Recall that <exp>.parallel()
is equivalent to
parallel(<exp>)
, so the parallel
function in the previous example was applied to the entire previous
expression.
Implicit parallelization depends on many properties of the specific interpreter implementation and the specific platform. Therefore, it is not always possible to repeat the given examples with the same degree of parallelization. The consequence is that it is not always possible to predict how an expression will be evaluated.
For this reason, we will not go into the results of the previous examples in detail. We expect that the first example was not evaluated in parallel and that the other two examples were evaluated in parallel, but we cannot be sure. Try changing the list size to see the different results.
The function sequential(exp)
is another function for
controlling the implicit parallelization. This function forces the
sequential evaluation of the given expression. During the evaluation of
the given expression, only the explicitly parallel functions such as
map_par
will work in parallel and all other computation
will be performed sequentially.
As with parallel
, there are two versions of
sequential
function:
The sequential
function is stronger than the
parallel
function. While parallel
suggests the
parallelization, sequential
forces sequential evaluation.
Any use of parallel
in expressions bound by
sequential
is ignored. On the other hand, the use of
sequential
in expressions bound by the
parallel
has the expected effect.
In the following examples, we compute (almost) the same thing in different ways. Some of the expressions will always work sequentially, some will always work in parallel and the others will depend on the program options:
{#'Always sequential:',
fold_seq.test(),
fold.sequential().test(), // sequential( fold )
fold.test().sequential(), // sequential(...exp...)
fold.test().sequential().parallel(),
fold.test().parallel().sequential(),
fold.sequential().parallel().test(),
fold.parallel().sequential().test(),
'Depending on the options:',
fold.test(),
fold.parallel().test(), // parallel( fold )
fold.test().parallel(), // parallel(...exp...)
'Always parallel:',
fold_par.test()
#}where {
test( foldFn ) =
if (1..1000).foldFn( operator+, 1 ) = 500501
then 'seq' else 'par';
}
{# 'Always sequential:', 'seq', 'seq', 'seq', 'seq', 'seq', 'seq', 'seq', 'Depending on the options:', 'seq', 'par', 'par', 'Always parallel:', 'par' #}
We assume that the sequential fold works in a single thread and uses
a single initial value 1
. The result is therefore the sum
of the integers in the range 1..1000
plus 1
,
i.e. 500501
. If, on the other hand, the fold is evaluated
in parallel, the initial value 1
is added many times and
the result is greater than 500501
.