8 Parallelization

8.1 Wafl and Parallelization

Parallelization is evaluation of program parts in multiple threads, in parallel manner. As each Wafl program represents an expression, the parallelization assumes the segmentation of the program in many sub-expressions, which evaluate in parallel.

Only the pure functional expressions can be parallelized. If a non-pure expression were to be parallelized, its result or its side-effects could differ in different evaluations.

The current Wafl implementation does not check if the subject of the parallelization is a pure functional expression or not. So developers need to take care of that.

Explicit Parallelization

Wafl supports explicit and implicit parallelization.

Explicit parallelization is based on the explicit use of parallel versions of some Wafl library functions. These functions are implemented to split the job in parts and to use many threads to complete the evaluation.

Explicit parallel functions include:

Each of the parallel versions of the functions has the same type and returns the same result as the sequential version of the functions, but evaluates in many parallel threads.

Implicit Parallelization

Implicit parallelization assumes that some parts of the programs will be implicitly (automatically) parallelized by interpreter. An important module of Wafl interpreter is a parallelization engine. Its job is to analyze the available data on the expressions and to decide if an expression is to be evaluated in a sequential or in a parallel way. Moreover, it decides on many of the parallelization options.

In a simplified way, we may say that the parallelization engine tries to evaluate the expected benefits of parallelization, taking into account:

Based on the evaluated benefits, the engine compares it to a threshold and decides to use the parallelization or not. The threshold may be configured.

By default, the implicit parallelization is disabled. The main reason for this is to prevent the errors in the cases of implicit parallelization of non-pure expressions.

In the future, after interpreter is updated to check on the expression pureness, the implicit parallelization may be enabled by default.

The following functions may be implicitly parallelizable:

If any specific instance of these functions will be parallelized or not, will be decided by the parallelization engine. Currently, the parallelization engine decides based on the size of the data collection to be processed, and on the number of currently active parallel jobs.

In the future the engine will use the expression complexity as well.

Implicit Parallelization Functions

Sometimes it is useful to have implicit parallelization, but to suggest to the parallelization engine that a part of the code should be evaluated in parallel, if possible. Such suggestion has the effect of lowering the threshold for the decision on parallelization.

To suggest that a program segment should be parallelized, there is a function parallel. It’s type is ('1 -> '1) and its semantics is almost like identity function \x:x. The only difference is that the parallelization threshold is lowered.

It can be applied to any expression, but has different effects if applied on functions.

In general case, when the bound expression is not a simple function reference, like in parallel(map(lst,fn)), the following semantics is applied:

So, the main difference is in whether the arguments will evaluate with parallelization turned on or not.

On the other side, if the argument is a function, like in parallel(map)(lst,fn), the parallelization suggestion is related strictly to the bound function. The main difference is that parallelization is not related to the arguments. We could describe the semantics as:

If the specified function is a library function, like map, the result is almost the same as using a specialized parallel map implementation instead. On the other side, if the specified function is a user defined function, then it is the same as if the user defined function is defined with parallel applied to its definition

For example, if we have:

f(x,y,z) = ...defexp...;

then we can suggest its parallelization in two ways:

Implicit Parallelization Command Line Options

The parallelization engine configuration can be set by command line options:

Explicit Sequential Functions

If implicit parallelization is used, it is possible that the parallelization engine decides to parallelize a part of the program that must not be parallelized (usually because it is not purely functional). To prevent this, there are explicit sequential versions of functions that prevent implicit parallelization.

There are sequential versions of each of the implicitly parallelizable functions:

8.2 Parallel Functions

Folding

Not every sequence folding can be parallelized. For example, consider the following expression:

(1..20).foldr(
    \x,l: if x%2=0 then x:l else l,
    []
)

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

The expression folds an integer sequence by evaluating a filtered list of even integers. (Of course, we can do than in a simpler way, using filter.) This is a folding example, which cannot be easily implemented using foldl with the same arguments, because foldl expects its first argument to be an integer and the second argument to be a list of integers.

In a general case, to be able to use both foldl and foldr, and with the same result, two important requirements must hold:

If we want the folding to evaluate in parallel, then there is even one additional requirement:

The following folding functions are provided to be used in such cases:

Function / Type and Description

fold

(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Sequence folding by an associative function:
    fold == foldl == foldr

fold_seq

(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Non-parallel folding of sequence by an associative function:
    fold_seq == foldl == foldr

fold_par

(Sequence['2]['1] * ('1 * '1 -> '1) * '1 -> '1)
Folding of sequence elements in parallel. [‘fn’ must be associative]: [‘zero’ must be ZERO, because it can be used many times]:
    fold_par(seq,fn,zero) == foldl(seq,fn,zero) == foldr(seq,fn,zero)

These functions assume that the folding is performed by a given binary operator of type ('1 * '1 -> '1), which is an associative operator, return the same result as foldl. Function fold_par requires the zero to be a neutral value for the operator, also, and then returns the same result.

However, the interpreter does not check if the operator is associative or not, and if the specified zero is really the zero for the operator, so please use these functions with care.

The main function here is fold. It may be implicitly parallelized. However, if the parallelization is required, then fold_par should be used, and if the parallelization is to be prevented, then fold_seq should be used.

In the following examples, the folding is performed using operator which returns the sum of the arguments, but outputs the arguments to the console:

(1..50).fold_seq(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n')

0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50  =
1275

(1..50).fold(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n')

0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50  =
1275

(1..50).fold_par(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n')

0 + 1 0 + 11 + 2 + 12 + 3 + 13 + 4 + 5 + 6 0 + 21 + 7 + 8 + 9 + 14 + 10 + 15 + 22 + 16 + 17 + 18 + 19 0 + 31 + 20 + 23 + 24 + 25 + 26 0 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 27 + 32 + 28 + 33 + 29 + 34 + 30 + 35 + 36 + 37 + 38 + 39 + 40 0 + 55 + 155 + 255 + 355 + 455  =
1275

The result is always the same, but in the case of the parallel evaluation, the order of the operations can be different. Additionally, after each thread folds its part of the sequence, the threads’ results are folded. The zero is used one time more than the number of threads.

Suggesting Parallelization

There is a special function parallel(exp). Its purpose is to suggest the interpreter that the given expression should be evaluated in parallel. It does not force the parallel evaluation, but lowers down the threshold for the parallelization of the given expression.

It has two versions:

In the following examples the same expression is evaluated three times. First example is as usual, with a default implicit parallelization:

(1..100).fold(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n')

0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 61 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 + 71 + 72 + 73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 81 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100  =
5050

In the next example, using parallel(fold), we suggest that the function fold should evaluate in parallel:

(1..100).(parallel(fold))(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n')

0 + 1 0 + 11 + 2 0 + 31 + 3 + 4 + 5 0 + 41 0 + 21 + 42 + 6 + 43 + 44 + 45 + 46 + 47 + 48 0 + 51 + 52 + 53 + 54 + 55 + 56 + 57 + 58 0 + 61 + 62 + 63 + 64 + 22 + 65 + 59 + 66 + 67 + 68 + 69 + 70 + 7 + 60 + 32 + 23 + 24 0 + 71 + 25 + 26 + 27 + 28 + 29 + 30 + 49 + 12 + 50 + 33 0 + 91 + 92 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 0 + 81 + 8 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 9 + 10 + 13 + 72 + 14 + 73 + 15 + 74 + 16 + 75 + 17 + 76 + 18 + 19 + 20 + 77 + 78 + 79 + 80 0 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955  =
5050

And finally, by applying parallel to an expression we suggest that the complete expression should evaluate in parallel:

(1..100).fold(
    \x,y: x + y.echoTxt( 
        (if x=0 then x.asString() + ' ' else '')
        + "+ " + y.asString() + ' '),
    0
).echoTxt(' =\n').parallel()

0 + 1 0 + 11 + 2 0 + 61 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 0 + 91 + 62 + 63 + 64 + 65 + 66 + 67 + 68 + 69 + 70 0 + 21 0 + 81 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 0 + 51 0 + 71 + 72 + 73 + 74 + 52 + 75 + 76 + 77 + 78 + 79 + 80 0 + 41 + 82 + 83 + 84 + 85 + 86 + 87 + 88 + 89 + 90 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 0 + 31 + 92 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 53 + 54 + 55 + 56 + 57 + 58 + 59 + 60 + 42 + 43 + 44 + 45 + 46 + 47 + 48 + 49 + 50 + 93 + 94 + 95 + 96 + 97 + 98 + 99 + 100 0 + 55 + 155 + 255 + 355 + 455 + 555 + 655 + 755 + 855 + 955  =
5050

The implicit parallelization depends on many properties of the specific interpreter implementation and specific platform. Thus, it is not always possible to repeat the given examples with the same level of parallelization. Please try to change the list size to experience the different results.