4 Primitive Types

Wafl has four primitive types: Integer, Float, String and Bool. In this chapter we discuss the important elements of the primitive types.

This chapter is quite long and detailed, but its content is elementary, so we recommend a cursory reading. It is enough to get an idea of what is supported, and later you can use this chapter as a reference if needed.

4.1 Literals

Integer and float literal constants have the same syntax as in the programming languages C and C++.

Floating point literals must contain a decimal point and at least one digit in front of it.

Logical literal constants are true and false.

String literals are quoted using single or double quotation marks. It does not matter which type of quotation mark is used, but the same type must be used at the beginning and at the end of the string.

Special characters are specified by escape sequences, as in C/C++. The most important escape sequences are: single quotation mark (\'), double quotation mark (\"), backslash (\\), new line (\n), carriage return (\r), horizontal tab (\t), vertical tab (\v), form feed (\f) and backspace (\b). As in C/C++, any character can also be encoded by its code written as 3 octal digits: \nnn.

Examples of integer literals:

{#
    0, 42, -21
#}

{# 0, 42, -21 #}

Examples of float literals:

{#
    3.4, 0., 1.2e-3
#}

{# 3.4, 0, 0.0012 #}

Examples of bool literals:

{#
    true, false
#}

{# true, false #}

Examples of string literals:

{#
    'single quotation marks',
    "double quotation marks",
    "two\nlines",
    "octal codes A=\101 a=\141"
#}

{# 'single quotation marks', 'double quotation marks', 'two\012lines', 'octal codes A=A a=a' #}

4.2 Operators

4.2.1 Integer operators

Wafl has the usual arithmetic operators:

binary:
- addition (+);
- subtraction (-);
- multiplication (*);
- integer division (/);
- remainder of integer division (%);
- modulus (%%);
- power (**) and
unary:
- negation (-).

The division of integer values always produces an integer result.

The remainder and modulus operators are very similar. The difference is in the sign of the result: the remainder of the integer division (%) is positive (or zero) for positive dividends and negative (or zero) for negative dividends, while the modulus operator (%%) always returns a non-negative result. As the example shows, integer division itself truncates toward zero:

{#
    17 / 10,
    17 % 10,
    17 %% 10,
    -17 / 10,
    -17 % 10,
    -17 %% 10,
    17 / -10,
    17 % -10,
    17 %% -10,
    -17 / -10,
    -17 % -10,
    -17 %% -10
#}

{# 1, 7, 7, -1, -7, 3, -1, 7, 7, 1, -7, 3 #}
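To make the sign rules concrete, here is a small Python model of the three operators. Python is used only for illustration; this is not Wafl code. The model assumes that / truncates toward zero, as in C/C++, which is what the results above show. Note that Python's own % takes the sign of the divisor, so in general it matches neither % nor %%.

```python
def int_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (models Wafl '/')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def remainder(a: int, b: int) -> int:
    """Result carries the sign of the dividend (models Wafl '%')."""
    return a - b * int_div(a, b)


def modulus(a: int, b: int) -> int:
    """Result is never negative (models Wafl '%%')."""
    return a % abs(b)


results = []
for a, b in [(17, 10), (-17, 10), (17, -10), (-17, -10)]:
    results += [int_div(a, b), remainder(a, b), modulus(a, b)]
print(results)  # [1, 7, 7, -1, -7, 3, -1, 7, 7, 1, -7, 3]
```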

Bit-level integer operators are syntactically and semantically equivalent to these operators in C/C++:

binary:
- bit-level conjunction (&);
- bit-level disjunction (|);
- bit-level left shift (<<);
- bit-level right shift (>>) and
unary:
- bit-level complement (~).

{#
    // '11110000' & '00111111' = '00110000' = 48
    240 & 63,   
    // '011' | '110' = '111' = 7
    3 | 6,      
    // bit-level complement
    ~5,
    // '11110' << 3 = '11110000' = 240
    30 << 3,   
    // '11111111' >> 3 = '11111' = 31
    255 >> 3
#}

{# 48, 7, -6, 240, 31 #}

The power operator a ** b returns a to the power of b. Any integer raised to a negative power will yield zero, except for one. One raised to any power will always yield 1:

{#
    2 ** 3,
    2 ** -3,
    1 ** 3,
    1 ** -3,
    -2 ** 3,
    -2 ** -3
#}

{# 8, 0, 1, 1, -8, 0 #}

Multiplication, division, remainder, modulus and bit-level conjunction have a higher precedence than addition, subtraction and bit-level disjunction. The power operator has the highest precedence, and the shift operators have the lowest.

4.2.2 Float Operators

Wafl has the usual float operators:

binary:
- addition (+);
- subtraction (-);
- multiplication (*);
- division (/);
- power (**) and
unary:
- negation (-).

{#
    2.1 + 3.45678,
    3.0 - 1.2,
    3.14 * 2.17,
    17.0 / 10.,
    2.0 ** 3.0,
    2.0 ** 0.5
#}

{# 5.55678, 1.8, 6.8138, 1.7, 8, 1.414213562 #}

4.2.3 String operators

Wafl has a single binary string operator:

string concatenation (+).

"One" + "Two"

OneTwo

There are also indexing operators. They are discussed in the section on the sequence types.

4.2.4 Bool operators

The usual logical operators are supported in both C-like and SQL-like syntax:

binary:
- conjunction (&&, and);
- disjunction (||, or) and
unary:
- negation (!, not).

{#
    true or false,
    true || false,
    true and false,
    true && false,
    not true,
    !true
#}

{# true, true, false, false, false, false #}

4.2.5 Comparison Operators

The usual comparison operators are defined for the types Integer, Float and String:

two versions of the equality operator - C-like (==) and SQL-like (=);
two versions of the inequality operator - C-like (!=) and SQL-like (<>);
less-than (<);
less-than or equal (<=);
greater-than (>) and
greater-than or equal (>=).

{#
    1 < 2,
    1.2 <= 2.3,
    "abcd" > "ABCD",
    "abcd" >= "AB",
    21 * 2 = 42,
    21 == 42 / 2,
    17 != 18,
    -3.14 <> 3.14
#}

{# true, true, true, true, true, true, true, true #}

Wafl has no variables and no assignments. The symbol ‘=’ serves only two purposes: (1) it separates the name of a definition from its body, and (2) it is the equality operator. Its meaning is never ambiguous, so there is no need to use the ‘==’ operator instead, but it is available for those who prefer it.

4.3 Conversion Functions

Wafl is a strongly typed programming language and does not allow implicit type conversions. Instead, the Wafl core library provides explicit conversion functions:

asInt - from any other primitive type to Integer;
asFloat - from any other primitive type to Float;
asString - from any other type to String;
asChar - from any other primitive type to a single character String and
asBool - from any other primitive type to Bool.

There are several other conversion functions for specific type pairs.

It may seem strange to name these functions as..., but the names read naturally with the dot syntax, which is how these functions are mainly used.

4.3.1 Conversion to Integer

The function asInt(x) converts every non-integer primitive value x into the Integer type:

asInt(x) converts a Float value x to the nearest integer, just like the synonymous function round;
asInt(x) converts a String value x, which is a valid integer literal, into the corresponding integer value;
- if x is a float value literal, only the digits before decimal point are used;
- if x is not a valid integer literal, asInt returns zero;
asInt(x) converts the Bool values true to the integer value 1 and false to 0.

For the conversion from Float to Integer, there are also:

round(x) - the same as asInt(x), converts a float value into the nearest Integer;
ceil(x) - returns the nearest integer that is not smaller than x and
floor(x) - returns the nearest integer that is not larger than x.

There is also a function for converting String values to Integer:

ascii(x) - converts a string value x to the ASCII code of the first character of the string;
- if x is an empty string, the result of the function is zero.

Examples of conversions from Float to Integer:

{#
    asInt(3.6),
    asInt(-3.6),
    round(3.6),
    round(-3.6),
    ceil(3.6),
    ceil(-3.6),
    floor(3.6),
    floor(-3.6)
#}

{# 4, -4, 4, -4, 4, -3, 3, -4 #}

Examples of conversions from String to Integer:

{#
    asInt('3'),
    asInt('3.8'),
    asInt('abc'),
    ascii('abc'),
    ascii('')
#}

{# 3, 3, 0, 97, 0 #}

Examples of conversions from Bool to Integer:

{#
    asInt(true),
    asInt(false)
#}

{# 1, 0 #}

4.3.2 Conversion to Float

The function asFloat(x) converts every primitive non-float value x into the type Float:

asFloat(x) converts the Integer value x to the corresponding Float value.
asFloat(x) converts the String value x, which represents a valid Float literal, into the corresponding Float value;
- if x is not a valid Float literal, asFloat returns zero;
asFloat(x) converts the Bool value true into the value 1.0 and the value false into 0.0.

{#
    asFloat(7),
    asFloat('6.2'),
    asFloat('abc'),
    asFloat(true),
    asFloat(false)
#}

{# 7, 6.2, 0, 1, 0 #}

4.3.3 Conversion to String

There are four functions and an operator for converting values of other data types into strings:

| Function | Type | Description |
|---|---|---|
| asString | `'1 -> String` | Converts a value to a string. |
| asChar | `PrimeNotString['1] -> String` | Converts a value to a character. |
| toString | `Float * Int -> String` | Converts a float value to a string with given precision. |
| asPreview | `'1 -> String` | Converts a value to a shortened string. |

The function asString(x) converts a non-string value x into its string representation, according to the Wafl syntax. A String value is returned unchanged.

There is a synonymous unary postfix operator $ with the same behavior.

Here “any” really means any: the function asString and the postfix operator $ convert a Wafl value of any type, including tuples and lists, to its String representation.

{#
    asString(3),
    asString(3.14),
    asString(true),
    asString(false),
    asString('123'),
    asString( {# 1, 2.3, "abc", {# true, 's' #} #} ),
    asString([1,2,3,4,5,6,7,8,9,10]),

    3$,
    3.14$,
    true$,
    false$,
    '123'$,
    {# 1, 2.3, "abc", {# true, 's' #} #}$,
    [1,2,3,4,5,6,7,8,9,10]$
#}

{# '3', '3.14', 'true', 'false', '123', '{# 1, 2.3, \'abc\', {# true, \'s\' #} #}', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]', '3', '3.14', 'true', 'false', '123', '{# 1, 2.3, \'abc\', {# true, \'s\' #} #}', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]' #}

The function asPreview(x) is similar to asString, but returns a shorter string. For simple data, it behaves in the same way as asString. For larger structured data and longer strings, it extracts only a part of the complete string representation.

{#
    asPreview('01234567989'),
    asPreview( 
        '01234567890123456789012345678901234567890123456789'
        '01234567890123456789012345678901234567890123456789'
    ),
    asPreview([1,2,3,4,5,6,7,8,9,10]),
    asPreview(1..1000)
#}

{# '01234567989', '012345678901234567890123456789 ... 0123456789 (len=100)', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]', '[1, 2, 3, 4, 5, ..., 999, 1000] (len=1000)' #}

The function asChar(x) works with integers, floats and logical values. It:

converts an Integer into a string consisting of a single character with the given ASCII code (a Float value is rounded first, so asChar(65.7) returns 'B');
converts the logical value true into the string value "T" and the logical value false into the string value "F".

{#
    asChar(65),
    asChar(65.7),
    asChar(true),
    asChar(false)
#}

{# 'A', 'B', 'T', 'F' #}

The function toString(x,n) converts the float value x into a string with n decimal places.

{#
    toString(1234.56789,0),
    toString(1234.56789,1),
    toString(1234.56789,2),
    toString(1234.56789,3)
#}

{# '1235', '1234.6', '1234.57', '1234.568' #}

4.3.4 Conversion to Bool

The function asBool(x) converts every primitive non-bool value x into the type Bool:

asBool(x) converts all non-zero Integer values to true and zero to false;
asBool(x) converts all non-zero Float values to true and zero to false;
asBool(x) converts the string values "true" and "T" to true, and all other strings to false.

{#
    asBool( 2 ),
    asBool( -3 ),
    asBool( 0 ),
    asBool( 2.1 ),
    asBool( -3.2 ),
    asBool( 0.0 ),
    asBool( "true" ),
    asBool( "True" ), // this is not the same as "true"
    asBool( "T" ),
    asBool( "t" )     // this is not the same as "T"
#}

{# true, true, false, true, true, false, true, false, true, false #}

4.4 Integer Functions

The Wafl core library includes four integer functions:

abs(x) - absolute value;
sgn(x) - sign;
between(x,a,b) - check whether x is between a and b and
random(x) - random value.

4.4.1 Integer Function `abs`

The integer function abs(x) computes the absolute value of the given integer value x:

{#
    abs( 123 ),
    abs( -123 ),
    abs( -0 )
#}

{# 123, 123, 0 #}

4.4.2 Integer Function `sgn`

The integer function sgn(x) returns the sign of the number x. For positive values it returns 1, for negative values -1, and for zero it returns zero:

{#
    sgn( 20 ),
    sgn( -2 ),
    sgn( 0 )
#}

{# 1, -1, 0 #}

4.4.3 Integer Function `between`

The integer function between(x,a,b) checks whether the number x lies between the numbers a and b, including the limits. It is usually applied as x.between(a,b). This is equivalent to the expression x >= a and x <= b:

{#
    10 .between( 5, 20 ),
    10 .between( 10, 20 ),
    10 .between( 5, 10 ),
    10 .between( 5, 9 ),
    10 .between( 11, 20 )
#}

{# true, true, true, false, false #}

4.4.4 Function `random`

The integer function random(x) returns a random integer value in the range [0,x-1]. In the following example, we compute 20 random values in the range [0,4]:

{#
    random( 5 ), random( 5 ), random( 5 ), random( 5 ),
    random( 5 ), random( 5 ), random( 5 ), random( 5 ),
    random( 5 ), random( 5 ), random( 5 ), random( 5 ),
    random( 5 ), random( 5 ), random( 5 ), random( 5 ),
    random( 5 ), random( 5 ), random( 5 ), random( 5 )
#}

{# 3, 1, 3, 1, 3, 2, 1, 1, 4, 2, 3, 1, 0, 1, 3, 4, 1, 1, 3, 4 #}

Randomizing

By default, the random number generator is initialized the first time it is used, with the current system timer as the seed. This is usually exactly what is expected and required.

However, sometimes it can be necessary to have the same random number sequence every time the program is executed (for debugging, benchmarking and some other cases). For such cases there is the clwafl command line option -nornd, which specifies a fixed predefined seed initialization.

Execute the previous program with:

clwafl -nornd program.wafl

and it will always return the same result.

If a program uses parallel evaluation, the order in which the random numbers are generated and used is not guaranteed. With the -nornd option, the generator still produces the same sequence of numbers, but different threads may take them in a different order, so the results of the program may differ from run to run.
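The effect of -nornd can be illustrated with a Python analogy (not Wafl code): a generator created without a seed is seeded by the system, while two generators created with the same fixed seed produce identical sequences. The helper wafl_random and the seed value 12345 are hypothetical; they only mirror the documented range [0,x-1].

```python
import random


def wafl_random(rng: random.Random, x: int) -> int:
    """Hypothetical helper: a random integer in the range [0, x-1]."""
    return rng.randrange(x)


# Seeded by Python from the operating system or the current time, so the
# sequence differs from run to run (like Wafl's default behavior).
default_rng = random.Random()
print([wafl_random(default_rng, 5) for _ in range(20)])

# A fixed seed gives the same sequence on every run (like clwafl -nornd).
fixed_a = random.Random(12345)
fixed_b = random.Random(12345)
seq_a = [wafl_random(fixed_a, 5) for _ in range(20)]
seq_b = [wafl_random(fixed_b, 5) for _ in range(20)]
print(seq_a == seq_b)  # True
```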

4.5 Float Functions

The Wafl core library contains the following float functions:

abs(x) - absolute value;
sgn(x) - sign;
between(x,a,b) - check whether x is between a and b;
roundTo(x,y) - rounding;
exp(x) - e to the power of x;
ln(x) - natural logarithm;
log(x) - logarithm to the base 10;
log2(x) - logarithm to the base 2;
pow(x,y) - x to the power of y;
sqrt(x) - square root;
sin(x) - sine;
cos(x) - cosine;
tan(x) - tangent;
asin(x) - arc sine;
acos(x) - arc cosine;
atan(x) - arc tangent and
atan2(y,x) - arc tangent of y/x (works for x=0).

The following conversion functions have already been presented in the previous sections:

round(x) - converts a float value to the nearest Integer value;
ceil(x) - returns the nearest Integer value that is not smaller than x and
floor(x) - returns the nearest Integer value that is not larger than x.

4.5.1 Float Function `abs`

The float function abs(x) returns the absolute value of a given float value x.

{#
    abs( 123.456 ),
    abs( -123.456 ),
    abs( -0.0 )
#}

{# 123.456, 123.456, 0 #}

4.5.2 Float Function `sgn`

The float function sgn(x) returns the sign of the number x. For positive values it returns 1.0, for negative values -1.0 and for zero it returns zero:

{#
    sgn( 20.3 ),
    sgn( -2.4 ),
    sgn( 0.0 )
#}

{# 1, -1, 0 #}

4.5.3 Float Function `between`

The float function between(x,a,b) checks whether the number x lies between the numbers a and b, including the limit values. It is usually applied as x.between(a,b). This is equivalent to the expression x >= a and x <= b:

{#
    10.0 .between( 9.5, 10.5 ),
    10.0 .between( 10.0, 10.5 ),
    10.0 .between( 9.5, 10.0 ),
    10.0 .between( 10.1, 10.5 ),
    10.0 .between( 9.5, 9.9 )
#}

{# true, true, true, false, false #}

4.5.4 Float Function `roundTo`

The float function roundTo(x,y) rounds the float value x to the nearest multiple of the float value y. In this way, y defines the least significant digit of the result.

{#
    roundTo( 1234.56789, 0.001 ),
    roundTo( 1234.56789, 0.01 ),
    roundTo( 1234.56789, 0.1 ),
    roundTo( 1234.56789, 1. ),
    roundTo( 1234.56789, 10. ),
    roundTo( 1234.56789, 100. ),
    roundTo( 1234.56789, 1000. ),
    roundTo( 1234.56789, 10000. )
#}

{# 1234.568, 1234.57, 1234.6, 1235, 1230, 1200, 1000, 0 #}
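The results above are consistent with rounding x to the nearest multiple of y. A minimal Python sketch of that rule follows (an illustration, not the Wafl implementation). How Wafl rounds exact ties is not stated here, so rounding ties upward is an assumption. Because of binary floating-point representation, some results (for example 1234.6) may carry a tiny representation error.

```python
import math


def round_to(x: float, y: float) -> float:
    """Round x to the nearest multiple of y (assumption: ties are rounded up)."""
    return math.floor(x / y + 0.5) * y


for y in (0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(y, round_to(1234.56789, y))
```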

4.5.5 Function `exp`

The float function exp(x) returns e to the power of x.

{#
    exp( -10.0 ),
    exp( 0.0 ),
    exp( 1.0 ),
    exp( 10.0 )
#}

{# 4.539992976e-05, 1, 2.718281828, 22026.46579 #}

4.5.6 Function `ln`

The float function ln(x) returns the natural logarithm log_e x. It is defined for positive float values.

{#
    ln( 0.1 ),
    ln( 1.0 ),
    ln( 2.7182818284590452353602874),
    ln( 100. ),
    ln( 1000. )
#}

{# -2.302585093, 0, 1, 4.605170186, 6.907755279 #}

4.5.7 Function `log`

The float function log(x) returns the logarithm log₁₀ x. It is defined for positive float values.

{#
    log( 0.001 ),
    log( 0.01 ),
    log( 0.1 ),
    log( 1.0 ),
    log( 10. ),
    log( 100. ),
    log( 1000. )
#}

{# -3, -2, -1, 0, 1, 2, 3 #}

4.5.8 Function `log2`

The float function log2(x) returns the logarithm log₂ x. It is defined for positive float values.

{#
    log2( 0.001 ),
    log2( 0.0078125 ),
    log2( 0.25 ),
    log2( 0.5 ),
    log2( 1.0 ),
    log2( 2. ),
    log2( 4. ),
    log2( 128. ),
    log2( 1000. )
#}

{# -9.965784285, -7, -2, -1, 0, 1, 2, 7, 9.965784285 #}

4.5.9 Function `pow`

The float function pow(x,y) returns x^y - x to the power of y.

It is defined for positive x and any y. Negative x is only permitted if y has an integer value, and x equal to zero is only permitted for positive y.

{#
    pow( 2., 3. ),
    pow( 2., -3. ),
    pow( 2.5, -3.7 ),
    pow( -2., 3. ),
    pow( 0., 3.2 )
#}

{# 8, 0.125, 0.03369938443, -8, 0 #}

4.5.10 Function `sqrt`

The float function sqrt(x) returns the square root of x. It is defined for non-negative float values x.

{#
    sqrt(1.),
    sqrt(4.),
    sqrt(9.),
    sqrt(16.),
    sqrt(3433.32)
#}

{# 1, 2, 3, 4, 58.59453899 #}

4.5.11 Trigonometric functions

The following trigonometric functions are available:

sin(x) - sine;
cos(x) - cosine;
tan(x) - tangent;
asin(x) - arc sine;
acos(x) - arc cosine;
atan(x) - arc tangent and
atan2(y,x) - arc tangent of y/x (works for x=0).

The angles are measured in radians. The function atan2(y,x) maps a pair of float values to the corresponding angle. For positive x, atan2(y,x) = atan(y/x), but unlike atan(y/x), atan2 is also defined for x = 0.

{#
    sin(3.14/2.0) * cos(3.14/2.0),
    tan(3.14/2.0),
    asin(0.5) + acos(0.5),
    atan(0.5),
    atan2(1.0,2.0),
    atan2(1.0,0.0)
#}

{# 0.0007963264582, 1255.765592, 1.570796327, 0.463647609, 0.463647609, 1.570796327 #}

4.6 String Functions

In this section we introduce the string functions.

The conversion functions (asChar, asString, ascii and toString) were presented in the previous sections.

The Wafl String type works with both single-byte strings and UTF-8 encoded multi-byte strings. However, some of the functions work with single-byte strings only. If a function does not work well with UTF-8 strings, this will be noted in this tutorial.

4.6.1 Basic String Functions

| Function | Type | Description |
|---|---|---|
| strLen | `String -> Int` | Returns the length of the character string. |
| length | `Indexable['1]['2]['3] -> Int` | Returns the size of the collection. |
| size | `Indexable['1]['2]['3] -> Int` | Returns the size of the collection. |
| strCat | `String * String -> String` | String concatenation. The same as string addition. |
| isNull | `String -> Bool` | Checks whether a string is a database NULL value. |
| ifNull | `String * String -> String` | Replaces null with the given value: ifNull(x,c) = if isNull(x) then c else x |

`strLen`

The function strLen(x) returns the length of the string x.

It is important to understand that the string x can contain any characters and that characters with the ASCII code zero are not treated as string terminators. Therefore, the String type can work not only with character strings, but also with byte strings.

There are two more general synonyms, length and size, which also work with other indexable collections.

{#
    strLen( "abc" ),
    strLen( "abc\0abc" ),
    length( "abc\0abc" ),
    size( "abc\0abc" )
#}

{# 3, 7, 7, 7 #}

In the case of UTF-8 strings, strLen, length and size return the size in bytes. To get a real UTF-8 string length in UTF-8 code-points, please use utfLen.
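The difference between bytes and code-points can be seen in this short Python illustration (not Wafl code). The string below contains two characters that take two bytes each in UTF-8, so its encoded length is larger than its number of code-points. strLen would be expected to report the former and utfLen the latter.

```python
text = "\u010da\u0161a"                 # 'čaša': 4 code-points
byte_len = len(text.encode("utf-8"))    # byte count, as reported by strLen/length/size
code_points = len(text)                 # code-point count, as reported by utfLen
print(byte_len, code_points)            # 6 4
```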

`strCat`

The strCat(x,y) function computes the concatenation of two given strings. It is equivalent to the string operator +.

{#
    "abc" + "def",
    strCat( "abc", "def" )
#}

{# 'abcdef', 'abcdef' #}

`isNull`, `ifNull`

For compatibility with databases, the String type supports the special undefined value NULL. The function isNull(s) checks whether the string s is NULL. The function ifNull(s,x) returns s if s is not NULL, and x if s is NULL:

ifNull(s,x) == if isNull(s) then x else s

{#
    $-1,    //  This returns null string
    isNull('a'),
    isNull($-1),
    ifNull("abc","xyz"),
    ifNull($-1,"xyz")
#}

{# 'NULL', false, true, 'abc', 'xyz' #}

In the previous example, we used the expression $-1 to generate NULL strings. The prefix operator $ will be introduced later.

4.6.2 String Extraction Functions

String extraction functions extract a part of the given string and return it. Wafl core library contains the following string extraction functions:

| Function | Type | Description |
|---|---|---|
| sub | `SequenceStr['2]['1] * Int * Int -> SequenceStr['2]['1]` | Extracts the subsequence from the given 0-based position with the given length: sub(seq,pos,len) |
| subStr | `String * Int * Int -> String` | Returns a substring from the given position (from 0) with the given length. [Deprecated. Use ‘sub’.] |
| strLeft | `String * Int -> String` | Returns the first N characters of the string. If N is negative, returns all but the last -N characters. |
| strRight | `String * Int -> String` | Returns the last N characters of the string. If N is negative, returns all but the first -N characters. |
| strLTrim | `String -> String` | Trims all leading spaces and other non-visible characters. |
| strRTrim | `String -> String` | Trims all trailing spaces and other non-visible characters. |
| strTrim | `String -> String` | Trims all spaces and other non-visible characters from both sides. |

`sub` and `subStr`

The function sub(s,p,n) returns the substring of the character string s that starts at the zero-based position p and has the length n.

In the Wafl core library there is subStr, which is a synonym for sub. In the current version both functions are supported, but it is possible that only sub remains in future versions.

{#
    subStr( "abcdefgh", 0, 3 ),
    sub( "abcdefgh", 0, 3 ),
    sub( "abcdefgh", 2, 3 ),
    sub( "abcdefgh", -2, 5 ),
    sub( "abcdefgh", 5, 10 ),
    sub( "abcdefgh", 5, -2 )
#}

{# 'abc', 'abc', 'cde', '', 'fgh', '' #}

Special cases:

if a negative position is specified, an empty string is returned (example sub( "abcdefgh", -2, 5 ));
if the specified length exceeds the number of remaining characters, a shorter string is returned (example sub( "abcdefgh", 5, 10 ));
if a negative length is specified, an empty string is returned (example sub( "abcdefgh", 5, -2 )).
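The special cases above can be summarized by a small Python model of sub for single-byte strings (an illustration, not the Wafl implementation). The text does not say what sub returns for a position beyond the end of the string; the model assumes an empty string.

```python
def sub(s: str, p: int, n: int) -> str:
    """Model of sub(s,p,n) for single-byte strings."""
    if p < 0 or n < 0:
        return ""          # negative position or negative length
    return s[p:p + n]      # a too large length yields only the available characters


s = "abcdefgh"
print([sub(s, 0, 3), sub(s, 2, 3), sub(s, -2, 5), sub(s, 5, 10), sub(s, 5, -2)])
# ['abc', 'cde', '', 'fgh', '']
```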

In the case of UTF-8 strings, sub and subStr can return invalid strings. These functions treat strings as if they only consist of single byte characters. If a substring starts or ends in the middle of a multi-byte UTF-8 code-point, the result is not a valid UTF-8 string. To obtain a valid UTF-8 substring whose positions are specified in UTF-8 code-points, please use utfSub.
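The following Python illustration (not Wafl code) shows why a byte-based cut can break UTF-8. The first character below is encoded as two bytes, so taking only the first byte yields a sequence that is not valid UTF-8. A code-point based cut, which is what utfSub is described as doing, keeps the whole character.

```python
text = "\u010da\u0161a"            # 'čaša': 4 code-points, 6 bytes in UTF-8
data = text.encode("utf-8")
first_byte = data[0:1]             # byte-based cut, like sub(s, 0, 1)
try:
    first_byte.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
first_char = text[0:1]             # code-point based cut keeps the whole character
print(first_byte, valid, len(first_char.encode("utf-8")))  # b'\xc4' False 2
```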

`strLeft` and `strRight`

The strLeft(s,n) function returns a substring containing the first n characters of the string s:

for a positive n that is less than strLen(s), it is the same as sub(s,0,n);
for a negative n, it is the same as strLeft(s,strLen(s)+n)
- we can read it as “all but the last -n characters”.

The strRight(s,n) function returns a substring containing the last n characters of the string s:

for a positive n that is less than strLen(s), it is the same as sub(s,strLen(s)-n,n);
for a negative n, it is the same as strRight(s,strLen(s)+n)
- we can read it as “all but the first -n characters”.

{#
    strLeft( "abcdefgh", 3 ),    //  first 3 characters
    strLeft( "abcdefgh", 10 ),   //  whole string
    strLeft( "abcdefgh", -5 ),   //  all but last 5 characters
    strLeft( "abcdefgh", -10 ),  //  empty string
    strRight( "abcdefgh", 3 ),   //  last 3 characters
    strRight( "abcdefgh", 10 ),  //  whole string
    strRight( "abcdefgh", -5 ),  //  all but first 5 characters
    strRight( "abcdefgh", -10 )  //  empty string
#}

{# 'abc', 'abcdefgh', 'abc', '', 'fgh', 'abcdefgh', 'fgh', '' #}
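Both rules, including the negative case and a length larger than the string, can be expressed in a few lines of Python. This is a model for single-byte strings, not the Wafl implementation.

```python
def str_left(s: str, n: int) -> str:
    if n < 0:
        n = max(len(s) + n, 0)     # "all but the last -n characters"
    return s[:n]                   # n >= len(s) yields the whole string


def str_right(s: str, n: int) -> str:
    if n < 0:
        n = max(len(s) + n, 0)     # "all but the first -n characters"
    n = min(n, len(s))             # n >= len(s) yields the whole string
    return s[len(s) - n:]


s = "abcdefgh"
print([str_left(s, 3), str_left(s, 10), str_left(s, -5), str_left(s, -10),
       str_right(s, 3), str_right(s, 10), str_right(s, -5), str_right(s, -10)])
# ['abc', 'abcdefgh', 'abc', '', 'fgh', 'abcdefgh', 'fgh', '']
```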

In the case of UTF-8 strings, strLeft and strRight can return invalid strings. These functions treat strings as if they only consist of single byte characters. If a substring starts or ends in the middle of a multi-byte UTF-8 code-point, the result is not a valid UTF-8 string. To obtain a valid UTF-8 substring whose positions are specified in UTF-8 code-points, please use utfLeft and utfRight.

`strLTrim`, `strRTrim` and `strTrim`

The function strLTrim(s) returns the largest substring of s that does not contain any leading, non-visible characters.

The function strRTrim(s) returns the largest substring of s that does not contain any trailing, non-visible characters.

The function strTrim(s) returns the largest substring of s that contains neither leading nor trailing, non-visible characters.

{#
    strLTrim( "\0 \t \n   abcd \b \003 \0 \r " ),
    strRTrim( "\0 \t \n   abcd \b \003 \0 \r " ),
    strTrim( "\0 \t \n   abcd \b \003 \0 \r " )
#}

{# 'abcd \010 \003 \000 \015 ', '\000 \011 \012   abcd', 'abcd' #}
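A Python sketch of the three trimming functions follows. The text above does not define "non-visible" precisely; the sketch assumes it means the characters with codes 0 to 32 (control characters and the space), which reproduces the example above.

```python
# Assumption: "non-visible" means the characters with codes 0..32.
NON_VISIBLE = "".join(chr(c) for c in range(33))


def str_ltrim(s: str) -> str:
    return s.lstrip(NON_VISIBLE)   # drop leading non-visible characters


def str_rtrim(s: str) -> str:
    return s.rstrip(NON_VISIBLE)   # drop trailing non-visible characters


def str_trim(s: str) -> str:
    return s.strip(NON_VISIBLE)    # drop both


s = "\0 \t \n   abcd \b \003 \0 \r "
print(repr(str_ltrim(s)), repr(str_rtrim(s)), repr(str_trim(s)))
```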

4.6.3 String Index and Slice Operators

The index operator s[i] is equivalent to subStr(s, i %% strLen(s), 1). This means that indexing beyond the length of the string, as well as indexing with negative values, is possible: the index wraps around. The index operator s[i] is therefore similar, but not equivalent, to subStr(s,i,1). They are only equivalent if 0 <= i < strLen(s).

{# 
  s[-4], s[-3], s[-2], s[-1],
  s[0], s[1], s[2], s[3], 
  s[4], s[6], s[7], s[8]
#}
where {
  s = "abcd";
}

{# 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'c', 'd', 'a' #}
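The wrap-around rule can be modeled in Python, whose % operator returns a non-negative result for a positive divisor and therefore behaves like Wafl's %% here. This is an illustration only; the behavior for an empty string is not covered.

```python
def wafl_index(s: str, i: int) -> str:
    # Python's % with a positive divisor is never negative, like Wafl's %%.
    return s[i % len(s)]


s = "abcd"
print([wafl_index(s, i) for i in (-4, -3, -2, -1, 0, 1, 2, 3, 4, 6, 7, 8)])
# ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'c', 'd', 'a']
```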

With UTF-8 strings, the indexing operator can return an invalid string. It treats strings as if they only consist of single byte characters. If the index points to a UTF-8 multi-byte code-point element, the result is not a valid UTF-8 string. To get a valid UTF-8 code-point whose position is specified in UTF-8 code-points, please use utfAt.

The slice operator uses a syntax similar to the index operator, but behaves like subStr, strLeft and strRight. If 0 < n < m <= strLen(s), then:

s[:n] is the same as strLeft(s,n), it extracts the first n characters;
s[n:] is the same as strRight(s,-n), it extracts all but the first n characters and
s[n:m] is the same as strRight(strLeft(s,m),-n), and the same as subStr(s,n,m-n).

If the index n is negative or greater than strLen(s), then n %% strLen(s) is used. The same applies to m.

{# 
    s[:6],
    s[:-2],
    s[2:],
    s[-6:],
    s[2:6], 
    s[2:-2], 
    s[-6:6],
    s[-6:-2]
#}
where {
  s = "abcdefgh";
}

{# 'abcdef', 'abcdef', 'cdefgh', 'cdefgh', 'cdef', 'cdef', 'cdef', 'cdef' #}
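The index normalization rule for slices can be sketched in Python as follows. This is an illustrative model, and the helper names norm and wafl_slice are hypothetical. The text does not describe the result when the normalized start is after the end; the model simply returns an empty string in that case.

```python
from typing import Optional


def norm(i: int, length: int) -> int:
    # Negative indices and indices beyond the length are taken modulo the length.
    return i % length if i < 0 or i > length else i


def wafl_slice(s: str, n: Optional[int] = None, m: Optional[int] = None) -> str:
    """Model of s[n:m], s[:m] and s[n:]; an omitted bound is passed as None."""
    start = 0 if n is None else norm(n, len(s))
    end = len(s) if m is None else norm(m, len(s))
    return s[start:end]  # start > end gives '' here (an assumption)


s = "abcdefgh"
print([wafl_slice(s, None, 6), wafl_slice(s, None, -2), wafl_slice(s, 2),
       wafl_slice(s, -6), wafl_slice(s, 2, 6), wafl_slice(s, 2, -2),
       wafl_slice(s, -6, 6), wafl_slice(s, -6, -2)])
# ['abcdef', 'abcdef', 'cdefgh', 'cdefgh', 'cdef', 'cdef', 'cdef', 'cdef']
```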

It is often easier to use the slice operator than extraction functions, but basically they do the same thing.

In the case of UTF-8 strings, slice operators can return invalid strings. They treat strings as if they only consist of single byte characters. If a slice starts or ends in the middle of a multi-byte UTF-8 code-point, the result is not a valid UTF-8 string. To obtain a valid UTF-8 slice whose positions are specified in UTF-8 code-points, please use utfSlice.

4.6.4 String Search Functions

The Wafl core library contains the following string search functions:

| Function | Type | Description |
|---|---|---|
| strPos | `String * String -> Int` | Finds the first position of a substring in the string, or -1 if not found. |
| strPosI | `String * String -> Int` | Same as strPos, but ignores upper and lower case. |
| strNextPos | `String * String * Int -> Int` | Finds the next position of a substring in the string, after the given position. |
| strNextPosI | `String * String * Int -> Int` | Same as strNextPos, but ignores upper and lower case. |
| strLastPos | `String * String -> Int` | Finds the last position of a substring in the string, or -1 if not found. |
| strLastPosI | `String * String -> Int` | Same as strLastPos, but ignores upper and lower case. |
| strNextLastPos | `String * String * Int -> Int` | Finds the last position of a substring in the string before the given position. |
| strNextLastPosI | `String * String * Int -> Int` | Same as strNextLastPos, but ignores upper and lower case. |
| strBeg | `String * String -> Bool` | Checks whether the 2nd string is at the beginning of the 1st. |
| strEnd | `String * String -> Bool` | Checks whether the 2nd string is at the end of the 1st. |

All search functions except strBeg and strEnd return the start position of the second given string within the first given string if it is found, and -1 if it is not found. The functions strBeg and strEnd return a Bool value.

Functions whose names end with ‘I’ are case-insensitive: strPosI, strNextPosI, strLastPosI, and strNextLastPosI.

In case-insensitive searches, both strings are first converted to upper case. This can be inefficient for larger strings.

`strPos`, `strPosI`, `strNextPos` and `strNextPosI`

The function strPos(s,p) returns the position of the first occurrence of the string p in the string s.

The function strPosI(s,p) returns the position of the first occurrence of the string p in the string s, ignoring upper and lower case.

The function strNextPos(s,p,i) returns the position of the first occurrence of the string p in the string s after position i.

The function strNextPosI(s,p,i) returns the position of the first occurrence of the string p in the string s after position i, ignoring upper and lower case.

{# 
    strPos( s, n ),     //  not found
    strPos( s, x ),     
    strNextPos( s, x, 0 ),
    strNextPos( s, x, 3 ),
    strNextPos( s, x, 6 ),
    strNextPos( s, x, 9 )
#}
where {
    s = "abcabcabcab";
    x = "ab";
    n = "xxx";
};

{# -1, 0, 3, 6, 9, -1 #}
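The forward search functions map closely onto Python's str.find, as this illustrative model shows. It assumes, as the example above indicates, that strNextPos searches strictly after position i. Case-insensitivity is implemented by converting both strings to upper case, as described earlier.

```python
def str_pos(s: str, p: str) -> int:
    return s.find(p)                     # -1 if not found


def str_pos_i(s: str, p: str) -> int:
    return s.upper().find(p.upper())     # both strings upper-cased first


def str_next_pos(s: str, p: str, i: int) -> int:
    return s.find(p, i + 1)              # search strictly after position i


s, x, n = "abcabcabcab", "ab", "xxx"
print([str_pos(s, n), str_pos(s, x)] + [str_next_pos(s, x, i) for i in (0, 3, 6, 9)])
# [-1, 0, 3, 6, 9, -1]
```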

`strLastPos`, `strLastPosI`, `strNextLastPos` and `strNextLastPosI`

The function strLastPos(s,p) returns the position of the last occurrence of the string p in the string s.

The function strLastPosI(s,p) returns the position of the last occurrence of the string p in the string s, ignoring upper and lower case.

The function strNextLastPos(s,p,i) returns the position of the last occurrence of the string p in the string s before the position i.

The function strNextLastPosI(s,p,i) returns the position of the last occurrence of the string p in the string s before the position i, ignoring upper and lower case.

Please note that the case-insensitive functions strLastPosI and strNextLastPosI do not work well for UTF-8 multibyte characters.

{# 
    strLastPos( s, n ),     //  not existing
    strLastPos( s, x ),     
    strNextLastPos( s, x, 9 ),
    strNextLastPos( s, x, 6 ),
    strNextLastPos( s, x, 3 ),
    strNextLastPos( s, x, 0 )
#}
where {
    s = "abcabcabcab";
    x = "ab";
    n = "xxx";
};

{# -1, 9, 6, 3, 0, -1 #}
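The backward searches can be modeled with Python's str.rfind (an illustration, not the Wafl implementation). The model assumes, as the example above shows, that an occurrence found by strNextLastPos must start strictly before position i; the end bound passed to rfind enforces that.

```python
def str_last_pos(s: str, p: str) -> int:
    return s.rfind(p)                    # -1 if not found


def str_next_last_pos(s: str, p: str, i: int) -> int:
    # An occurrence starting at j < i ends at j + len(p) <= i - 1 + len(p),
    # so it lies within s[0 : i - 1 + len(p)].
    return s.rfind(p, 0, max(i - 1 + len(p), 0))


s, x, n = "abcabcabcab", "ab", "xxx"
print([str_last_pos(s, n), str_last_pos(s, x)]
      + [str_next_last_pos(s, x, i) for i in (9, 6, 3, 0)])
# [-1, 9, 6, 3, 0, -1]
```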

`strBeg` and `strEnd`

The functions strBeg and strEnd check whether the first string has the second string at the beginning or at the end:

{#
    strBeg( "abcdef", "ab" ),
    strBeg( "abcdef", "cd" ),
    strBeg( "abcdef", "ef" ),
    strEnd( "abcdef", "ab" ),
    strEnd( "abcdef", "cd" ),
    strEnd( "abcdef", "ef" )
#}

{# true, false, false, false, false, true #}

4.6.5 Substring Counting Functions

The Wafl core library contains the following functions for counting substrings:

Function / Type and Description

strCountSub

(String * String -> Int)
Count occurrences of substring in the given string:
strCountSub('aaaaaA','aa') == 4

strCountSubI

(String * String -> Int)
Same as strCountSub, but ignores upper and lower case:
strCountSub('aaaaaA','aa') == 5

strCountSubDis

(String * String -> Int)
Count disjunct occurrences of substring in the given string:
strCountSub('aaaaaA','aa') == 2

strCountSubDisI

(String * String -> Int)
Same as strCountSubDis, but ignores upper and lower case:
strCountSubDisI('aaaaaA','aa') == 3

All counting functions return the number of occurrences of the given substring in the given string. The functions whose names end with ‘I’ ignore upper and lower case. The functions with ‘Dis’ in their names count only disjoint (non-overlapping) occurrences.

{# 
    strCountSub('aaaaaA','aa'),
    strCountSubI('aaaaaA','aa'),
    strCountSubDis('aaaaaA','aa'),
    strCountSubDisI('aaaaaA','aa')
#}

{# 4, 5, 2, 3 #}

The functions strCountSubI and strCountSubDisI first convert both strings to upper case, so they can be inefficient for large strings.
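The difference between overlapping and disjoint counting can be sketched in Python. This is an illustrative model, not Wafl code: count_sub is a hypothetical name, the case-insensitive variant follows the upper-case conversion described above, and returning 0 for an empty pattern is an assumption of the model.

```python
def count_sub(s: str, p: str, ignore_case: bool = False, disjoint: bool = False) -> int:
    """Count occurrences of p in s.

    disjoint=False counts overlapping matches (cf. strCountSub);
    disjoint=True counts non-overlapping matches (cf. strCountSubDis).
    """
    if not p:
        return 0  # assumption: an empty pattern counts as no occurrences
    if ignore_case:
        # Model of the described approach: upper-case both strings first.
        s, p = s.upper(), p.upper()
    count = 0
    i = s.find(p)
    while i >= 0:
        count += 1
        # Overlapping search resumes one position later, disjoint search after the match.
        i = s.find(p, i + (len(p) if disjoint else 1))
    return count


print([count_sub('aaaaaA', 'aa'),
       count_sub('aaaaaA', 'aa', ignore_case=True),
       count_sub('aaaaaA', 'aa', disjoint=True),
       count_sub('aaaaaA', 'aa', ignore_case=True, disjoint=True)])
# prints [4, 5, 2, 3]
```

For comparison, Python's built-in str.count counts non-overlapping occurrences, so 'aaaaaA'.count('aa') gives 2, the same value as strCountSubDis.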

4.6.6 String Replace Functions

The Wafl core library contains the following functions for replacing parts of a string with another string:

Function / Type and Description

strReplace

(String * String * String * Int -> String)
Replaces the Nth occurrence (counting from 1) of a substring with the given string:
strReplace('ababa','b','c',2) == 'abaca'

strReplaceI

(String * String * String * Int -> String)
Same as strReplace, but ignores upper and lower case.

strReplaceAll

(String * String * String -> String)
Replaces all occurrences of substring with given string.

strReplaceAllI

(String * String * String -> String)
Same as strReplaceAll, but ignores upper and lower case.

Each of the strReplace* functions returns a new string and does not change any of the given strings.

The function strReplace(s,p,x,i) returns a copy of the string s in which the i-th occurrence of the substring p (counting from 1) is replaced by x. The string s remains unchanged.

The function strReplaceI(s,p,x,i) is similar to the function strReplace, but ignores upper and lower case when searching for p.

The function strReplaceAll(s,p,x) returns a copy of the string s in which all occurrences of substring p are replaced by x. The string s remains unchanged.

The function strReplaceAllI is similar to the function strReplaceAll, but ignores upper and lower case when searching for p.

{#
    strReplace( s, 'a', '@', 2 ),
    strReplaceI( s, 'a', '@', 2 ),
    strReplaceAll( s, 'a', '@' ),
    strReplaceAllI( s, 'a', '@' )
#}
where {
    s = "abABabAB";
}

{# 'abAB@bAB', 'ab@BabAB', '@bAB@bAB', '@b@B@b@B' #}

Please note that the case-insensitive functions strReplaceI and strReplaceAllI can be inefficient with larger strings.
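A Python sketch of the replace functions follows. It is a model for illustration, not Wafl's implementation: str_replace and str_replace_all are hypothetical names, and the model assumes that occurrences are counted without overlapping, that asking for a missing occurrence leaves the string unchanged, and that the text is ASCII (upper-casing can change the length of some non-ASCII characters).

```python
import re


def str_replace(s: str, p: str, x: str, n: int, ignore_case: bool = False) -> str:
    """Return a copy of s with the n-th occurrence of p (counting from 1) replaced by x."""
    # Upper-casing keeps positions aligned only for ASCII text (assumption of this model).
    hay, needle = (s.upper(), p.upper()) if ignore_case else (s, p)
    pos, start = -1, 0
    for _ in range(n):
        pos = hay.find(needle, start)
        if pos < 0:
            return s  # assumption: fewer than n occurrences leaves s unchanged
        start = pos + max(len(needle), 1)  # assumption: occurrences do not overlap
    if pos < 0:
        return s  # n < 1: nothing to replace (assumption)
    return s[:pos] + x + s[pos + len(p):]


def str_replace_all(s: str, p: str, x: str, ignore_case: bool = False) -> str:
    """Return a copy of s with every occurrence of p replaced by x."""
    if not ignore_case:
        return s.replace(p, x)
    # re.escape matches p literally; passing a function keeps x from being
    # interpreted as a regex replacement template.
    return re.sub(re.escape(p), lambda _m: x, s, flags=re.IGNORECASE)


s = "abABabAB"
print(str_replace(s, 'a', '@', 2), str_replace(s, 'a', '@', 2, ignore_case=True))
# prints abAB@bAB ab@BabAB
print(str_replace_all(s, 'a', '@'), str_replace_all(s, 'a', '@', ignore_case=True))
# prints @bAB@bAB @b@B@b@B
```

Python strings are immutable, so, as with the Wafl functions, s itself is never modified.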

4.6.7 Functions `strSplit...` and `strJoin`

These functions work with lists of strings. The list is one of the most important concepts of functional programming languages, including Wafl. We will discuss lists in detail in the following chapter.

The function strSplit(s,p) returns a list of all substrings of the string s that are separated by the substring p.

The function strJoin(lst,p) concatenates all elements of the list lst and inserts the separator p between them.

Function / Type and Description

strSplit

(String * String -> List[String])
Splits a string into a list of strings at each occurrence of the given separator.

strSplitTrim

(String * String -> List[String])
Splits a string into a list of strings at each occurrence of the given separator, and trims spaces from both sides of each segment.

strSplitLines

(String -> List[String])
Splits a string into a list of lines, using new-line sequences as separators.

strSplitLinesTrim

(String -> List[String])
Splits a string into a list of lines, using new-line sequences as separators, and trims spaces from both sides of each line.

strJoin

(Sequence['1][String] * String -> String)
Joins (concatenates) a sequence of strings, inserting the given separator between them.

strSplit( 'a,bb,c,dd,e', ',' )

['a', 'bb', 'c', 'dd', 'e']

strJoin( ['a','b','c','d','e','f','g','h'], ';' )

a;b;c;d;e;f;g;h

strJoin( strSplit( 'a,bb,c,dd,e', ','), ';' )

a;bb;c;dd;e

The function strSplitTrim is similar to strSplit, but it also removes all spaces at the beginning and at the end of each segment. It is functionally equivalent to mapping strTrim over the result of strSplit, but more efficient:

s.strSplit(p).map(strTrim) == s.strSplitTrim(p)
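The split, trim and join relationships can be modeled in Python as follows. The function names are hypothetical and this is not Wafl's implementation; in particular, Python's str.strip() removes all whitespace (tabs and new lines included), and the exact set of characters that Wafl's strTrim removes is an assumption here. The annotations require Python 3.9 or newer.

```python
def str_split(s: str, sep: str) -> list[str]:
    """Split s at every occurrence of sep (cf. strSplit)."""
    return s.split(sep)


def str_split_trim(s: str, sep: str) -> list[str]:
    """Split s at sep and trim whitespace on both sides of each segment (cf. strSplitTrim)."""
    return [segment.strip() for segment in s.split(sep)]


def str_join(items: list[str], sep: str) -> str:
    """Concatenate items, inserting sep between neighbours (cf. strJoin)."""
    return sep.join(items)


print(str_join(str_split('a,bb,c,dd,e', ','), ';'))  # prints a;bb;c;dd;e
print(str_split_trim(' a , bb ,c ', ','))            # prints ['a', 'bb', 'c']
```

In this model, str_join(str_split(s, p), p) always gives back s, because splitting keeps every segment, including empty ones.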

The function strSplitLines(s) is similar to strSplit(s,'\n'), but it recognizes both LF (Linux, '\n') and CRLF (Windows, '\r\n') new-line sequences as separators, so no '\r' characters remain at the ends of the resulting lines:

{#
    strSplit( '\nabc\ndef\nghi\n', '\n' ),
    strSplit( '\r\nabc\r\ndef\r\nghi\r\n', '\n' ),
    strSplitLines( '\nabc\ndef\nghi\n' ),
    strSplitLines( '\r\nabc\r\ndef\r\nghi\r\n' )
#}

{# ['', 'abc', 'def', 'ghi', ''], ['\015', 'abc\015', 'def\015', 'ghi\015', ''], ['', 'abc', 'def', 'ghi', ''], ['', 'abc', 'def', 'ghi', ''] #}
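A Python model of this behavior (hypothetical names, not Wafl code) splits at LF and then drops one CR left at the end of a segment by a CRLF ending. How Wafl treats a lone CR that is not followed by LF is not described here, so the model simply keeps it.

```python
def str_split_lines(s: str) -> list[str]:
    """Split s into lines at LF, also accepting CRLF line endings (cf. strSplitLines)."""
    # Split at LF, then drop a single CR left at the end of a segment by a CRLF ending.
    return [line[:-1] if line.endswith('\r') else line for line in s.split('\n')]


def str_split_lines_trim(s: str) -> list[str]:
    """Split into lines and trim whitespace on both sides of each line (cf. strSplitLinesTrim)."""
    return [line.strip() for line in str_split_lines(s)]


print(str_split_lines('\r\nabc\r\ndef\r\nghi\r\n'))  # prints ['', 'abc', 'def', 'ghi', '']
```

Python's built-in str.splitlines() is not a drop-in equivalent: it omits the empty segment after a final line break and also splits at a lone CR and several other line-boundary characters.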

The function strSplitLinesTrim extends strSplitLines with trimming, in the same way as strSplitTrim extends strSplit. It is functionally equivalent to mapping strTrim over the result of strSplitLines, but more efficient:

s.strSplitLines().map(strTrim) == s.strSplitLinesTrim()

The function strChars converts a string into a list of characters.

{#
    strChars( 'abc\ndef' )
#}

{# ['a', 'b', 'c', '\012', 'd', 'e', 'f'] #}

For UTF-8 strings, strChars splits the string into individual bytes, so multibyte code-points are broken apart. To get a list of valid UTF-8 code-points, use utfChars instead.

4.6.8 Encoding functions

Sometimes we need to encode a string in a format that follows a specific syntax. Wafl contains four string encoding functions, all of the same type: (String -> String).

The function strEncodeHtml(s) returns an encoded string that can be inserted into HTML. All special characters are replaced by corresponding HTML escape sequences:

strEncodeHtml( "abc&<>def" )

abc&amp;&lt;&gt;def

The function strEncodeSql(s) returns an encoded string ready for use in SQL string literals. All special characters are replaced by SQL escape sequences:

strEncodeSql( "abc'quotes'abc" )

abc''quotes''abc

The function strEncodeUri(s) returns an encoded string according to the rules of URI syntax:

strEncodeUri( "a + b = c" )

a%20%2B%20b%20%3D%20c

The function strEncodeWafl(s) returns an encoded string according to the Wafl syntax:

strEncodeWafl( "a\n \0 \'\"..." )

a\012 \000 \'\"...
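For comparison, the first three encodings have close counterparts in the Python standard library (html.escape and urllib.parse.quote). The wrapper names below are hypothetical, and the sketch is not Wafl's implementation: it reproduces the examples above, but Wafl may treat other characters (for example quotes in HTML or non-ASCII characters in URIs) differently. There is no standard Python counterpart of strEncodeWafl, so it is not modeled.

```python
import html
from urllib.parse import quote


def encode_html(s: str) -> str:
    """Escape &, < and > for insertion into HTML text (cf. strEncodeHtml)."""
    return html.escape(s, quote=False)


def encode_sql(s: str) -> str:
    """Double single quotes for use inside an SQL string literal (cf. strEncodeSql)."""
    return s.replace("'", "''")


def encode_uri(s: str) -> str:
    """Percent-encode every character outside the URI unreserved set (cf. strEncodeUri)."""
    return quote(s, safe='')


print(encode_html("abc&<>def"))      # prints abc&amp;&lt;&gt;def
print(encode_sql("abc'quotes'abc"))  # prints abc''quotes''abc
print(encode_uri("a + b = c"))       # prints a%20%2B%20b%20%3D%20c
```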

4.6.9 UTF-8 Functions

To handle UTF-8 strings correctly, use only the functions that treat strings as sequences of UTF-8 (possibly multibyte) code-points.

Most of the specific UTF-8 behavior is covered by the following functions. Please note, however, that two important functionalities are not yet supported by the Wafl library:

- string ordering treats strings as sequences of single-byte characters;
- case-insensitive functions do not work well with UTF-8 multibyte characters.

BOM Functions

Some applications require that files with UTF-8 content have a UTF-8 BOM (Byte Order Mark) sequence at the beginning of the file. The following functions enable the handling of UTF-8 BOM.

Please note that the use of UTF-8 BOM is not recommended, because byte order is irrelevant in UTF-8: the encoding is based on individual bytes, not on multi-byte words.

Function / Type and Description

utfBom

( -> String)
Returns UTF-8 BOM sequence.

utfIsBom

(String -> Bool)
Checks whether the string consists exactly of the UTF-8 BOM.

utfHasBom

(String -> Bool)
Checks whether a string begins with UTF-8 BOM.

utfAddBom

(String -> String)
Adds a UTF-8 BOM, if not already present.

utfTrimBom

(String -> String)
Trims leading BOM, if present.

The function utfBom() returns the UTF-8 BOM.

The function utfIsBom(s) checks whether the string content corresponds exactly to the UTF-8 BOM.

The function utfHasBom(s) checks whether the string begins with a UTF-8 BOM.

{#
    utfBom(),
    utfIsBom( utfBom() ),
    utfIsBom( 'abc' ),
    utfHasBom( utfBom() + 'abc' ),
    utfHasBom( 'abc' )
#}

{# '\357\273\277', true, false, true, false #}

The function utfAddBom(s) returns the string s with a UTF-8 BOM prepended, if it is not already present.

The function utfTrimBom(s) returns a string without a leading UTF-8 BOM.

{#
    utfAddBom('abc'),
    utfHasBom( utfAddBom('abc') ),
    utfTrimBom( utfAddBom('abc') ),
    utfHasBom( utfTrimBom( utfAddBom('abc') ) )
#}

{# '\357\273\277abc', true, 'abc', false #}
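Because Wafl strings behave here as byte sequences, the BOM functions can be modeled in Python on bytes values. The names are hypothetical and this is an illustration, not Wafl code. Python displays bytes in hexadecimal, so the octal sequence '\357\273\277' appears as b'\xef\xbb\xbf'.

```python
BOM = b'\xef\xbb\xbf'  # UTF-8 encoding of U+FEFF; octal \357\273\277


def utf_is_bom(data: bytes) -> bool:
    """True if data consists exactly of the BOM (cf. utfIsBom)."""
    return data == BOM


def utf_has_bom(data: bytes) -> bool:
    """True if data starts with the BOM (cf. utfHasBom)."""
    return data.startswith(BOM)


def utf_add_bom(data: bytes) -> bytes:
    """Prepend the BOM unless it is already present (cf. utfAddBom)."""
    return data if utf_has_bom(data) else BOM + data


def utf_trim_bom(data: bytes) -> bytes:
    """Remove one leading BOM, if present (cf. utfTrimBom)."""
    return data[len(BOM):] if utf_has_bom(data) else data


print(utf_add_bom(b'abc'))                # prints b'\xef\xbb\xbfabc'
print(utf_trim_bom(utf_add_bom(b'abc')))  # prints b'abc'
```

Python also offers the 'utf-8-sig' codec, which removes a leading BOM when decoding, so b'\xef\xbb\xbfabc'.decode('utf-8-sig') gives 'abc'.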

UTF-8 Validity Functions

Function / Type and Description

utfIsValid

(String -> Bool)
Checks whether a string is a valid UTF-8 encoded string.

utfRepInvalid

(String * String -> String)
Replaces invalid code points with the given character.

The function utfIsValid(s) checks whether a string is a valid UTF-8 string. Please note that every string consisting only of single-byte (ASCII) characters is a valid UTF-8 string.

The function utfRepInvalid(s,c) returns a string in which all invalid UTF-8 code-points are replaced by the character c.

{#
    utfIsValid( sub('abc€def',4,4) ),
    utfRepInvalid( sub('abc€def',0,4), '@' ),  //  only the first byte of the 3-byte code-point '€'
    utfRepInvalid( sub('abc€def',4,4), '@' )   //  only the second and third bytes of '€'
#}

{# false, 'abc@', '@@de' #}
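The following Python model (hypothetical names, not Wafl code) checks validity with Python's strict UTF-8 decoder and replaces, byte by byte, every byte that does not start a valid code point. This reproduces the results above, but two points are assumptions: Wafl's behavior for a truncated sequence followed by other bytes is not documented here (this model replaces each of its bytes), and Python's decoder also rejects overlong forms and surrogates, which Wafl may or may not do.

```python
def utf_is_valid(data: bytes) -> bool:
    """True if data is well-formed UTF-8 (cf. utfIsValid)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def utf_rep_invalid(data: bytes, c: str) -> str:
    """Decode data, replacing each byte that does not start a valid code point with c."""
    out = []
    i = 0
    while i < len(data):
        # A UTF-8 code point is 1 to 4 bytes long; the shortest prefix that
        # decodes strictly is exactly one complete code point.
        for n in (1, 2, 3, 4):
            try:
                out.append(data[i:i + n].decode('utf-8'))
            except UnicodeDecodeError:
                continue
            i += n
            break
        else:
            out.append(c)  # no valid code point starts here: replace this one byte
            i += 1
    return ''.join(out)


b = 'abc€def'.encode('utf-8')  # 9 bytes: '€' is encoded as 3 bytes
print(utf_is_valid(b[4:8]))          # prints False
print(utf_rep_invalid(b[0:4], '@'))  # prints abc@
print(utf_rep_invalid(b[4:8], '@'))  # prints @@de
```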

`utfLen`

Function / Type and Description

utfLen

(String -> Int)
Returns UTF-8 length, as a number of complete code points.

The function utfLen(s) returns the string length by counting the complete UTF-8 code-points. The string length in code-points is always less than or equal to the length in bytes (strLen, length or size).

{#
    strLen( 'abc€def' ),
    utfLen( 'abc€def' )
#}

{# 9, 7 #}

`utfAt`

Function / Type and Description

utfAt

(String * Int -> String)
Returns the code point at the given position, indexed by code points.

The function utfAt(s,i) is similar to the indexing operator s[i], but uses indices based on code-points. It returns the i-th UTF-8 code-point of the string s.

For more details, see the indexing operator.

{#
    s[0], s[1], s[2], s[3],
    s.utfAt(0), s.utfAt(1), s.utfAt(2), s.utfAt(3)
#}
where {
  s = "abАБабAB";
}

{# 'a', 'b', '\320', '\220', 'a', 'b', '\320\220', '\320\221' #}

{#
    s[-1], s[-2], s[-3], s[-4],
    s.utfAt(-1), s.utfAt(-2), s.utfAt(-3), s.utfAt(-4)
#}
where {
  s = "abАБабAB";
}

{# 'B', 'A', '\261', '\320', 'B', 'A', '\320\261', '\320\260' #}
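The contrast between byte-based and code-point-based length and indexing can be modeled in Python, where a str is a sequence of code points and its UTF-8 encoding is a sequence of bytes. The function names are hypothetical and this is an illustration, not Wafl code; results are shown as Python bytes, so Wafl's octal '\320\220' corresponds to b'\xd0\x90'.

```python
def str_len(s: str) -> int:
    """Length in bytes of the UTF-8 encoding of s (cf. strLen)."""
    return len(s.encode('utf-8'))


def utf_len(s: str) -> int:
    """Length in code points (cf. utfLen); a Python str is a sequence of code points."""
    return len(s)


def str_at(s: str, i: int) -> bytes:
    """Byte i of the UTF-8 encoding (cf. s[i]); a negative i counts from the end."""
    return bytes([s.encode('utf-8')[i]])


def utf_at(s: str, i: int) -> bytes:
    """Code point i of s (cf. utfAt), returned as its UTF-8 bytes."""
    return s[i].encode('utf-8')


print(str_len('abc€def'), utf_len('abc€def'))  # prints 9 7
s = "abАБабAB"
print([str_at(s, i) for i in range(4)])  # prints [b'a', b'b', b'\xd0', b'\x90']
print([utf_at(s, i) for i in range(4)])  # prints [b'a', b'b', b'\xd0\x90', b'\xd0\x91']
```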

`utfSub`, `utfSlice`, `utfLeft`, `utfRight`

Function / Type and Description

utfSub

(String * Int * Int -> String)
Returns a substring from the given position (counting from 0) with the given length, indexing complete UTF-8 code points instead of characters.

utfSlice

(String * Int * Int -> String)
Returns a substring between two given positions, indexing complete UTF-8 code points instead of characters.

utfLeft

(String * Int -> String)
Returns the first N UTF-8 code points of the string.

utfRight

(String * Int -> String)
Returns the last N UTF-8 code points of the string.

The function utfSub(s,p,n) is similar to subStr(s,p,n), but uses code-point based indices. It returns a substring of the string s, beginning at the zero based position p with length n, where position and length are counted based on UTF-8 code-points instead of characters.

For more details, please read subStr.

{#
    subStr( "abcdАБВГабвгABCD", 0, 8 ),
    utfSub( "abcdАБВГабвгABCD", 0, 8 )
#}

{# 'abcd\320\220\320\221', 'abcd\320\220\320\221\320\222\320\223' #}

The function utfSlice(s,n,m) is similar to the string slice operator s[n:m], but uses code-point based indices. It returns a substring of the string s, starting at the zero based position n and ending before the position m, where positions n and m are counted based on UTF-8 code-points instead of characters.

For more details, see the string slice operator.

{#
    "abcdАБВГабвгABCD" [2:6],
    utfSlice( "abcdАБВГабвгABCD", 2, 6 )
#}

{# 'cd\320\220', 'cd\320\220\320\221' #}

The function utfLeft(s,n) is similar to the string function strLeft, but uses code-point based indices. It returns a substring containing the first n UTF-8 code-points of the string s.

For more details, see strLeft.

{#
    strLeft( "abcdАБВГабвгABCD", 6 ),
    utfLeft( "abcdАБВГабвгABCD", 6 )
#}

{# 'abcd\320\220', 'abcd\320\220\320\221' #}

The function utfRight(s,n) is similar to the string function strRight, but uses indices based on code-points. It returns a substring containing the last n UTF-8 code-points of the string s.

For more details, please see strRight.

{#
    strRight( "abcdАБВГабвгABCD", 6 ),
    utfRight( "abcdАБВГабвгABCD", 6 )
#}

{# '\320\263ABCD', '\320\262\320\263ABCD' #}
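The four substring functions can be modeled in Python by slicing either the UTF-8 bytes (the byte-based functions) or the str itself (the code-point-based functions). The names are hypothetical and this is an illustration, not Wafl code; only non-negative, in-range arguments are modeled, since Python slicing silently clamps out-of-range values and Wafl's handling of such arguments is not described here.

```python
def byte_sub(s: str, p: int, n: int) -> bytes:
    """n bytes starting at byte position p (cf. subStr)."""
    return s.encode('utf-8')[p:p + n]


def utf_sub(s: str, p: int, n: int) -> str:
    """n code points starting at code-point position p (cf. utfSub)."""
    return s[p:p + n]


def byte_slice(s: str, n: int, m: int) -> bytes:
    """Bytes from position n up to, but not including, position m (cf. s[n:m])."""
    return s.encode('utf-8')[n:m]


def utf_slice(s: str, n: int, m: int) -> str:
    """Code points from position n up to, but not including, position m (cf. utfSlice)."""
    return s[n:m]


def byte_left(s: str, n: int) -> bytes:
    """First n bytes (cf. strLeft)."""
    return s.encode('utf-8')[:n]


def utf_left(s: str, n: int) -> str:
    """First n code points (cf. utfLeft)."""
    return s[:n]


def byte_right(s: str, n: int) -> bytes:
    """Last n bytes (cf. strRight)."""
    b = s.encode('utf-8')
    return b[max(len(b) - n, 0):]


def utf_right(s: str, n: int) -> str:
    """Last n code points (cf. utfRight)."""
    return s[max(len(s) - n, 0):]


S = "abcdАБВГабвгABCD"
print(byte_sub(S, 0, 8))  # prints b'abcd\xd0\x90\xd0\x91'
print(utf_sub(S, 0, 8))   # prints abcdАБВГ
```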

Other UTF-8 Functions

Function / Type and Description

utfChars

(String -> List[String])
Splits a string into a list of UTF-8 code points.

utfReverse

(String -> String)
Reverses UTF-8 string.

The function utfChars(s) is similar to the string function strChars, but uses code-points instead of characters. It returns a list of all UTF-8 code-points of the string s.

For more details, see strChars.

{#
    strChars( "abАБабAB" ),
    utfChars( "abАБабAB" )
#}

{# ['a', 'b', '\320', '\220', '\320', '\221', '\320', '\260', '\320', '\261', 'A', 'B'], ['a', 'b', '\320\220', '\320\221', '\320\260', '\320\261', 'A', 'B'] #}

The function utfReverse(s) is similar to the string function strReverse, but uses code-points instead of characters. It returns a reversed string, taking care to preserve the UTF-8 code-points.

For more details, please read strReverse.

{#
    strReverse( "abАБабAB" ),
    utfReverse( "abАБабAB" )
#}

{# 'BA\261\320\260\320\221\320\220\320ba', 'BA\320\261\320\260\320\221\320\220ba' #}
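The same byte versus code-point distinction explains the results of strChars/utfChars and strReverse/utfReverse. In this Python model (hypothetical names, not Wafl code), the byte-based versions work on the UTF-8 encoding, while the code-point versions work on the str and encode each piece afterwards, so every multibyte code-point stays intact.

```python
def str_chars(s: str) -> list[bytes]:
    """Split the UTF-8 encoding of s into single bytes (cf. strChars)."""
    return [bytes([octet]) for octet in s.encode('utf-8')]


def utf_chars(s: str) -> list[bytes]:
    """Split s into code points, each as its UTF-8 bytes (cf. utfChars)."""
    return [ch.encode('utf-8') for ch in s]


def str_reverse(s: str) -> bytes:
    """Reverse the bytes of the UTF-8 encoding (cf. strReverse); may be invalid UTF-8."""
    return s.encode('utf-8')[::-1]


def utf_reverse(s: str) -> bytes:
    """Reverse the order of code points, keeping each code point's bytes (cf. utfReverse)."""
    return s[::-1].encode('utf-8')


s = "abАБабAB"
print(len(str_chars(s)), len(utf_chars(s)))  # prints 12 8
print(utf_reverse(s).decode('utf-8'))        # prints BAбаБАba
```

Reversing code points is still not the same as reversing user-perceived characters: in this model, a letter followed by a combining accent would come out with the accent before the letter.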

4.6.10 Other String Functions

Here we discuss three other string functions:

Function / Type and Description

strLowerCase

(String -> String)
Converts all letters to lower case.

strUpperCase

(String -> String)
Converts all letters to upper case.

strReverse

(String -> String)
Reverses the string.

The function strLowerCase converts all letters in a string to lower case.

The function strUpperCase converts all letters in a string to upper case.

{#
    strLowerCase( 'aAbBcC' ),
    strUpperCase( 'aAbBcC' )
#}

{# 'aabbcc', 'AABBCC' #}

The function strReverse returns the reversed string.

{#
    strReverse( 'aAbBcC' )
#}

{# 'CcBbAa' #}

For UTF-8 strings, strReverse can return invalid strings, because it treats strings as sequences of single-byte characters. To get a valid reversed UTF-8 string, use utfReverse.