Wafl has four primitive types: Integer, Float, String and Bool. In this chapter we discuss the important elements of the primitive types.
This chapter is quite long and detailed, but its content is elementary, so we suggest a cursory read. It is enough to have an idea of what is supported, and later use this chapter as a reference as needed.
Integer and float literal constants have syntax similar to that of the programming languages C and C++.
Float literals must contain a decimal point and at least one digit before it.
Logical literal constants are true and false.
String literals are quoted by single or double quotes. Special characters are specified by escape sequences, like in C/C++. The most important escape sequences are: single quote (\'), double quote (\"), backslash (\\), new line (\n), carriage return (\r), horizontal tab (\t), vertical tab (\v), form feed (\f) and backspace (\b). Like in C/C++, characters may be encoded by 3 octal digits: \nnn.
Examples of integer literals:
{#0, 42, -21
#}
{# 0, 42, -21 #}
Examples of float literals:
{#3.4, 0., 1.2e-3
#}
{# 3.4, 0, 0.0012 #}
Examples of bool literals:
{#
true, false #}
{# true, false #}
Examples of string literals:
{#'single quotes',
"double quotes",
"two\nlines",
"octal codes A=\101 a=\141"
#}
{# 'single quotes', 'double quotes', 'two\012lines', 'octal codes A=A a=a' #}
Wafl has the usual arithmetic operators:
addition (+);
subtraction (-);
multiplication (*);
division (/);
remainder (%);
modulus (%%);
power (**) and
unary minus (-).
The division of integer values always computes an integer result. While the integer division remainder operator (%) returns positive values for positive dividends and negative values for negative dividends, the modulus operator (%%) always returns a non-negative result:
{#17 / 10,
17 % 10,
17 %% 10,
-17 / 10,
-17 % 10,
-17 %% 10,
17 / -10,
17 % -10,
17 %% -10,
-17 / -10,
-17 % -10,
-17 %% -10
#}
{# 1, 7, 7, -1, -7, 3, -1, 7, 7, 1, -7, 3 #}
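The difference between the three operators can be modeled outside Wafl. The following Python sketch is not Wafl code; it assumes, based only on the results listed above, that integer division truncates toward zero, that % takes the sign of the dividend, and that %% is reduced by the absolute value of the divisor. The function names are hypothetical.

```python
def int_div(a: int, b: int) -> int:
    """Model of Wafl '/': integer division truncated toward zero (assumed)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def remainder(a: int, b: int) -> int:
    """Model of Wafl '%': the result takes the sign of the dividend."""
    return a - b * int_div(a, b)


def modulus(a: int, b: int) -> int:
    """Model of Wafl '%%': the result is never negative."""
    return a % abs(b)


# Same operand pairs, in the same order, as in the Wafl example above.
pairs = [(17, 10), (-17, 10), (17, -10), (-17, -10)]
results = [f(a, b) for a, b in pairs for f in (int_div, remainder, modulus)]
print(results)
```

The printed list can be compared element by element with the Wafl result tuple above.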
Bit-level integer operators are syntactically and semantically equivalent to these operators in C/C++:
bit-level conjunction (&);
bit-level disjunction (|);
shift left (<<);
shift right (>>) and
bit-level complement (~).
{#// '11110000' & '00111111' = '00110000' = 48
240 & 63,
// '011' | '110' = '111' = 7
3 | 6,
// bit-level complement
~5,
// '11110' << 3 = '11110000' = 240
30 << 3,
// '11111111' >> 3 = '11111' = 31
255 >> 3
#}
{# 48, 7, -6, 240, 31 #}
The power operator a ** b evaluates a to the power of b. Any integer raised to a negative power evaluates to zero, except for 1: one raised to any integer power always evaluates to 1:
{#2 ** 3,
2 ** -3,
1 ** 3,
1 ** -3,
-2 ** 3,
-2 ** -3
#}
{# 8, 0, 1, 1, -8, 0 #}
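The rule for negative exponents can be written down as a small Python model. This is only a sketch of the behavior described above, not the Wafl implementation; in particular, treating every base other than 1 as yielding zero for a negative exponent follows the text literally and is an assumption for bases such as -1.

```python
def int_pow(a: int, b: int) -> int:
    """Model of the Wafl integer power operator a ** b."""
    if b >= 0:
        return a ** b
    # Negative exponent: the true result is a fraction, so the
    # integer result is zero, except for base 1 (assumed from the text).
    return 1 if a == 1 else 0


# Same operands as in the Wafl example above.
powers = [int_pow(2, 3), int_pow(2, -3), int_pow(1, 3),
          int_pow(1, -3), int_pow(-2, 3), int_pow(-2, -3)]
print(powers)
```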
Multiplication, division, remainder, modulus and bit-level conjunction have higher priority than addition, subtraction and bit-level disjunction. The power operator has the highest priority. Shift operators have the lowest priority.
Wafl has the usual float operators:
addition (+);
subtraction (-);
multiplication (*);
division (/);
power (**) and
unary minus (-).
{#2.1 + 3.45678,
3.0 - 1.2,
3.14 * 2.17,
17.0 / 10.,
2.0 ** 3.0,
2.0 ** 0.5
#}
{# 5.55678, 1.8, 6.8138, 1.7, 8, 1.414213562 #}
Wafl has a single binary string operator, concatenation (+):
"One" + "Two"
OneTwo
There are also indexing operators. They will be discussed with sequence types.
The usual logical operators are supported in both C-like and SQL-like syntax:
conjunction (&&, and);
disjunction (||, or) and
negation (!, not).
{#
true or false, true || false,
true and false, true && false,
not true, !true
#}
{# true, true, false, false, false, false #}
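The usual comparison operators are defined for Integer, Float and String types:
equal - C-like (==) and SQL-like (=);
not equal - C-like (!=) and SQL-like (<>);
less than (<);
less than or equal (<=);
greater than (>) and
greater than or equal (>=).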
{#1 < 2,
1.2 <= 2.3,
"abcd" > "ABCD",
"abcd" >= "AB",
21 * 2 = 42,
21 == 42 / 2,
17 != 18,
-3.14 <> 3.14
#}
{# true, true, true, true, true, true, true, true #}
Wafl has no variables and no assignments. The operator ‘=’ has only two roles: (1) to separate the definition name from the body and (2) as the equality operator. It can never be ambiguous, so there’s no reason to use operator ‘==’ instead, but if someone likes it more, that’s fine.
Wafl is a strongly typed programming language, and no implicit type conversions are allowed. Thus, the Wafl core library contains the conversion functions:
asInt - from any other primitive type to Integer;
asFloat - from any other primitive type to Float;
asString - from any other type to String;
asChar - from any other primitive type to a single character String and
asBool - from any other primitive type to Bool.
There are some other conversion functions available for specific pairs of types.
It may seem strange to call these functions as..., but it is quite natural if we expect to use them mainly with dot syntax.
Function asInt(x) converts any non-integer primitive value x to Integer type:
asInt(x) converts a Float value x to the closest integer, the same as the synonym function round;
asInt(x) converts a String value x, representing a valid integer literal, to the appropriate integer value; if x represents a float value, only the digits before the decimal point are used; if x is not a valid integer literal, asInt returns zero;
asInt(x) converts Bool value true to integer value 1 and false to 0.
Additionally, for conversion from Float to Integer there are:
round(x) - the same as asInt(x), converts a float value to the closest Integer;
ceil(x) - returns the closest integer that is not smaller and
floor(x) - returns the closest integer that is not larger.
For conversion of String values to Integer there is also:
ascii(x) - converts a string value x to the ASCII code of the first character of the string; if x is an empty string, the function result is zero.
Examples of conversions from Float to Integer:
{#asInt(3.6),
asInt(-3.6),
round(3.6),
round(-3.6),
ceil(3.6),
ceil(-3.6),
floor(3.6),
floor(-3.6)
#}
{# 4, -4, 4, -4, 4, -3, 3, -4 #}
Examples of conversions from String to Integer:
{#asInt('3'),
asInt('3.8'),
asInt('abc'),
ascii('abc'),
ascii('')
#}
{# 3, 3, 0, 97, 0 #}
Examples of conversions from Bool to Integer:
{#asInt(true),
asInt(false)
#}
{# 1, 0 #}
Function asFloat(x) converts any non-float primitive value x to Float type:
asFloat(x) converts an Integer value x to the appropriate Float value;
asFloat(x) converts a String value x, representing a valid Float literal, to the appropriate Float value; if x is not a valid Float literal, asFloat returns zero;
asFloat(x) converts Bool value true to value 1.0 and value false to 0.0.
{#asFloat(7),
asFloat('6.2'),
asFloat('abc'),
asFloat(true),
asFloat(false)
#}
{# 7, 6.2, 0, 1, 0 #}
There are four functions for conversion of values of other data types to strings:
Function (Type) - Description
asString ('1 -> String) - Converts a value to a string.
asChar (PrimeNotString['1] -> String) - Converts a value to a character.
toString (Float * Int -> String) - Converts a value to a string with given precision.
asPreview ('1 -> String) - Converts a value to a shortened string.
Function asString(x) converts any non-string value x to String. It converts a value x to its string representation, according to the Wafl syntax.
There is a synonymous postfix operator $ with the same behavior.
“Any” means “any” - the function asString and postfix operator $ convert any Wafl value of any type to its String representation.
{#asString(3),
asString(3.14),
asString(true),
asString(false),
asString('123'),
asString( {# 1, 2.3, "abc", {# true, 's' #} #} ),
asString([1,2,3,4,5,6,7,8,9,10]),
3$,
3.14$,
true$,
false$,
'123'$,
{# 1, 2.3, "abc", {# true, 's' #} #}$,
[1,2,3,4,5,6,7,8,9,10]$
#}
{# '3', '3.14', 'true', 'false', '123', '{# 1, 2.3, \'abc\', {# true, \'s\' #} #}', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]', '3', '3.14', 'true', 'false', '123', '{# 1, 2.3, \'abc\', {# true, \'s\' #} #}', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]' #}
Function asPreview(x) is similar to asString, but returns a shorter string. For simple data it behaves the same as asString. For larger structured data and longer strings, it extracts just a part of the complete string representation.
{#asPreview('01234567989'),
asPreview(
'01234567890123456789012345678901234567890123456789'
'01234567890123456789012345678901234567890123456789'
),
asPreview([1,2,3,4,5,6,7,8,9,10]),
asPreview(1..1000)
#}
{# '01234567989', '012345678901234567890123456789 ... 0123456789 (len=100)', '[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]', '[1, 2, 3, 4, 5, ..., 999, 1000] (len=1000)' #}
Function asChar(x) works on integers and logical values. It converts:
an Integer to a string consisting of a single character with the given ASCII code;
logical value true to the string value "T" and logical value false to the string value "F".
{#asChar(65),
asChar(65.7),
asChar(true),
asChar(false)
#}
{# 'A', 'B', 'T', 'F' #}
Function toString(x,n) converts float value x to a string with n digits after the decimal point.
{#toString(1234.56789,0),
toString(1234.56789,1),
toString(1234.56789,2),
toString(1234.56789,3)
#}
{# '1235', '1234.6', '1234.57', '1234.568' #}
Function asBool(x) converts any non-bool primitive value x to Bool type:
asBool(x) converts all non-zero Integer values to true and zero to false;
asBool(x) converts all non-zero Float values to true and zero to false;
asBool(x) converts string values "true" and "T" to true, and all other values to false.
{#asBool( 2 ),
asBool( -3 ),
asBool( 0 ),
asBool( 2.1 ),
asBool( -3.2 ),
asBool( 0.0 ),
asBool( "true" ),
asBool( "True" ), // this is not the same as "true"
asBool( "T" ),
asBool( "t" ) // this is not the same as "T"
#}
{# true, true, false, true, true, false, true, false, true, false #}
Wafl core library includes three integer functions:
abs(x) - absolute value;
sgn(x) - sign and
random(x) - random value.
abs
Integer function abs(x) computes the absolute value of the given integer value x:
{#abs( 123 ),
abs( -123 ),
abs( -0 )
#}
{# 123, 123, 0 #}
sgn
Integer function sgn(x) returns the sign of the number x. For positive values it returns 1, for negative values it returns -1 and for zero it returns zero:
{#sgn( 20 ),
sgn( -2 ),
sgn( 0 )
#}
{# 1, -1, 0 #}
random
Integer function random(x) computes a random integer value in range [0,x-1]. In the following example, we compute 20 random values in range [0,4]:
{#random( 5 ), random( 5 ), random( 5 ), random( 5 ),
random( 5 ), random( 5 ), random( 5 ), random( 5 ),
random( 5 ), random( 5 ), random( 5 ), random( 5 ),
random( 5 ), random( 5 ), random( 5 ), random( 5 ),
random( 5 ), random( 5 ), random( 5 ), random( 5 )
#}
{# 3, 1, 0, 4, 3, 3, 3, 2, 2, 4, 2, 2, 0, 2, 0, 2, 3, 3, 0, 2 #}
By default, the random number generator seed is initialized on first use from the current system timer. This is usually exactly what the programmer expects and requires.
However, sometimes the same random number sequence is required on each program run (for debugging, benchmarking and some other cases). For such cases there is a clwafl command line option -nornd, which sets a predefined seed initialization. Executing the previous program using:
clwafl -nornd program.wafl
will always produce the same result.
If a program uses parallel evaluation, the random sequence is not guaranteed. In fact, the generated sequence will be the same, but the way different threads consume it will not be the same on each run.
Wafl core library includes the following float functions:
abs(x) - absolute value;
sgn(x) - sign;
roundTo(x,y) - rounding;
exp(x) - e to the power of x;
ln(x) - natural logarithm;
log(x) - base 10 logarithm;
log2(x) - base 2 logarithm;
pow(x,y) - x to the power of y;
sqrt(x) - square root;
sin(x) - sine;
cos(x) - cosine;
tan(x) - tangent;
asin(x) - arc sine;
acos(x) - arc cosine;
atan(x) - arc tangent and
atan2(y,x) - arc tangent of y/x (works for x=0).
The following conversion functions are already presented in the previous sections:
round(x) - converts a float value to the closest Integer;
ceil(x) - returns the closest integer that is not smaller and
floor(x) - returns the closest integer that is not larger.
abs
Float function abs(x) computes the absolute value of a given float value x.
{#abs( 123.456 ),
abs( -123.456 ),
abs( -0.0 )
#}
{# 123.456, 123.456, 0 #}
sgn
Float function sgn(x) returns the sign of the number x. For positive values it returns 1.0, for negative values it returns -1.0 and for zero it returns zero:
{#sgn( 20.3 ),
sgn( -2.4 ),
sgn( 0.0 )
#}
{# 1, -1, 0 #}
roundTo
Float function roundTo(x,y) rounds float value x. The given float value y defines the lowest significant digit: the result is rounded to a multiple of y.
{#roundTo( 1234.56789, 0.001 ),
roundTo( 1234.56789, 0.01 ),
roundTo( 1234.56789, 0.1 ),
roundTo( 1234.56789, 1. ),
roundTo( 1234.56789, 10. ),
roundTo( 1234.56789, 100. ),
roundTo( 1234.56789, 1000. ),
roundTo( 1234.56789, 10000. )
#}
{# 1234.568, 1234.57, 1234.6, 1235, 1230, 1200, 1000, 0 #}
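One way to read these results is that x is rounded to the nearest multiple of y. The Python sketch below models that reading; it is not the Wafl implementation, the name round_to is hypothetical, and the handling of exact ties (rounded up here) is an assumption that the example above does not exercise.

```python
import math


def round_to(x: float, y: float) -> float:
    """Round x to the nearest multiple of y (ties round up; an assumption)."""
    return math.floor(x / y + 0.5) * y


# Same steps as in the Wafl example above.
steps = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
rounded = [round_to(1234.56789, y) for y in steps]
for y, r in zip(steps, rounded):
    # Floating point noise is trimmed for display only.
    print(y, round(r, 6))
```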
exp
Exponential function exp(x) computes e to the power of x:
{#exp( -10.0 ),
exp( 0.0 ),
exp( 1.0 ),
exp( 10.0 )
#}
{# 4.539992976e-05, 1, 2.718281828, 22026.46579 #}
ln
Float function ln(x) computes the natural logarithm of x (logarithm to base e). It is defined for positive float values.
{#ln( 0.1 ),
ln( 1.0 ),
ln( 2.7182818284590452353602874),
ln( 100. ),
ln( 1000. )
#}
{# -2.302585093, 0, 1, 4.605170186, 6.907755279 #}
log
Float function log(x) computes the base 10 logarithm of x. It is defined for positive float values.
{#log( 0.001 ),
log( 0.01 ),
log( 0.1 ),
log( 1.0 ),
log( 10. ),
log( 100. ),
log( 1000. )
#}
{# -3, -2, -1, 0, 1, 2, 3 #}
log2
Float function log2(x) computes the base 2 logarithm of x. It is defined for positive float values.
{#log2( 0.001 ),
log2( 0.0078125 ),
log2( 0.25 ),
log2( 0.5 ),
log2( 1.0 ),
log2( 2. ),
log2( 4. ),
log2( 128. ),
log2( 1000. )
#}
{# -9.965784285, -7, -2, -1, 0, 1, 2, 7, 9.965784285 #}
pow
Float function pow(x,y) computes x to the power of y.
It is defined for positive x and any y. Negative x is allowed only if y is a whole number. Zero x is allowed only for positive y.
{#pow( 2., 3. ),
pow( 2., -3. ),
pow( 2.5, -3.7 ),
pow( -2., 3. ),
pow( 0., 3.2 )
#}
{# 8, 0.125, 0.03369938443, -8, 0 #}
sqrt
Float function sqrt(x) computes the square root of x. It is defined for non-negative float values x.
{#sqrt(1.),
sqrt(4.),
sqrt(9.),
sqrt(16.),
sqrt(3433.32)
#}
{# 1, 2, 3, 4, 58.59453899 #}
The following trigonometric functions are available:
sin(x) - sine;
cos(x) - cosine;
tan(x) - tangent;
asin(x) - arc sine;
acos(x) - arc cosine;
atan(x) - arc tangent and
atan2(y,x) - arc tangent of y/x (works for x=0).
Angles are measured in radians. Additionally, atan2(y,x) maps a pair of float values to the appropriate angle. For positive x, atan2(y,x) = atan(y/x), but atan2 is defined even for x=0.
{#sin(3.14/2.0) * cos(3.14/2.0),
tan(3.14/2.0),
asin(0.5) + acos(0.5),
atan(0.5),
atan2(1.0,2.0),
atan2(1.0,0.0)
#}
{# 0.0007963264582, 1255.765592, 1.570796327, 0.463647609, 0.463647609, 1.570796327 #}
In this section we present the string functions.
Conversion functions (asChar, asString, ascii and toString) are presented in previous sections.
Wafl String type works with both single-byte strings and with multi-byte UTF-8 encoded strings. However, some of the functions work only with single-byte characters strings. If a function may not work well for UTF-8 strings, it is noted in this tutorial. Please take care.
Function (Type) - Description
strLen (String -> Int) - Get the string length.
length (Indexable['1]['2]['3] -> Int) - Get the collection size.
size (Indexable['1]['2]['3] -> Int) - Get the collection size.
strCat (String * String -> String) - String concatenation. Same as string addition.
isNull (String -> Bool) - Check if a string represents a database NULL value.
ifNull (String * String -> String) - Replace null with given value: ifNull(x,c) = if isNull(x) then c else x
strLen
Function strLen(x) computes the length of the string x.
It is important to understand that string x may include any characters, and that characters with ASCII code zero are not handled as string terminators. Thus, the String type may work not only with character strings, but also with byte strings.
There are two more general synonyms: length and size.
{#strLen( "abc" ),
strLen( "abc\0abc" ),
length( "abc\0abc" ),
size( "abc\0abc" )
#}
{# 3, 7, 7, 7 #}
In case of UTF-8 strings, strLen, length and size return the size in bytes. To get a real UTF-8 string length in UTF-8 code-points, please use utfLen.
strCat
Function strCat(x,y) computes the concatenation of two given strings. It is equivalent to string operator +.
{#"abc" + "def",
strCat( "abc", "def" )
#}
{# 'abcdef', 'abcdef' #}
isNull, ifNull
Because of the databases, the String type supports a special undefined value NULL. Function isNull(s) checks if string s is NULL. Function ifNull(s,x) will return s if s is not NULL, but x if s is NULL.
ifNull(s,x) == if isNull(s) then x else s
{# $-1, // This will return a null string
isNull('a'),
isNull($-1),
ifNull("abc","xyz"),
ifNull($-1,"xyz")
#}
{# 'NULL', false, true, 'abc', 'xyz' #}
In the previous example, we used expression $-1 to create NULL strings. Operator $ will be presented later.
String extraction functions extract and return a part of the given string. Wafl core library includes the following string extraction functions:
Function (Type) - Description
sub (SequenceStr['2]['1] * Int * Int -> SequenceStr['2]['1]) - Extracts the subsequence from given 0-based position and given length: sub(seq,pos,len)
subStr (String * Int * Int -> String) - Returns a substring from given position (from 0) and with given length.
strLeft (String * Int -> String) - Returns first N characters of the string. If N is negative, returns all but last -N elements.
strRight (String * Int -> String) - Returns last N characters of the string. If N is negative, returns all but first -N elements.
strLTrim (String -> String) - Trims all spaces from left side.
strRTrim (String -> String) - Trims all spaces from right side.
strTrim (String -> String) - Trims all spaces from the string.
sub and subStr
Function sub(s,p,n) returns a substring of string s, beginning at (zero based) position p with length n.
In the Wafl core library there is subStr, which is a synonym for sub. In the current version both functions are supported, but it is possible that only sub will remain.
{#subStr( "abcdefgh", 0, 3 ),
sub( "abcdefgh", 0, 3 ),
sub( "abcdefgh", 2, 3 ),
sub( "abcdefgh", -2, 5 ),
sub( "abcdefgh", 5, 10 ),
sub( "abcdefgh", 5, -2 )
#}
{# 'abc', 'abc', 'cde', '', 'fgh', '' #}
Special cases, as the results of the example above show:
a negative position yields an empty string (sub( "abcdefgh", -2, 5 ));
a length that reaches past the end of the string is cut at the end of the string (sub( "abcdefgh", 5, 10 ));
a negative length yields an empty string (sub( "abcdefgh", 5, -2 )).
In case of UTF-8 strings, sub and subStr may return invalid strings. These functions treat strings as having single byte characters only. If a substring begins or ends in the middle of a multi-byte UTF-8 code-point, the result will not be a valid UTF-8 string. To get a valid UTF-8 substring, with positions denoted in UTF-8 code-points, please use utfSub.
strLeft and strRight
Function strLeft(s,n) returns a substring containing the first n characters of string s:
for positive n, smaller than strLen(s), it is the same as sub(s,0,n);
for negative n it is the same as strLeft(s,strLen(s)+n).
Function strRight(s,n) returns a substring containing the last n characters of string s:
for positive n, smaller than strLen(s), it is the same as sub(s,strLen(s)-n,n);
for negative n it is the same as strRight(s,strLen(s)+n).
{#strLeft( "abcdefgh", 3 ), // first 3 characters
strLeft( "abcdefgh", 10 ), // whole string
strLeft( "abcdefgh", -5 ), // all but last 5 characters
strLeft( "abcdefgh", -10 ), // empty string
strRight( "abcdefgh", 3 ), // last 3 characters
strRight( "abcdefgh", 10 ), // whole string
strRight( "abcdefgh", -5 ), // all but first 5 characters
strRight( "abcdefgh", -10 ) // empty string
#}
{# 'abc', 'abcdefgh', 'abc', '', 'fgh', 'abcdefgh', 'fgh', '' #}
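The rules for positive and negative n can be modeled in a few lines of Python. This is a sketch of the behavior shown in the example above, not Wafl code; the function names are hypothetical, and clamping to an empty string when n is below -strLen(s) is inferred from the example results.

```python
def str_left(s: str, n: int) -> str:
    """Model of strLeft: first n characters, or all but the last -n."""
    if n < 0:
        n = max(len(s) + n, 0)
    return s[:n]


def str_right(s: str, n: int) -> str:
    """Model of strRight: last n characters, or all but the first -n."""
    if n < 0:
        n = max(len(s) + n, 0)
    return s[max(len(s) - n, 0):]


s = "abcdefgh"
lefts = [str_left(s, n) for n in (3, 10, -5, -10)]
rights = [str_right(s, n) for n in (3, 10, -5, -10)]
print(lefts, rights)
```

Note that Python strings are indexed by code points, while the Wafl functions work on bytes, so the model matches only for single-byte characters.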
In case of UTF-8 strings, strLeft and strRight may return invalid strings. These functions treat strings as having single byte characters only. If a substring begins or ends in the middle of a multi-byte UTF-8 code-point, the result will not be a valid UTF-8 string. To get a valid UTF-8 substring, with positions denoted in UTF-8 code-points, please use utfLeft and utfRight.
strLTrim, strRTrim and strTrim
Function strLTrim(s) returns the substring of s not containing leading non-visible characters. Function strRTrim(s) returns the substring of s not containing trailing non-visible characters. Function strTrim(s) returns the substring of s without both leading and trailing non-visible characters.
{#strLTrim( "\0 \t \n abcd \b \003 \0 \r " ),
strRTrim( "\0 \t \n abcd \b \003 \0 \r " ),
strTrim( "\0 \t \n abcd \b \003 \0 \r " )
#}
{# 'abcd \010 \003 \000 \015 ', '\000 \011 \012 abcd', 'abcd' #}
Index operator s[i] is equivalent to subStr(s, i %% strLen(s), 1). That means that indexing beyond the length is possible.
Index operator s[i] is therefore similar, but not equivalent, to subStr(s,i,1). They are equivalent only if 0 <= i < strLen(s).
{# s[-4], s[-3], s[-2], s[-1],
s[0], s[1], s[2], s[3],
s[4], s[6], s[7], s[8]
#} where {
s = "abcd";
}
{# 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'a', 'c', 'd', 'a' #}
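The wrap-around indexing can be modeled in Python, whose % operator already returns a non-negative result for a positive divisor and so plays the role of the Wafl %% operator here. This is a sketch, not Wafl code; the name wafl_index is hypothetical and empty strings are not handled.

```python
def wafl_index(s: str, i: int) -> str:
    """Model of s[i]: the index wraps around modulo the string length."""
    return s[i % len(s)]


# Same indices as in the Wafl example above.
indices = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 6, 7, 8]
chars = [wafl_index("abcd", i) for i in indices]
print(chars)
```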
In case of UTF-8 strings, the indexing operator may return an invalid string. It treats strings as having single byte characters only. If the index points to an element of a UTF-8 multi-byte code-point, then the result will not be a valid UTF-8 code-point. To get a valid UTF-8 code-point, with positions denoted in UTF-8 code-points, please use utfAt.
Slice operator uses syntax similar to index operator, but behaves like subStr, strLeft and strRight. If 0 < n < m <= strLen(s), then:
s[:n] is the same as strLeft(s,n), extracts the first n characters;
s[n:] is the same as strRight(s,-n), extracts all but the first n characters and
s[n:m] is the same as strRight(strLeft(s,m),-n), or the same as subStr(s,n,m-n).
If index n is negative or greater than strLen(s), then n %% strLen(s) is used. The same holds for m.
{# s[:6],
s[:-2],
s[2:],
s[-6:],
s[2:6],
s[2:-2],
s[-6:6],
s[-6:-2]
#} where {
s = "abcdefgh";
}
{# 'abcdef', 'abcdef', 'cdefgh', 'cdefgh', 'cdef', 'cdef', 'cdef', 'cdef' #}
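The normalization rule for slice bounds can be sketched in Python as follows. This models only what the text above states (bounds that are negative or greater than the length are reduced modulo the length); the helper names are hypothetical and empty strings are not handled.

```python
def norm(i: int, length: int) -> int:
    """Reduce a bound modulo the length if it is outside 0..length."""
    return i if 0 <= i <= length else i % length


def wafl_slice(s: str, n=None, m=None) -> str:
    """Model of s[n:m]; a missing bound means the start or end of s."""
    start = 0 if n is None else norm(n, len(s))
    stop = len(s) if m is None else norm(m, len(s))
    return s[start:stop]


s = "abcdefgh"
# Same slices, in the same order, as in the Wafl example above.
bounds = [(None, 6), (None, -2), (2, None), (-6, None),
          (2, 6), (2, -2), (-6, 6), (-6, -2)]
slices = [wafl_slice(s, n, m) for n, m in bounds]
print(slices)
```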
It is often easier to use slice operator than extraction functions, but they essentially do the same thing.
In case of UTF-8 strings, slice operators may return invalid strings. These operators treat strings as having single byte characters only. If a slice begins or ends in the middle of a multi-byte UTF-8 code-point, the result will not be a valid UTF-8 string. To get a valid UTF-8 slice, with positions denoted in UTF-8 code-points, please use utfSlice.
Wafl core library includes the following string searching functions:
Function (Type) - Description
strPos (String * String -> Int) - Finds first position of a substring in the string, or -1 if not found.
strPosI (String * String -> Int) - Same as strPos, but ignores letter case.
strNextPos (String * String * Int -> Int) - Finds next position of a substring in the string, after given pos.
strNextPosI (String * String * Int -> Int) - Same as strNextPos, but ignores letter case.
strLastPos (String * String -> Int) - Finds last position of a substring in the string, or -1 if not found.
strLastPosI (String * String -> Int) - Same as strLastPos, but ignores letter case.
strNextLastPos (String * String * Int -> Int) - Finds next last position of a substring in the string, before given pos.
strNextLastPosI (String * String * Int -> Int) - Same as strNextLastPos, but ignores letter case.
strBeg (String * String -> Bool) - Check if the 2nd string is at the beginning of the 1st.
strEnd (String * String -> Bool) - Check if the 2nd string is at the end of the 1st.
All position search functions return the beginning position of the second specified string in the first specified string, if it is found, and -1 if it is not.
Functions whose names end with ‘I’ ignore case: strPosI, strNextPosI, strLastPosI, and strNextLastPosI.
Case-insensitive search works by first converting both strings to uppercase. This can be inefficient for larger strings.
strPos, strPosI, strNextPos and strNextPosI
Function strPos(s,p) returns the position of the first occurrence of the string p in string s.
Function strPosI(s,p) returns the position of the first occurrence of the string p in string s, ignoring the letter case.
Function strNextPos(s,p,i) returns the position of the first occurrence of the string p in string s after the position i.
Function strNextPosI(s,p,i) returns the position of the first occurrence of the string p in string s after the position i, ignoring the letter case.
{# strPos( s, n ), // not existing
strPos( s, x ),
strNextPos( s, x, 0 ),
strNextPos( s, x, 3 ),
strNextPos( s, x, 6 ),
strNextPos( s, x, 9 )
#}where {
s = "abcabcabcab";
x = "ab";
n = "xxx";
};
{# -1, 0, 3, 6, 9, -1 #}
strLastPos, strLastPosI, strNextLastPos and strNextLastPosI
Function strLastPos(s,p) returns the position of the last occurrence of the string p in string s.
Function strLastPosI(s,p) returns the position of the last occurrence of the string p in string s, ignoring the letter case.
Function strNextLastPos(s,p,i) returns the position of the last occurrence of the string p in string s before the position i.
Function strNextLastPosI(s,p,i) returns the position of the last occurrence of the string p in string s before the position i, ignoring the letter case.
Please note that case insensitive functions strLastPosI and strNextLastPosI do not work well for UTF-8 multi-byte characters.
{# strLastPos( s, n ), // not existing
strLastPos( s, x ),
strNextLastPos( s, x, 9 ),
strNextLastPos( s, x, 6 ),
strNextLastPos( s, x, 3 ),
strNextLastPos( s, x, 0 )
#}where {
s = "abcabcabcab";
x = "ab";
n = "xxx";
};
{# -1, 9, 6, 3, 0, -1 #}
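The forward and backward search functions can be modeled with Python's str.find and str.rfind. This is a sketch, not Wafl code: the function names are hypothetical, and reading "after position i" and "before position i" as strict (a match starting exactly at i is skipped) is inferred from the two examples above.

```python
def str_pos(s: str, p: str) -> int:
    return s.find(p)


def str_next_pos(s: str, p: str, i: int) -> int:
    """First occurrence that starts strictly after position i."""
    return s.find(p, max(i + 1, 0))


def str_last_pos(s: str, p: str) -> int:
    return s.rfind(p)


def str_next_last_pos(s: str, p: str, i: int) -> int:
    """Last occurrence that starts strictly before position i."""
    if i <= 0:
        return -1
    # A match starting at i-1 ends at i-1+len(p), so that is the search limit.
    return s.rfind(p, 0, i - 1 + len(p))


s, x, n = "abcabcabcab", "ab", "xxx"
forward = [str_pos(s, n), str_pos(s, x)] + [str_next_pos(s, x, i) for i in (0, 3, 6, 9)]
backward = [str_last_pos(s, n), str_last_pos(s, x)] + [str_next_last_pos(s, x, i) for i in (9, 6, 3, 0)]
print(forward, backward)
```

Feeding each result back in as the next position, as the examples do, walks through all occurrences in either direction until -1 is returned.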
strBeg and strEnd
Functions strBeg and strEnd check if the first string has the second string at the beginning or at the end:
{#strBeg( "abcdef", "ab" ),
strBeg( "abcdef", "cd" ),
strBeg( "abcdef", "ef" ),
strEnd( "abcdef", "ab" ),
strEnd( "abcdef", "cd" ),
strEnd( "abcdef", "ef" )
#}
{# true, false, false, false, false, true #}
Wafl core library includes the following substring counting functions:
Function (Type) - Description
strCountSub (String * String -> Int) - Count occurrences of substring in the given string: strCountSub('aaaaaA','aa') == 4
strCountSubI (String * String -> Int) - Same as strCountSub, but ignores letter case: strCountSubI('aaaaaA','aa') == 5
strCountSubDis (String * String -> Int) - Count disjunct occurrences of substring in the given string: strCountSubDis('aaaaaA','aa') == 2
strCountSubDisI (String * String -> Int) - Same as strCountSubDis, but ignores letter case: strCountSubDisI('aaaaaA','aa') == 3
All counting functions return the number of occurrences of the given substring in the given string. The functions whose names end with ‘I’ ignore the letter case. The functions with ‘Dis’ in their names count only disjunct (non-overlapping) substrings.
{# strCountSub('aaaaaA','aa'),
strCountSubI('aaaaaA','aa'),
strCountSubDis('aaaaaA','aa'),
strCountSubDisI('aaaaaA','aa')
#}
{# 4, 5, 2, 3 #}
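The difference between overlapping and disjunct counting comes down to where the search resumes after each match. The Python sketch below models this; it is not Wafl code, the name count_sub and its keyword parameters are hypothetical, and returning 0 for an empty substring is an assumption.

```python
def count_sub(s: str, p: str, overlapping: bool = True,
              ignore_case: bool = False) -> int:
    """Count occurrences of p in s, optionally disjunct or case-insensitive."""
    if ignore_case:
        # The text states that both strings are converted to uppercase first.
        s, p = s.upper(), p.upper()
    if not p:
        return 0  # assumption: an empty substring is never counted
    # Overlapping: resume one position later; disjunct: skip the whole match.
    step = 1 if overlapping else len(p)
    count, pos = 0, s.find(p)
    while pos != -1:
        count += 1
        pos = s.find(p, pos + step)
    return count


counts = [
    count_sub('aaaaaA', 'aa'),                                       # strCountSub
    count_sub('aaaaaA', 'aa', ignore_case=True),                     # strCountSubI
    count_sub('aaaaaA', 'aa', overlapping=False),                    # strCountSubDis
    count_sub('aaaaaA', 'aa', overlapping=False, ignore_case=True),  # strCountSubDisI
]
print(counts)
```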
Functions strCountSubI and strCountSubDisI work by converting both strings to uppercase first. They may be inefficient for larger strings.
Wafl core library includes the following string replacement functions:
Function (Type) - Description
strReplace (String * String * String * Int -> String) - Replaces Nth occurrence of substring with given string: strReplace('ababa','b','c',2) == 'abaca'
strReplaceI (String * String * String * Int -> String) - Same as strReplace, but ignores letter case.
strReplaceAll (String * String * String -> String) - Replaces all occurrences of substring with given string.
strReplaceAllI (String * String * String -> String) - Same as strReplaceAll, but ignores letter case.
Each of the strReplace* functions evaluates a new string and does not modify any of the given strings.
Function strReplace(s,p,x,i) returns a copy of string s where the i-th occurrence of substring p is replaced with x. String s remains unmodified.
Function strReplaceI(s,p,x,i) evaluates the same as strReplace, but ignores letter case while searching for p.
Function strReplaceAll(s,p,x) returns a copy of string s where all occurrences of substring p are replaced with x. String s remains unmodified.
Function strReplaceAllI evaluates the same as strReplaceAll, but ignores letter case while searching for p.
{#strReplace( s, 'a', '@', 2 ),
strReplaceI( s, 'a', '@', 2 ),
strReplaceAll( s, 'a', '@' ),
strReplaceAllI( s, 'a', '@' )
#}where {
s = "abABabAB";
}
{# 'abAB@bAB', 'ab@BabAB', '@bAB@bAB', '@b@B@b@B' #}
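A Python model of the four replacement functions is sketched below. It is not Wafl code: the function names are hypothetical, and several details are assumptions that the example above does not settle, namely that the string is returned unchanged when fewer than i occurrences exist or when p is empty, and that case-insensitive matching is done on uppercase copies (safe for ASCII text only, where uppercasing keeps the length).

```python
def str_replace(s: str, p: str, x: str, i: int, ignore_case: bool = False) -> str:
    """Replace the i-th (1-based) occurrence of p in s with x."""
    if i < 1 or not p:
        return s  # assumption: nothing to replace
    hay, needle = (s.upper(), p.upper()) if ignore_case else (s, p)
    pos = -1
    for _ in range(i):
        pos = hay.find(needle, pos + 1)
        if pos == -1:
            return s  # assumption: fewer than i occurrences
    return s[:pos] + x + s[pos + len(p):]


def str_replace_all(s: str, p: str, x: str, ignore_case: bool = False) -> str:
    """Replace every occurrence of p in s with x."""
    if not p:
        return s  # assumption: an empty pattern matches nothing
    hay, needle = (s.upper(), p.upper()) if ignore_case else (s, p)
    parts, start = [], 0
    pos = hay.find(needle)
    while pos != -1:
        parts.append(s[start:pos])
        parts.append(x)
        start = pos + len(p)
        pos = hay.find(needle, start)
    parts.append(s[start:])
    return "".join(parts)


s = "abABabAB"
replaced = [
    str_replace(s, 'a', '@', 2),
    str_replace(s, 'a', '@', 2, ignore_case=True),
    str_replace_all(s, 'a', '@'),
    str_replace_all(s, 'a', '@', ignore_case=True),
]
print(replaced, s)  # s itself is unchanged
```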
Please note that case insensitive functions strReplaceI and strReplaceAllI may be inefficient for larger strings.
strSplit... and strJoin
Here we meet a list of strings. The list is one of the most important concepts of functional programming languages, including Wafl. We will discuss lists in detail in the following chapter.
Function strSplit(s,p) returns a list of all substrings of the string s which are separated from each other by substring p.
Function strJoin(lst,p) concatenates all elements of the list lst, inserting the delimiter p between them.
Function (Type) - Description
strSplit (String * String -> List[String]) - Splits a string to a list of strings, by extracting the given separator.
strSplitTrim (String * String -> List[String]) - Splits a string to a list of strings, by extracting the given separator. All spaces are trimmed from each segment from left and right side.
strSplitLines (String -> List[String]) - Splits a string to a list of strings, by extracting new-line separator.
strSplitLinesTrim (String -> List[String]) - Splits a string to a list of strings, by extracting new-line separator. All spaces are trimmed from each segment from left and right side.
strJoin (Sequence['1][String] * String -> String) - Joins (concatenates) a sequence of strings, adding the given separator.
strSplit( 'a,bb,c,dd,e', ',' )
['a', 'bb', 'c', 'dd', 'e']
strJoin( ['a','b','c','d','e','f','g','h'], ';' )
a;b;c;d;e;f;g;h
strJoin( strSplit( 'a,bb,c,dd,e', ','), ';' )
a;bb;c;dd;e
Function strSplitTrim is similar to strSplit, but it detects and removes all empty spaces before and after the separators. It is functionally equivalent to, but more efficient than, mapping strTrim after the split:
s.strSplit(p).map(strTrim) == s.strSplitTrim(p)
Function strSplitLines(s) is similar to strSplit(s,'\n'), but it detects and removes both LF (Linux, ‘\n’) and CRLF (Windows, ‘\r\n’) new line sequences:
{#strSplit( '\nabc\ndef\nghi\n', '\n' ),
strSplit( '\r\nabc\r\ndef\r\nghi\r\n', '\n' ),
strSplitLines( '\nabc\ndef\nghi\n' ),
strSplitLines( '\r\nabc\r\ndef\r\nghi\r\n' )
#}
{# ['', 'abc', 'def', 'ghi', ''], ['\015', 'abc\015', 'def\015', 'ghi\015', ''], ['', 'abc', 'def', 'ghi', ''], ['', 'abc', 'def', 'ghi', ''] #}
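The difference between the two functions can be reproduced in Python. The sketch below is a model, not Wafl code; the function names are hypothetical, and normalizing CRLF to LF before splitting is just one way to obtain the results shown above.

```python
def str_split(s: str, p: str) -> list:
    """Model of strSplit: split on the separator only."""
    return s.split(p)


def str_split_lines(s: str) -> list:
    """Model of strSplitLines: treat both LF and CRLF as line separators."""
    return s.replace("\r\n", "\n").split("\n")


unix_text = "\nabc\ndef\nghi\n"
windows_text = "\r\nabc\r\ndef\r\nghi\r\n"

plain = [str_split(unix_text, "\n"), str_split(windows_text, "\n")]
lines = [str_split_lines(unix_text), str_split_lines(windows_text)]
print(plain)  # the CR characters stay attached in the second list
print(lines)
```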
Function strSplitLinesTrim is similar to strSplitTrim and strSplitLines. It is functionally equivalent to, but more efficient than, mapping strTrim after using strSplitLines:
s.strSplitLines().map(strTrim) == s.strSplitLinesTrim()
Function strChars converts a string to a list of characters.
{#strChars( 'abc\ndef' )
#}
{# ['a', 'b', 'c', '\012', 'd', 'e', 'f'] #}
In case of UTF-8 strings, strChars will cut a string into bytes. To get a valid list of UTF-8 code-points, please use utfChars instead.
Sometimes we need to encode a string in a format that follows a specific syntax. Wafl contains four string encoding functions. All of these functions have the common type: (String -> String).
Function strEncodeHtml(s) returns an encoded string ready to include in HTML. All special characters are replaced with appropriate HTML character sequences:
strEncodeHtml( "abc&<>def" )
abc&amp;&lt;&gt;def
Function strEncodeSql(s) returns an encoded string ready for use in SQL string literals. All special characters are replaced with SQL escape sequences:
strEncodeSql( "abc'quotes'abc" )
abc''quotes''abc
Function strEncodeUri(s) returns a string encoded according to the rules of URI syntax:
strEncodeUri( "a + b = c" )
a%20%2B%20b%20%3D%20c
Function strEncodeWafl(s) returns a string encoded according to the Wafl syntax:
strEncodeWafl( "a\n \0 \'\"..." )
a\012 \000 \'\"...
To handle UTF-8 strings in a proper way, please use only the functions treating strings as sequences of UTF-8 multi-byte code-points.
Most of the specific UTF-8 behavior is covered by the following functions. However, please note that two important functionalities are not yet supported by the Wafl library:
Some applications require files with UTF-8 content to have a UTF-8 BOM (byte order mark) at the beginning of the file. The following functions provide UTF-8 BOM handling.
Please note that using a UTF-8 BOM is not recommended, because byte order is irrelevant for UTF-8: the format is based on individual bytes, not words.
Function / Type and Description
utfBom
( -> String)
Get UTF-8 BOM sequence.
utfIsBom
(String -> Bool)
Check if a string content is
UTF-8 BOM.
utfHasBom
(String -> Bool)
Check if a string begins with
UTF-8 BOM.
utfAddBom
(String -> String)
Adds a UTF-8 BOM, if not
already present.
utfTrimBom
(String -> String)
Trims leading BOM, if
present.
Function utfBom()
returns the UTF-8 BOM.
Function utfIsBom(s)
checks if the string content is
exactly the UTF-8 BOM.
Function utfHasBom(s)
checks if the string begins with a
UTF-8 BOM.
{#utfBom(),
utfIsBom( utfBom() ),
utfIsBom( 'abc' ),
utfHasBom( utfBom() + 'abc' ),
utfHasBom( 'abc' )
#}
{# '\357\273\277', true, false, true, false #}
Function utfAddBom(s)
returns a string with a UTF-8 BOM
added to the beginning, if not already present.
Function utfTrimBom(s)
returns a string without a
leading UTF-8 BOM.
{#utfAddBom('abc'),
utfHasBom( utfAddBom('abc') ),
utfTrimBom( utfAddBom('abc') ),
utfHasBom( utfTrimBom( utfAddBom('abc') ) )
#}
{# '\357\273\277abc', true, 'abc', false #}
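The BOM functions amount to simple prefix handling on the byte sequence EF BB BF (octal \357\273\277). The following Python sketch models that behavior on raw bytes; the helper names are hypothetical and it is not the Wafl implementation.

```python
BOM = b"\xef\xbb\xbf"  # the same three bytes as '\357\273\277'


def has_bom(data: bytes) -> bool:
    # True if the data begins with the UTF-8 BOM.
    return data.startswith(BOM)


def add_bom(data: bytes) -> bytes:
    # Prepend the BOM only if it is not already there.
    return data if has_bom(data) else BOM + data


def trim_bom(data: bytes) -> bytes:
    # Drop one leading BOM, if present.
    return data[len(BOM):] if has_bom(data) else data


print(add_bom(b"abc"))                              # b'\xef\xbb\xbfabc'
print(add_bom(add_bom(b"abc")) == add_bom(b"abc"))  # True
print(trim_bom(add_bom(b"abc")))                    # b'abc'
```

Adding is idempotent in this model, which matches the "if not already present" wording above.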
Function / Type and Description
utfIsValid
(String -> Bool)
Check if a string is a valid
UTF-8 encoded string.
utfRepInvalid
(String * String -> String)
Replace invalid
code points with the given character.
Function utfIsValid(s)
checks if a string is a valid UTF-8
string. Please note that every string consisting only of single-byte (ASCII) characters is a valid UTF-8
string.
Function utfRepInvalid(s,c)
computes a string in which all
invalid UTF-8 code-points in s
are replaced with character
c
.
{#utfIsValid( sub('abc€def',4,4) ),
utfRepInvalid( sub('abc€def',0,4), '@' ), // only the first byte of MB
utfRepInvalid( sub('abc€def',4,4), '@' ) // only the second byte of MB
#}
{# false, 'abc@', '@@de' #}
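To see why these slices are invalid, it helps to look at the bytes: '€' is encoded as three bytes (E2 82 AC), so a byte-based cut can leave an incomplete sequence behind. The Python sketch below reproduces the example results; the helper names are hypothetical, and the way Wafl counts replacements for other malformed inputs is an assumption.

```python
def is_valid_utf8(data: bytes) -> bool:
    # Strict decoding raises an error on any malformed sequence.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def rep_invalid(data: bytes, c: str) -> str:
    # Decode leniently: each undecodable byte sequence becomes U+FFFD,
    # which is then swapped for the requested replacement character.
    # Caveat: a genuine U+FFFD in the input would be replaced as well.
    return data.decode("utf-8", errors="replace").replace("\ufffd", c)


euro = "abc€def".encode("utf-8")    # 9 bytes; '€' occupies bytes 3..5
print(is_valid_utf8(euro[4:8]))     # False
print(rep_invalid(euro[0:4], "@"))  # abc@
print(rep_invalid(euro[4:8], "@"))  # @@de
```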
utfLen
Function / Type and Description
utfLen
(String -> Int)
Get UTF-8 length, as a number of
complete code points.
Function utfLen(s)
computes the string length by
counting the complete UTF-8 code-points. The string length in
code-points is always less than or equal to the length in bytes
(strLen
, length
or size
).
{#strLen( 'abc€def' ),
utfLen( 'abc€def' )
#}
{# 9, 7 #}
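One common way to count code points at the byte level is to skip continuation bytes, which always have the bit pattern 10xxxxxx. The Python sketch below illustrates the idea under the assumption that the input is valid UTF-8; the name `utf_len` is hypothetical, and whether Wafl counts this way internally is not stated here.

```python
def utf_len(data: bytes) -> int:
    # Count every byte that is not a continuation byte (10xxxxxx).
    # For valid UTF-8 this equals the number of code points.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


raw = "abc€def".encode("utf-8")
print(len(raw))      # 9 (length in bytes)
print(utf_len(raw))  # 7 (length in code points)
```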
utfAt
Function / Type and Description
utfAt
(String * Int -> String)
Returns a code point at
given position, indexed by codepoints.
Function utfAt(s,i)
is similar to indexing operator
s[i]
, but it uses code-point based indexes. It returns
i
-th UTF-8 code-point of string s
.
For more details please study indexing operator.
{#s[0], s[1], s[2], s[3],
s.utfAt(0), s.utfAt(1), s.utfAt(2), s.utfAt(3)
#}where {
s = "abАБабAB";
}
{# 'a', 'b', '\320', '\220', 'a', 'b', '\320\220', '\320\221' #}
{#s[-1], s[-2], s[-3], s[-4],
s.utfAt(-1), s.utfAt(-2), s.utfAt(-3), s.utfAt(-4)
#}where {
s = "abАБабAB";
}
{# 'B', 'A', '\261', '\320', 'B', 'A', '\320\261', '\320\260' #}
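The difference between the two kinds of indexing can be modelled in Python, where `bytes` is indexed by byte and `str` by code point. This is only an analogy for the results above, not Wafl code.

```python
s = "abАБабAB"
raw = s.encode("utf-8")  # each Cyrillic letter takes two bytes

# Byte-based indexing, comparable to s[i] in Wafl: may split a code point.
print(raw[2:3], raw[3:4])  # b'\xd0' b'\x90'

# Code-point-based indexing, comparable to utfAt: whole characters only.
print(s[2], s[3])          # А Б
print(s[-3], s[-4])        # б а
```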
utfSub
,
utfSlice
, utfLeft
, utfRight
Function / Type and Description
utfSub
(String * Int * Int -> String)
Returns a
substring from given position (from 0) and with given length, indexing
complete UTF-8 code points instead of characters.
utfSlice
(String * Int * Int -> String)
Returns a
substring between two given positions, indexing complete UTF-8 code
points instead of characters.
utfLeft
(String * Int -> String)
Returns first N UTF-8
code points of the string.
utfRight
(String * Int -> String)
Returns last N UTF-8
code points of the string.
Function utfSub(s,p,n)
is similar to
subStr(s,p,n)
, but uses code-point based indexes. It
returns a substring of string s
, beginning at (zero based)
position p
with length n
, where position and
length are counted based on UTF-8 code-points instead of characters.
For more details please study subStr
.
{#subStr( "abcdАБВГабвгABCD", 0, 8 ),
utfSub( "abcdАБВГабвгABCD", 0, 8 )
#}
{# 'abcd\320\220\320\221', 'abcd\320\220\320\221\320\222\320\223' #}
Function utfSlice(s,n,m)
is similar to string slice
operator s[n:m]
, but uses code-point based indexes. It
returns a substring of string s
, beginning at (zero based)
position n
and ending before the position m
,
where positions n
and m
are counted based on
UTF-8 code-points instead of characters.
For more details please study string slice operator.
{#"abcdАБВГабвгABCD" [2:6],
utfSlice( "abcdАБВГабвгABCD", 2, 6 )
#}
{# 'cd\320\220', 'cd\320\220\320\221' #}
Function utfLeft(s,n)
is similar to string function
strLeft
, but uses code-point based indexes. It returns a
substring containing the first n
UTF-8 code-points of
string s
.
For more details please study strLeft
.
{#strLeft( "abcdАБВГабвгABCD", 6 ),
utfLeft( "abcdАБВГабвгABCD", 6 )
#}
{# 'abcd\320\220', 'abcd\320\220\320\221' #}
Function utfRight(s,n)
is similar to string function
strRight
, but uses code-point based indexes. It returns a
substring containing the last n
UTF-8 code-points of string
s
.
For more details please study strRight
.
{#strRight( "abcdАБВГабвгABCD", 6 ),
utfRight( "abcdАБВГабвгABCD", 6 )
#}
{# '\320\263ABCD', '\320\262\320\263ABCD' #}
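The four substring functions can be modelled with Python slices on `str`, which is indexed by code point. The sketch below uses hypothetical helper names and covers only non-negative arguments; how Wafl treats negative or out-of-range values is not modelled.

```python
def utf_sub(s: str, p: int, n: int) -> str:
    # n code points starting at zero-based position p.
    return s[p:p + n]


def utf_slice(s: str, n: int, m: int) -> str:
    # Code points from position n up to, but not including, m.
    return s[n:m]


def utf_left(s: str, n: int) -> str:
    # The first n code points.
    return s[:n]


def utf_right(s: str, n: int) -> str:
    # The last n code points (the whole string if n exceeds its length).
    return s[max(len(s) - n, 0):]


s = "abcdАБВГабвгABCD"
print(utf_sub(s, 0, 8))         # abcdАБВГ
print(utf_slice(s, 2, 6))       # cdАБ
print(utf_left(s, 6))           # abcdАБ
print(utf_right(s, 6))          # вгABCD
print(s.encode("utf-8")[-6:])   # b'\xd0\xb3ABCD' (byte-based: 'в' is lost)
```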
Function / Type and Description
utfChars
(String -> List[String])
Splits a string to a
list of UTF-8 code points.
utfReverse
(String -> String)
Reverses UTF-8 string.
Function utfChars(s)
is similar to string function
strChars
, but uses code points instead of single-byte characters. It
returns a list of all UTF-8 code-points of string s
.
For more details please study strChars
.
{#strChars( "abАБабAB" ),
utfChars( "abАБабAB" )
#}
{# ['a', 'b', '\320', '\220', '\320', '\221', '\320', '\260', '\320', '\261', 'A', 'B'], ['a', 'b', '\320\220', '\320\221', '\320\260', '\320\261', 'A', 'B'] #}
Function utfReverse(s)
is similar to string function
strReverse
, but uses code points instead of single-byte characters. It
returns a reversed string, taking care to preserve the UTF-8
code-points.
For more details please study strReverse
.
{#strReverse( "abАБабAB" ),
utfReverse( "abАБабAB" )
#}
{# 'BA\261\320\260\320\221\320\220\320ba', 'BA\320\261\320\260\320\221\320\220ba' #}
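The byte-wise reversal above starts each Cyrillic letter with its continuation byte, which is why it is no longer valid UTF-8. A short Python model of the same effect, as an analogy and not Wafl code:

```python
s = "abАБабAB"
raw = s.encode("utf-8")

# Comparable to strChars: one element per byte.
print(len(list(raw)))  # 12

# Comparable to utfChars: one element per code point.
print(list(s))         # ['a', 'b', 'А', 'Б', 'а', 'б', 'A', 'B']

# Comparable to utfReverse: reversing code points keeps characters intact.
print(s[::-1])         # BAбаБАba

# Comparable to strReverse: reversing raw bytes puts continuation bytes
# first, so the result is no longer valid UTF-8.
try:
    raw[::-1].decode("utf-8")
    reversed_bytes_valid = True
except UnicodeDecodeError:
    reversed_bytes_valid = False
print(reversed_bytes_valid)  # False
```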
Here we discuss three more string functions:
Function / Type and Description
strLowerCase
(String -> String)
Converts all letters to lower
case.
strUpperCase
(String -> String)
Converts all letters to upper
case.
strReverse
(String -> String)
Reverses the string.
Function strLowerCase
converts a string to
lowercase.
Function strUpperCase
converts a string to
uppercase.
{#strLowerCase( 'aAbBcC' ),
strUpperCase( 'aAbBcC' )
#}
{# 'aabbcc', 'AABBCC' #}
Function strReverse
computes the reversed string.
{#strReverse( 'aAbBcC' )
#}
{# 'CcBbAa' #}
In case of UTF-8 strings, strReverse
may return invalid
strings. This function treats strings as having single byte characters
only. To get a valid UTF-8 reversed string, please use
utfReverse
.