As of SWI-Prolog version 7, text enclosed in double quotes 
(e.g., "Hello world") is read as an object of the type string. 
Strings are distinct from lists, which makes it possible to recognize 
them at runtime and print them using the string syntax:
?- write("Hello world!").
Hello world!
?- writeq("Hello world!").
"Hello world!"
A string is a compact representation of a character sequence that 
lives on the global (term) stack. Strings are represented by sequences 
of Unicode character codes including the character code 0 (zero). The 
length of strings is limited by the available space on the global (term) 
stack (see
set_prolog_stack/2). Section 
5.2.3 motivates the introduction of strings and mapping double 
quoted text to this type.
Whereas in version 7, double-quoted text is mapped to strings,
back-quoted text (as in `text`) is mapped to a 
list of
character codes, i.e. integers that are Unicode code points. In 
a traditional setting, back-quoted text would be mapped to a list of
characters (also known as chars), which are atoms of 
length 1.
The settings for the flags that control how double- and back-quoted 
text is read are summarised in table 
8. Programs that aim for compatibility should realise that the ISO 
standard defines back-quoted text, but does not define the back_quotes 
Prolog flag and does not define the term that is produced by back-quoted 
text.
Table 8: Mapping of double- and back-quoted 
text in the two modes.
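The mapping described above can be observed at runtime. The following is a minimal sketch, assuming the default SWI-Prolog 7 settings of the double_quotes and back_quotes flags; text_kind/2 is a hypothetical helper introduced for illustration, not a library predicate.

```prolog
% text_kind(+Text, -Kind): hypothetical helper, not a library predicate.
% Classifies a term by the text representation it uses.
% Assumes the default SWI-Prolog 7 quoting flags.
text_kind(Text, string) :- string(Text), !.
text_kind(Text, atom)   :- atom(Text), !.
text_kind(Text, codes)  :- is_list(Text), maplist(integer, Text), !.
text_kind(Text, chars)  :- is_list(Text), maplist(atom, Text).

% ?- text_kind("text", K).      % K = string
% ?- text_kind(`text`, K).      % K = codes
% ?- text_kind('text', K).      % K = atom
% ?- text_kind([t,e,x,t], K).   % K = chars
```

Note that the list cases rely on a convention: a list of small integers is only assumed to be text.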
With the introduction of strings as a Prolog data type, there are 
three main ways to represent text: using strings, using atoms and using 
lists of character codes. As a fourth way, one may also use lists of 
chars. This section explains what to choose for what purpose. Both 
strings and atoms are atomic objects: you can only look inside 
them using dedicated predicates, while lists of character codes or chars 
are compound data structures forming an extended structure that must 
follow a convention.
- Lists of character codes
- 
are what you need if you want to parse text using Prolog grammar 
rules (DCGs, see phrase/3). 
Most of the text reading predicates (e.g.,
read_line_to_codes/2) 
return a list of character codes because most applications need to parse 
these lines before the data can be processed. As said above, the back-quoted 
text notation (`hello`) can be used to easily specify 
a list of character codes. The 0'c notation can be used to 
specify a single character code.
- Atoms
- 
are identifiers. They are typically used where identity comparison 
is the main operation and the text is neither composed nor taken 
apart. Examples are RDF resources (URIs that identify 
something), system identifiers (e.g., 'Boeing 747'), but 
also individual words in a natural language processing system. They are 
also used where other languages would use enumerated types, 
such as the names of days in the week. Unlike enumerated types, Prolog 
atoms do not form a fixed set and the same atom can represent different 
things in different contexts.
- Strings
- 
typically represent text that is processed as a unit most of the time, 
but which is not an identifier for something. Format specifications for
format/3 
are a good example. Another example is a descriptive text provided in an 
application. Strings may be composed and decomposed using e.g., string_concat/3 
and sub_string/5 
or converted for parsing using string_codes/2 
or created from codes generated by a generative grammar rule, also using string_codes/2.
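The three roles above can be combined in one small parser. The sketch below assumes library(dcg/basics) is available (it ships with SWI-Prolog); setting//2 and parse_setting/3 are hypothetical names. The input is a string, it is converted to codes for parsing, the identifier part becomes an atom and the free text part becomes a string again.

```prolog
:- use_module(library(dcg/basics)).

% setting(-Name, -Value)// is a hypothetical grammar for "name=value".
setting(Name, Value) -->
    string_without(`=`, NameCodes),       % codes up to the first "="
    "=",
    remainder(ValueCodes),                % everything after the "="
    { atom_codes(Name, NameCodes),        % identifier: use an atom
      string_codes(Value, ValueCodes)     % free text: use a string
    }.

% parse_setting(+Text, -Name, -Value): convert the string to codes,
% then parse the codes with the grammar.
parse_setting(Text, Name, Value) :-
    string_codes(Text, Codes),
    phrase(setting(Name, Value), Codes).

% ?- parse_setting("lang=Prolog rocks", Name, Value).
% Name = lang, Value = "Prolog rocks".
```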
Strings are manipulated using a set of predicates that mirrors the 
set of predicates used for manipulating atoms. In addition to the list 
below, string/1 
performs the type check for this type and is described in section 
4.5.
SWI-Prolog's string primitives are being synchronized with
ECLiPSe. 
We expect the set of predicates documented in this section to be stable, 
although it might be expanded. In general, SWI-Prolog's text 
manipulation predicates accept any form of text as input argument - they 
accept anytext input. anytext comprises:
- atoms
- strings
- lists of character codes
- lists of characters
- number types: integers, floating point numbers and non-integer 
rationals. Under the hood, these must first be formatted into a text 
representation according to some inner convention before they can be 
used.
The predicates produce the type indicated by the predicate name as 
output. This policy simplifies migration and writing programs that can 
run unmodified or with minor modifications on systems that do not 
support strings. Code should avoid relying on this feature as much as 
possible for clarity as well as to facilitate a more strict mode and/or 
type checking in future releases.
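As an illustration of this policy, the sketch below wraps atom_concat/3 and string_concat/3 in two hypothetical helpers. Both accept mixed text input; only the type of the result differs.

```prolog
% Hypothetical wrappers that differ only in the type of the result.
join_as_atom(A, B, Atom)     :- atom_concat(A, B, Atom).
join_as_string(A, B, String) :- string_concat(A, B, String).

% ?- join_as_atom("ab", "cd", X).     % X = abcd    (an atom)
% ?- join_as_string(ab, cd, X).       % X = "abcd"  (a string)
% ?- join_as_string(item, 42, X).     % X = "item42"
```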
- atom_string(?Atom, 
?String)
- 
Bi-directional conversion between an atom and a string. At least one of 
the two arguments must be instantiated. An initially uninstantiated 
variable on the “string side” is always instantiated to a 
string. An initially uninstantiated variable on the “atom side” is 
always instantiated to an atom. If both arguments are instantiated, 
their list-of-character representations must match, but the types are 
not enforced. The following all succeed:
atom_string("x",'x').
atom_string('x',"x").
atom_string(3.1415,3.1415).
atom_string('3r2',3r2).
atom_string(3r2,'3r2').
atom_string(6r4,3r2).
- number_string(?Number, 
?String)
- 
Bi-directional conversion between a number and a string. At least one of 
the two arguments must be instantiated. Besides the type used to 
represent the text, this predicate differs in several ways from its ISO 
cousin (note that SWI-Prolog's 
syntax for numbers is not ISO compatible either):
 
- If String does not represent a number, the predicate fails 
rather than throwing a syntax error exception.
- Leading white space and Prolog comments are not allowed.
- Numbers may start with + or -.
- It is not allowed to have white space between a leading + or - 
and the number.
- Floating point numbers in exponential notation do not require a dot 
before the exponent, i.e., "1e10" is a valid number.
 Unlike other predicates of this family, if instantiated, String 
cannot be an atom.
 The corresponding 'atom-handling' predicate is atom_number/2, 
with reversed argument order. 
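Because number_string/2 fails instead of raising a syntax error, it can be used directly as a test. A minimal sketch, where parse_number/2 is a hypothetical helper:

```prolog
% parse_number(+Text, -Result): hypothetical helper.
% Result is number(N) if Text denotes a number and text(Text) otherwise.
parse_number(Text, number(N)) :-
    number_string(N, Text),
    !.
parse_number(Text, text(Text)).

% ?- parse_number("42", R).          % R = number(42)
% ?- parse_number("1e10", R).        % R = number(F), F is the float 1.0e10
% ?- parse_number("forty-two", R).   % R = text("forty-two")
```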
- term_string(?Term, 
?String)
- 
Bi-directional conversion between a term and a string. If String 
is instantiated, it is parsed and the result is unified with Term. 
Otherwise Term is 'written' using the option quoted(true) and the result is converted to String.
- term_string(?Term, 
?String, +Options)
- 
As term_string/2, 
passing Options to either read_term/2 
or write_term/2. 
For example:
?- term_string(Term, 'a(A)', [variable_names(VNames)]).
Term = a(_9674),
VNames = ['A'=_9674].
 
- string_chars(?String, 
?Chars)
- 
Bi-directional conversion between a string and a list of characters. At 
least one of the two arguments must be instantiated.
See also: atom_chars/2. 
- string_codes(?String, 
?Codes)
- 
Bi-directional conversion between a string and a list of character 
codes. At least one of the two arguments must be instantiated.
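A common pattern is to convert a string to a list, process the list with ordinary list predicates and convert the result back. The sketch below uses string_chars/2 in both directions; reverse_string/2 is a hypothetical helper.

```prolog
:- use_module(library(lists)).

% reverse_string(+String, -Reversed): hypothetical helper.
reverse_string(String, Reversed) :-
    string_chars(String, Chars),        % string -> list of chars
    reverse(Chars, RevChars),
    string_chars(Reversed, RevChars).   % list of chars -> string

% ?- reverse_string("abc", R).   % R = "cba"
```

The same shape works with string_codes/2 when the processing step expects character codes.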
- string_bytes(?String, 
?Bytes, +Encoding)
- 
True when the (Unicode) String is represented by Bytes 
in
Encoding. If String is instantiated it may 
represent text as an atom, string, list of character codes or list of 
characters.
Bytes is always a list of integers in the range 0 ... 
255. At least one of String or Bytes must be 
instantiated. This predicate is notably intended as an intermediate step 
to perform byte oriented operations on text. Examples are (base64) 
encoding, encryption, computing a (secure) hash, etc. Encoding 
is typically
utf8. All valid stream encodings except for wchar_t are supported. See section 
2.18.1. Note that this translation is only provided for strings. 
Creating an atom from bytes requires
atom_string/2 (strings 
are an efficient intermediate and this conversion is needed only in some 
uncommon scenarios).
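The following sketch shows both directions, assuming the utf8 encoding. utf8_length/2 and bytes_to_atom/2 are hypothetical helpers; the second one goes through a string and atom_string/2, as described above.

```prolog
% utf8_length(+Text, -NBytes): number of bytes of Text in UTF-8.
utf8_length(Text, NBytes) :-
    string_bytes(Text, Bytes, utf8),
    length(Bytes, NBytes).

% bytes_to_atom(+Bytes, -Atom): bytes -> string -> atom.
bytes_to_atom(Bytes, Atom) :-
    string_bytes(String, Bytes, utf8),
    atom_string(Atom, String).

% ?- utf8_length("caf\u00E9", N).     % N = 5 (U+00E9 takes two bytes)
% ?- bytes_to_atom([104,105], A).     % A = hi
```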
- [det]text_to_string(+Text, 
-String)
- 
Converts Text to a string. Text is anytext 
excluding the number types. When running in
--traditional mode, '[]' is ambiguous and 
interpreted as an empty string.
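text_to_string/2 is convenient at the entry point of a predicate that wants to work on strings internally but accept any text from its callers. A minimal sketch, where greeting/2 is a hypothetical predicate:

```prolog
% greeting(+Name, -Greeting): hypothetical example predicate.
% Name may be an atom, a string, a code list or a char list.
greeting(Name, Greeting) :-
    text_to_string(Name, NameString),
    string_concat("Hello, ", NameString, Greeting).

% ?- greeting(world, G).          % G = "Hello, world"
% ?- greeting(`world`, G).        % G = "Hello, world"
% ?- greeting([w,o,r,l,d], G).    % G = "Hello, world"
```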
- string_length(+String, 
-Length)
- 
Unify Length with the number of characters in String. 
This predicate is functionally equivalent to atom_length/2 
and also accepts
anytext as its first argument. Numeric types are formatted into 
strings before the length of their string representation is determined (this 
behavior should be considered deprecated). See also write_length/3.
- string_code(?Index, 
+String, ?Code)
- 
True when Code represents the character at the 1-based Index 
position in String. If Index is unbound the string 
is scanned from index 1. Raises a domain error if Index is 
negative. Fails silently if Index is zero or greater than the 
length of
String. The mode string_code(-,+,+) is 
deterministic if the searched-for Code appears only once in String. 
See also
sub_string/5.
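With a bound Index, string_code/3 gives direct access to a single character without converting the string to a list. The sketch below uses only this mode; starts_upper/1 is a hypothetical helper.

```prolog
% starts_upper(+String): hypothetical helper.
% True if the first character of String is an uppercase letter.
% Fails for the empty string because index 1 is out of range.
starts_upper(String) :-
    string_code(1, String, Code),
    code_type(Code, upper).

% ?- starts_upper("Hello").   % true
% ?- starts_upper("hello").   % false
% ?- starts_upper("").        % false
```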
- get_string_code(+Index, 
+String, -Code)
- 
Semi-deterministic version of string_code/3. 
In addition, this version provides strict range checking, throwing a 
domain error if Index is less than 1 or greater than the 
length of String. ECLiPSe provides this to support String[Index] notation.
- string_concat(?String1, 
?String2, ?String3)
- 
Similar to atom_concat/3, 
but the unbound argument will be unified with a string object rather 
than an atom. Also, if both String1 and
String2 are unbound and String3 is bound to text, 
it breaks
String3, unifying the start with String1 and the 
end with
String2 as append does with lists. Note that this is not 
particularly fast on long strings, as for each redo the system has to 
create two entirely new strings, while the list equivalent only creates 
a single new list-cell and moves some pointers around.
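Two typical uses are sketched below: removing a known prefix, and enumerating all ways to split a string on backtracking. strip_prefix/3 and all_splits/2 are hypothetical helpers.

```prolog
% strip_prefix(+Prefix, +String, -Rest): uses mode (+,-,+).
strip_prefix(Prefix, String, Rest) :-
    string_concat(Prefix, Rest, String).

% all_splits(+String, -Pairs): collect every Front-Back split.
% Each solution creates two new strings, so this is only
% reasonable for short strings.
all_splits(String, Pairs) :-
    findall(Front-Back, string_concat(Front, Back, String), Pairs).

% ?- strip_prefix("name:", "name:bob", R).   % R = "bob"
% ?- all_splits("ab", P).   % P = [""-"ab", "a"-"b", "ab"-""]
```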
- [det]split_string(+String, 
+SepChars, +PadChars, -SubStrings)
- 
Break String into SubStrings. The SepChars 
argument provides the characters that act as separators and thus the 
length of
SubStrings is one more than the number of separators found if
SepChars and PadChars do not have common 
characters. If
SepChars and PadChars are equal, sequences of 
adjacent separators act as a single separator. Leading and trailing 
characters for each substring that appear in PadChars are 
removed from the substring. The input arguments can be either atoms, 
strings or char/code lists. Compatible with ECLiPSe. Below are some 
examples:
A simple split wherever there is a '.':
 
?- split_string("a.b.c.d", ".", "", L).
L = ["a", "b", "c", "d"].
Consider sequences of separators as a single one:
 
?- split_string("/home//jan///nice/path", "/", "/", L).
L = ["home", "jan", "nice", "path"].
Split and remove white space:
 
?- split_string("SWI-Prolog, 7.0", ",", " ", L).
L = ["SWI-Prolog", "7.0"].
Only remove leading and trailing white space (trim the 
string):
 
?- split_string("  SWI-Prolog  ", "", "\s\t\n", L).
L = ["SWI-Prolog"].
In the typical use cases, SepChars either does not overlap
PadChars or is equal to it, so that multiple adjacent 
separators (often white space) are handled as a single one. The 
behaviour with partially overlapping sets of padding and separators 
should be considered undefined. See also read_string/5. 
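The resulting substrings are often converted further. As a sketch, the hypothetical helper csv_numbers/2 below splits a line on commas, strips blanks around each field and converts the fields with number_string/2. It fails if any field is not a number.

```prolog
% csv_numbers(+Line, -Numbers): hypothetical helper.
csv_numbers(Line, Numbers) :-
    split_string(Line, ",", " ", Fields),      % split and trim blanks
    maplist(number_string, Numbers, Fields).   % "2.5" -> 2.5, etc.

% ?- csv_numbers("1, 2.5, 3", Ns).   % Ns = [1, 2.5, 3]
```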
- sub_string(+String, 
?Before, ?Length, ?After, ?SubString)
- 
This predicate is functionally equivalent to sub_atom/5, 
but operates on strings. Note that this implies the string input 
arguments can be either strings or atoms. If SubString is 
unbound (output) it is unified with a string. The following example 
splits a string of the form
<name>=<value> into the name part (an 
atom) and the value (a string).
name_value(String, Name, Value) :-
    sub_string(String, Before, _, After, "="),
    !,
    sub_atom(String, 0, Before, _, Name),
    sub_string(String, _, After, 0, Value).
The next example defines a predicate that inserts a value at a 
position. See sub_atom/5 
for more examples.
 
string_insert(Str, Val, At, NewStr) :-
    sub_string(Str, 0, At, A1, S1),
    sub_string(Str, At, A1, _, S2),
    atomics_to_string([S1,Val,S2], NewStr).
- atomics_to_string(+List, 
-String)
- 
List is a list of strings, atoms, or number types. Succeeds 
if String can be unified with the concatenated elements of List. 
Equivalent to atomics_to_string(List, '', String).
- atomics_to_string(+List, 
+Separator, -String)
- 
Creates a string just like atomics_to_string/2, 
but inserts
Separator between each pair of inputs. For example:
?- atomics_to_string([gnu, "gnat", 1], ', ', A).
A = "gnu, gnat, 1".
 
- string_upper(+String, 
-UpperCase)
- 
Convert String to upper case and unify the result with
UpperCase.
- string_lower(+String, 
-LowerCase)
- 
Convert String to lower case and unify the result with
LowerCase.
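A simple use is comparing text while ignoring case. The sketch below lowercases both sides; equal_ignoring_case/2 is a hypothetical helper, and this is plain case conversion rather than full Unicode case folding.

```prolog
% equal_ignoring_case(+A, +B): hypothetical helper.
% True if A and B have the same lower case form.
equal_ignoring_case(A, B) :-
    string_lower(A, Lower),
    string_lower(B, Lower).

% ?- equal_ignoring_case("Prolog", "PROLOG").   % true
% ?- equal_ignoring_case("Prolog", "Prolix").   % false
```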
- read_string(+Stream, 
?Length, -String)
- 
Read at most Length characters from Stream and 
return them in the string String. If Length is 
unbound, Stream is read to the end and Length is 
unified with the number of characters read. The number of bytes 
read depends on the encoding of Stream (see section 
2.18.1). This predicate may be used to read a sequence of bytes when 
the stream is in octet encoding. See open/4 
and set_stream/2 
for controlling the encoding.
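The sketch below exercises read_string/3 without touching the file system by reading from a stream created with open_string/2. first_chars/3 is a hypothetical helper.

```prolog
% first_chars(+Text, ?Length, -Prefix): hypothetical helper.
% Reads at most Length characters from Text. If Length is unbound,
% all of Text is read and Length is unified with its length.
first_chars(Text, Length, Prefix) :-
    setup_call_cleanup(
        open_string(Text, In),
        read_string(In, Length, Prefix),
        close(In)).

% ?- first_chars("hello world", 5, P).   % P = "hello"
% ?- first_chars("hello", N, P).         % N = 5, P = "hello"
```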
- read_string(+Stream, 
+SepChars, +PadChars, -Sep, -String)
- 
Read a string from Stream, providing functionality similar to
split_string/4. 
The predicate performs the following steps:
 
- Skip all characters that match PadChars
- Read up to a character that matches SepChars or end of 
file
- Discard trailing characters that match PadChars from the 
collected input
- Unify String with a string created from the input and
Sep with the code of the separator character read. If input 
was terminated by the end of the input, Sep is unified with 
-1.
 The predicate read_string/5 
called repeatedly on an input until
Sep is -1 (end of file) is equivalent to reading the entire 
file into a string and calling split_string/4, 
provided that SepChars and PadChars are not partially 
overlapping (behaviour that is fully compatible would require 
unlimited look-ahead). 
Below are some examples:
 Read a line:
 
read_string(Input, "\n", "\r", Sep, String)
 Read a line, stripping leading and trailing white space:
 
read_string(Input, "\n", "\r\t ", Sep, String)
 Read up to ',' or ')', unifying Sep with 0', (Unicode 44) 
or 0') (Unicode 41):
 
read_string(Input, ",)", "\t ", Sep, String)
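Calling read_string/5 repeatedly until Sep is -1 can be sketched as a line reader. The code below is an illustration under the assumption that lines end in "\n" or "\r\n"; stream_lines/2 and text_lines/2 are hypothetical helpers, and open_string/2 is used only to obtain an input stream.

```prolog
% stream_lines(+In, -Lines): read all lines from stream In.
stream_lines(In, Lines) :-
    read_string(In, "\n", "\r", Sep, Line),
    (   Sep == -1                       % input ended without separator
    ->  (   Line == ""
        ->  Lines = []                  % nothing left after last newline
        ;   Lines = [Line]              % last line has no newline
        )
    ;   Lines = [Line|Rest],
        stream_lines(In, Rest)
    ).

% text_lines(+Text, -Lines): split Text into lines via a string stream.
text_lines(Text, Lines) :-
    setup_call_cleanup(
        open_string(Text, In),
        stream_lines(In, Lines),
        close(In)).

% ?- text_lines("a\r\nb\n", L).   % L = ["a", "b"]
```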
 
- open_string(+String, 
-Stream)
- 
True when Stream is an input stream that accesses the content 
of
String. String can be any text representation, 
i.e., string, atom, list of codes or list of characters. The created Stream 
has the reposition property (see stream_property/2). 
Note that the internal encoding of the data is either ISO Latin 1 or 
UTF-8.
Prolog defines two forms of quoted text. Traditionally, single quoted 
text is mapped to atoms while double quoted text is mapped to a list of
character codes (integers) or characters (atoms of length 1). 
Representing text using atoms is often considered inadequate for several 
reasons:
- It hides the conceptual difference between text and program symbols. 
Where content of text often matters because it is used in I/O, program 
symbols are merely identifiers that match with the same symbol 
elsewhere. Program symbols can often be consistently replaced, for 
example to obfuscate or compact a program.
 
- Atoms are globally unique identifiers. They are stored in a shared 
table. Volatile strings represented as atoms come at a significant price 
due to the required cooperation between threads for creating atoms. 
Reclaiming temporary atoms using atom garbage collection is a 
costly process that requires significant synchronisation.
 
- Many Prolog systems (not SWI-Prolog) put severe restrictions on the 
length of atoms or the maximum number of atoms.
Representing text as lists, be it of character codes or characters, 
also comes at a price:
- It is not possible to distinguish (at runtime) a list of integers or 
atoms from a string. Sometimes this information can be derived from 
(implicit) typing. In other cases the list must be embedded in a 
compound term to distinguish the two types. For example, s("hello world") 
could be used to indicate that we are dealing with a string. 
Lacking runtime information, debuggers and the toplevel can only use 
heuristics to decide whether to print a list of integers as such or as a 
string (see portray_text/1).
 While experienced Prolog programmers have learned to cope with this, 
we still consider this an unfortunate situation.
 
 
- Lists are expensive structures, taking 2 cells per character (3 for 
SWI-Prolog in its current form). This stresses memory consumption on the 
stacks while pushing them on the stack and dealing with them during 
garbage collection is unnecessarily expensive.
We observe that in many programs, most strings are only handled as a 
single unit during their lifetime. Examining real code tells us that 
double quoted strings typically appear in one of the following roles:
-  A DCG literal
- 
Although a list of codes is the correct representation 
for handling in DCGs, the DCG translator can recognise the literal and 
convert it to the proper representation. Such code need not be modified.
-  A format string
- 
This is a typical example of text that is conceptually not a program 
identifier. Format is designed to deal with alternative representations 
of the format string. Such code need not be modified.
-  Getting a character code
- 
The construct [X] = "a" is a commonly used template for 
getting the character code of the letter 'a'. ISO Prolog 
defines the syntax 0'a for this purpose. Code using this 
must be modified. The modified code will run on any ISO compliant Prolog 
Processor.
-  As argument to list predicates to operate on strings
- 
Here, we might see code similar to append("name:", Rest, Codes). 
Such code needs to be modified. In this particular example, the 
following is a good portable alternative: phrase("name:", Codes, Rest)
-  Checks for a character to be in a set
- 
Such tests are often performed with code such as this:
memberchk(C, "~!@#$"). This is a rather inefficient check 
in a traditional Prolog system because it pushes a list of character 
codes cell-by-cell onto the Prolog stack and then traverses this list 
cell-by-cell to see whether one of the cells unifies with C. 
If the test is successful, the string will eventually be subject to 
garbage collection. The best code for this is to write a predicate as 
below, which pushes nothing on the stack and performs an indexed lookup 
to see whether the character code is in 'my_class'.
my_class(0'~).
my_class(0'!).
...
 An alternative to reach the same effect is to use term expansion to 
create the clauses:
 
term_expansion(my_class(_), Clauses) :-
        findall(my_class(C),
                string_code(_, "~!@#$", C),
                Clauses).
my_class(_).
Finally, the predicate string_code/3 
can be exploited directly as a replacement for the memberchk/2 
on a list of codes. Although the string is still pushed onto the stack, 
it is more compact and only a single entity.
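The rewrites suggested in this list can be collected in one small file. This is a sketch under the assumption of the default SWI-Prolog 7 flags; letter_a/1 and name_rest/2 are hypothetical predicates, and my_class/1 spells out the facts that are elided above for the same five characters.

```prolog
% Getting a character code: 0'a instead of [X] = "a".
letter_a(0'a).

% Matching a prefix of a code list: phrase/3 instead of append/3.
name_rest(Codes, Rest) :-
    phrase("name:", Codes, Rest).

% Character class as facts: nothing is pushed on the stack and
% first-argument indexing selects the matching clause.
my_class(0'~).
my_class(0'!).
my_class(0'@).
my_class(0'#).
my_class(0'$).

% ?- letter_a(X).                                    % X = 97
% ?- name_rest(`name:bob`, R), atom_codes(A, R).     % A = bob
% ?- my_class(0'@).                                  % true
```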
 
The predicates in this section can help adapting your program to the 
new convention for handling double quoted strings. We have adapted a 
huge code base with which we were not familiar in about half a day.
- list_strings
- 
This predicate may be used to assess compatibility issues due to the 
representation of double quoted text as string objects. See
section 5.2 and section 
5.2.3. To use it, load your program into Prolog and run list_strings/0. 
The predicate lists source locations of string objects encountered in 
the program that are not considered safe. Such strings need to be 
examined manually, after which one of the actions below may be 
appropriate:
 
- check:string_predicate(:PredicateIndicator)
- 
Declare that PredicateIndicator has clauses that contain 
strings, but that this is safe. For example, if there is a predicate 
help_info/2, where the second argument contains a double quoted string 
that is handled properly by the predicates of the application's help 
system, add the following declaration to stop
list_strings/0 
from complaining:
:- multifile check:string_predicate/1.
check:string_predicate(user:help_info/2).
 
- check:valid_string_goal(:Goal)
- 
Declare that calls to Goal are safe. The module qualification 
is the actual module in which Goal is defined. For example, a 
call to format/3 
is resolved by the predicate system:format/3, 
and the code below specifies that the second argument may be a string 
(system predicates that accept strings are defined in the library).
:- multifile check:valid_string_goal/1.
check:valid_string_goal(system:format(_,S,_)) :- string(S).