Toma Language Description

Author: Rani Pinchuk
URI: http://www.topiwriter.com/toma/Toma.html
Copyright: © 2004-2006 Space Applications Services NV
Last updated: 31-Oct-2006

Introduction

Toma is an "all-in-one" TM*L language: a Topic Map Query Language (TMQL), a Topic Map Manipulation Language (TMML) and a Topic Map Constraint Language (TMCL). Although its syntax is similar to that of SQL, it adds a powerful path expression syntax that gives access to the elements of the Topic Map. Toma offers the SELECT, INSERT, UPDATE and DELETE statements to query and manipulate the Topic Map. The MERGE statement merges Topic Maps, and the EXPORT statement exports the Topic Map to XTM. A set of statements is provided for defining and managing constraints. Finally, Toma provides functions that modify, convert and aggregate the data coming from the Topic Map.

An implementation of Toma exists as a Topic Map engine called TopiEngine. This implementation delivers the promised power of the language with attractive performance.

1. General Features

1.1 Semicolon

Each Toma statement has to end with a semicolon.

Example:
  select $topic where $topic='cpu';
  
  $topic
  -------
  cpu

1.2 Whitespace

Whitespace separates tokens. Apart from that, whitespace characters and newlines have no meaning and can be used freely to indent statements.

1.3 Case Sensitivity

Toma is case sensitive, but its reserved words are case insensitive (as in SQL).

Example:
  select $topic where $topic='cpu';
  
  $topic
  -------
  cpu

  SELECT $topic WHERE $topic='cpu';
  
  $topic
  -------
  cpu
  
  select $topic where $topic='CPU';
  
  $topic
  -------
  
As you can see, the last query returns no results, because no topic has the uppercase id 'CPU'.

1.4 Toma Strings

Toma strings are delimited by single quotes (as in SQL). A backslash before a quote escapes that quote; the second example below doubles the quote instead, as in SQL. Examples:

  'this is a quoted string'
 
  'and that''s also a quoted string'

1.5 Comments

Comments start with the hash sign (#). Any text from the hash sign up to the end of the line is ignored.

Example:
  select $topic where $topic='cpu'; #this is a comment

1.6 Topic Literals

A topic literal is an expression that is resolved to a topic. The following table details the available topic literals in Toma.

Expression           | Resolved to                        | Example
---------------------+------------------------------------+------------------------------------------------------
id(<quoted-string>)  | a topic with that id               | id('host-location')
bn(<quoted-string>)  | topics with that basename          | bn('Processor')
var(<quoted-string>) | topics with that variant           | var('CPU')
al(<quoted-string>)  | topics with that alias             | al('CPU')
si(<quoted-string>)  | a topic with that subjectIndicator | si('http://www.topicmaps.org/xtm/1.0/core.xtm#scope')

By default, Toma uses topic ids to refer to topics. The other topic literals provide a way to refer to a topic by something other than its id (which is often generated automatically and is therefore unknown to the user).
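
As a sketch of how a topic literal might be used in a full statement, the following query looks up a topic by base name rather than by id. It assumes that a topic literal may appear on the right-hand side of a comparison, as id('finger') does in the association examples later in this document. The base name 'Processor' is taken from the table above.

  select $topic.id where $topic = bn('Processor');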

1.7 Naked Identifier

A naked identifier is a string without quotes around it that can be used in place of the id topic literal (see Topic Literals). It can appear within typing brackets, after the scope operator (the @ sign), or as a role (before or after the arrow in the association expression).

In the following example oc(location) is resolved equivalently to oc(id('location')) as the word location is here a naked identifier. Both expressions of the example are resolved to the occurrences whose type has the id "location".

  select oc(location);  #equivalent to oc(id('location'));

In the same way the following can be written:

  select $topic.bn@en;  #equivalent to $topic.bn@id('en')

Both expressions in this example are resolved to the base names of the topics in the English scope.

1.8 Topic Variables

Toma allows variables to be defined. A variable is written as a dollar sign followed by a letter (upper or lower case) or an underscore, optionally followed by any number of alphanumeric or underscore characters. For example:

  $<variable_name>

Each variable has a type, such as topic or association. Note that no special type is defined for scopes, types or member roles, as the engine which implements Toma always reifies the scopes, types and member roles, and therefore they are topics. The type of a variable is determined by the syntax or the semantics of the statement. The default type of a variable is topic.

Often, a variable is used only once. In that case, we care neither about its value nor about its name; all that matters is that it is different from any other variable in the statement. In those cases we can use the anonymous variable $$. The anonymous variable is a variable like any other, but without a name. If there are several anonymous variables in a statement, they are all interpreted as different variables.

Examples:

  $a
  $topic
  $person
  $association

As described above, the anonymous variable $$ can be used when the value of the variable is not needed elsewhere in the query.

2. Toma path expressions

In order to address different elements in the Topic Map, Toma provides path expressions. The path expressions are chained using a dot. All the path expressions of Toma might have an input, an output and a result value.

Input:
The input is what comes to the left of the path expression.
For example, in the path expression $topic.bn.var, $topic is the input of .bn and $topic.bn is the input of .var.
Output:
The output is the result of the expression itself.
For example, in the path expression $topic.bn.var, the output of $topic.bn is the baseName branch, and therefore it is possible to ask about its variant.
Result:
The result value is the textual representation of the output.
For example, the result value of a baseName is the baseNameString. The result value is taken into account only if this expression is the last (right-most) expression in the path. For example, the result value of the expression $topic.bn.var is the actual variant of the base name of the topic $topic.

Path expressions can work with sequences. For example, a topic might have more than one base name. In that case, the path expression $topic.bn is resolved to a sequence of base names.
Example:

  >>select $topic.bn where $topic='cpu';
  
   $topic.bn
  ------------------------
  central processing unit
  central processor
  processor
  mainframe

2.1 Elementary Path Expressions

Elementary path expressions are accessors which refer to a specific element in the Topic Map. The following table gives an overview.

Expression | Input                                                 | Output                                  | Result
-----------+-------------------------------------------------------+-----------------------------------------+----------------------------------------------------------------------------------
.id        | topic / baseName / variant / occurrence / association | id of the input                         | The same as the output
.bn        | topic                                                 | baseName of the input                   | baseNameString as a string
.si        | topic                                                 | subjectIdentity of the input            | topicRef, subjectIndicatorRef or resourceRef of the subjectIdentity as a string
.tr        | subjectIdentity                                       | topicRef of the input as a string       | The same as the output
.sir       | subjectIdentity                                       | subjectIndicatorRef of the input        | The same as the output
.var       | baseName                                              | variant of the input                    | The resourceData or resourceRef of the variant as strings
.sc        | baseName / variant / occurrence / association         | A topic which is the scope of the input | The topic id of the output
.rr        | subjectIdentity / occurrence / variant                | resourceRef of the input                | The same as the output
.oc        | topic                                                 | occurrence of the input                 | The resourceRef or resourceData of the occurrence as a string
.rd        | occurrence / variant                                  | resourceData of the input               | The same as the output
.player    | association                                           | A topic which is a player of the input  | The topic id of the output
.role      | association                                           | A topic which is a role of the input    | The topic id of the output

2.1.1 Id (.id)

Input
topic / baseName / variant / occurrence / association
Output
id of the input
Result
The same as the output

Example:

  $topic.id
  $topic.bn.id
  $topic.bn.var.id
  $topic.oc.id
  $association.id

2.1.2 Base name (.bn)

Input
topic
Output
baseName of the input
Result
baseNameString as a string

Example:

  $topic.bn
Extending path expressions for .bn:
  
  $topic.bn.var
  $topic.bn.sc

2.1.3 Variant (.var)

Input
baseName
Output
variant of the input
Result
The resourceData or resourceRef of the variant as strings

Example:

  $topic.bn.var
Extending path expressions for .var:
  
  $topic.bn.var.rd
  $topic.bn.var.rr
  $topic.bn.var.sc

2.1.4 Subject Identity (.si)

Input
topic
Output
subjectIdentity of the input
Result
topicRef, subjectIndicatorRef or resourceRef of the subjectIdentity as a string

Example:

  $topic.si
Extending path expressions for .si:
  $topic.si.tr
  $topic.si.sir
  $topic.si.rr

2.1.5 Occurrence (.oc)

Input
topic
Output
occurrence of the input
Result
The resourceData or resourceRef of the occurrence as a string

Example:

   $topic.oc
Extending path expressions for .oc:
   $topic.oc.rd
   $topic.oc.rr
   $topic.oc.sc

2.1.6 TopicRef (.tr)

Input
subjectIdentity
Output
topicRef of the input as a string
Result
The same as the output

Example:

   $person.si.tr

2.1.7 SubjectIndicatorRef (.sir)

Input
subjectIdentity
Output
subjectIndicatorRef of the input as a string
Result
The same as the output

Example:

   $person.si.sir

2.1.8 ResourceRef (.rr)

Input
variant / subjectIdentity / occurrence
Output
resourceRef of the input
Result
The same as the output

Example:

   $person.bn.var.rr
   $person.si.rr
   $person.oc.rr

2.1.9 ResourceData (.rd)

Input
variant / occurrence
Output
resourceData of the input
Result
The same as the output

Example:

   $person.bn.var.rd
   $person.oc.rd

2.1.10 Scope (.sc)

Input
baseName / variant / occurrence / association
Output
A topic which is the scope of the input
Result
The topic id of the output

Example:

   $person.bn.sc
   $person.bn.var.sc
   $person.oc.sc
   $association.sc

2.1.11 Player (.player)

Input
association
Output
A topic which is a player of the input
Result
The topic id of the output

Example:

   $association.player

2.1.12 Role (.role)

Input
association
Output
A topic which is a role of the input
Result
The topic id of the output

Example:

   $association.role
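
A sketch of both accessors in one statement, assuming a Topic Map that contains part-whole associations (the type used in the association examples of this document). The variable $a is bound as an association by the association expression in the WHERE clause, and the SELECT clause then lists the players and roles of each such association:

  select $a.player, $a.role
    where $a(part-whole)->part = 'cpu';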

2.2 Alias and reify

Unlike the elementary path expressions, these two path expressions do not refer to any single Topic Map element.

2.2.1 Alias (.al)

An alias covers both base names and variants. It is provided in order to be able to find a topic by its base name or its variant. It should be noted that alias is not an element type in the Topic Map, and therefore .al returns a string (or a sequence of strings) which cannot be used as an input for another path expression.

Input
topic
Output
None
Result
baseNameString, or resourceRef or resourceData of a variant

Example:

  $topic.al

2.2.2 Reify (.reify)

Input
baseName / variant / occurrence / association
Output
The topic that reifies the input
Result
The topic id of the output

Example:

  $topic.bn.reify
  $topic.bn.var.reify
  $topic.oc.reify
  $association.reify

2.3 Instantiation

There are two path expressions that refer to the instantiation hierarchy defined in the Topic Map. The following table gives an overview.

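Expression | Input | Output                                      | Result
-----------+-------+---------------------------------------------+---------------------------
.type      | topic | The topic that is the type of the input     | The topic id of the output
.instance  | topic | The topic that is the instance of the input | The topic id of the output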

Remark: you can use brackets after all of these expressions to indicate the LEVEL.

2.3.1 Type (.type)

Input
topic
Output
The topic that is the type of the input
Result
The topic id of the output

Example:

  $topic.type

Remark: you can use brackets after the .type expression to indicate the LEVEL of the type.

2.3.2 Instance (.instance)

Input
topic
Output
The topic that is the instance of the input
Result
The topic id of the output

Example:

  $topic.instance

Remark: you can use brackets after the .instance expression to indicate the LEVEL of the instance.
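
For illustration, both directions can be used in a WHERE clause. The sketch below assumes a hypothetical topic with the id 'device' that is used as a type. The first statement finds topics through their type, the second starts from the type and follows its instances. Under the definitions above, the two statements should select the same topics.

  select $topic where $topic.type = 'device';
  select $topic where id('device').instance = $topic;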

2.4 Inheritance

There are two path expressions that refer to the inheritance hierarchy defined in the Topic Map. The following table gives an overview.

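Expression | Input | Output                                         | Result
-----------+-------+------------------------------------------------+---------------------------
.super     | topic | The topic that is the super-class of the input | The topic id of the output
.sub       | topic | The topic that is the sub-class of the input   | The topic id of the output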

Remark: you can use brackets after all of these expressions to indicate the LEVEL.

2.4.1 Superclass (.super)

Input
topic
Output
The topic that is the super-class of the input
Result
The topic id of the output

Example:

  $topic.super

Remark: you can use brackets after the .super expression to indicate the LEVEL of the super-class.

2.4.2 Subclass (.sub)

Input
topic
Output
The topic that is the sub-class of the input
Result
The topic id of the output

Example:

  $topic.sub

Remark: you can use brackets after the .sub expression to indicate the LEVEL of the sub-class.

2.5 Associations

2.5.1 The Association Expression

An association path expression is written as follows:

  association_id(association_type)->role
The whole path expression is resolved to a topic playing the given role in an association of the given type. The association id, the association type and the role can all be expressions that are resolved to topics. If the association id is not needed, it can be omitted. However, in case it is omitted, two different expressions can refer to two different associations. For example:
  (host-location)->host = $h
  and (host-location)->location = $l
In the two expressions above, $h and $l can be players of different associations of type host-location. If we want to refer to two players of the same association, we have to use the same association variable in both expressions:
  $a(host-location)->host = $h
  and $a(host-location)->location = $l
Note that a much better approach is to chain the two players as described later. Note also that in the following example, $a and $b can but do not have to refer to the same association. If you want them to be different, then you have to add a condition to the statement which states that $a should not be equal to $b ($a != $b).
  $a(host-location)->host = $h
  and $b(host-location)->location = $l

The main feature of the association path expression is that it has no input, thus it starts the path expression (much like a topic literal or a variable). Although this feature by itself is sometimes very useful, it can be a major disadvantage when trying to refer to a chain of associated topics as demonstrated in the following example. In this example, we refer to all the topics that are connected to a topic which is connected to the topic finger:

  $a(connect_to)->$r1 = id('finger')
  and $a(connect_to)->$r2 = $middle
  and $b(connect_to)->$r3 = $middle
  and $b(connect_to)->$r4 = $topic
  and $a != $b
Another disadvantage is the awkward way in which one must control the expressions using the association variables in such a chain (here we have to state that $a should not be equal to $b). Those disadvantages have been solved by introducing the left arrow as described in the next chapter.

Some example queries containing associations:

Example 1:

  select $association
    where $association.id = 'part-whole';
In this example the engine does not know that the variable $association is an association. It treats it as an ordinary topic variable and looks among all topics for the one with the id 'part-whole'.

Example 2:

  select $association
    where $association(part-whole)->part = 'cpu'
    and $association.sc = 'functional';
In this example the engine knows that $association is an association variable, because the association path expression is used. This variable can then be used later in the query to narrow down the result. Here it will look among all associations with type 'part-whole' for an association that has the topic with id 'functional' as a scope and that has the topic with id 'cpu' as a member within the role 'part'.

Example 3:

  select $topic
    where $a(superclass-subclass)->superclass = $topic
    and $b(superclass-subclass)->subclass = $topic
    and $a != $b;
This example searches for a topic that is a member of two associations: it has to play the role superclass in the first association $a of type 'superclass-subclass' and the role subclass in the second association $b of the same type. The two associations also have to be different.

2.5.2 Chaining Players

When you want to describe a long chain of associations, you will find that it is difficult to do this with the ordinary path expression as described in the previous chapter, as you can see in the following example:

  select $topic
   where $a(connect_to)->connected = 'little_finger'
   and $a(connect_to)->connected = $p1
   and $p1 not in ('little_finger')
   and $b(connect_to)->connected = $p2
   and $b(connect_to)->connected = $p1
   and $p2.bn != $p1.bn
   and $c(connect_to)->connected = $p
   and $c(connect_to)->connected = $p2
   and $p.bn != $p2.bn
   and $a != $b
   and $b != $c;
Here we want to find the topics that are connected to little_finger via two other topics. You can see that it takes a lot of lines to describe this.

Therefore the association path expression from the previous chapter is extended to include also the role of the input player in the same association in order to chain players:

  .role1<-association_id(association_type)->role2
This path expression is resolved to a player that plays the role role2 in an association of type association_type, where the input player (the one to the left of the association path expression) plays the role role1 in the very same association. If more than one such association path expression is chained, the associations of two consecutive players are never the same.

Now you can rewrite the example above to be:

  select $topic
    where id('little_finger').$$<-(connect_to)->$$
                             .$$<-(connect_to)->$$
                             .$$<-(connect_to)->$$ = $topic;
Note the anonymous variables used for the roles, as we are not interested in them.
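
When the roles matter, they can be named instead of left anonymous. The following sketch reuses the host-location association type and its host and location roles from the previous section. The topic id 'server1' is hypothetical. The statement selects the location in the association in which server1 plays the host role:

  select $l
    where id('server1').host<-(host-location)->location = $l;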

2.6 Square Brackets

The output of a path expression (its right side) can be either empty, one element or a sequence. In the following example, the number of base names that are returned by $topic.bn is determined by the number of base names that are defined in the Topic Map for the topic mouse:

  select $topic.bn where $topic.id = 'mouse';

Square brackets containing a quoted string select, from the result sequence, the item with that value. They can follow any path expression. For example:

  $topic.bn['central processing unit'] # the base name 'central
                                       # processing unit' of
                                       # the topic $topic.

Another example:

  >>select $topic, 
           $topic.id['cpu'], 
           $topic.bn['Central Processing Unit'], 
           $topic.bn['Central Processing Unit'].var['CPU'] 
     where exists $topic;

   $topic | $topic.id['cpu'] | $topic.bn['Central | $topic.bn['Central Processing
          |                  | Processing Unit']  | Unit'].var['CPU']
   -------+------------------+--------------------+-------------------------------
    cpu   | cpu              | Central Processing | CPU
          |                  | Unit               | 
   (1 row)

Another way to specify the items within a sequence is to use a variable within the square brackets:

  $topic.bn[$bn] # $bn will get the values of the sequence.
                 # we can use $bn in another place to limit
                 # the sequence.

This lets us control the sequences in a better way. For example we can select only basenames of topic foo starting with 'a':

  select $topic.bn[$bn] where $topic.id = 'foo' and $bn ~ '^a';

Square brackets also provide a way to access intermediate players in a long chain of associations. For example, the following statement gets all the possible paths between the topic stomach and the topic insulin through three associations, of which the first is of type connect_to.

  >>select 'stomach', 'connect_to', $p1, $at1, $p2, $at2, 'insulin'
          where id('stomach').$$<-(connect_to)->$$[$p1]
          .$$<-($at1)->$$[$p2]
          .$$<-($at2)->$$ = 'insulin';
  
  
  'stomach'|'connect|$p1     | $at1  | $p2    | $at2  |'insulin'
           |_to'    |        |       |        |       |
  ---------+--------+--------+-------+--------+-------+---------
   stomach |connect_|duodenum|connect|pancreas|produce| insulin
           |to      |        |_to    |        |       |
  (1 row)

Note the use of the anonymous variable for the roles (because we do not need the roles in any other place).

2.7 Round Brackets

Round brackets indicate the level of instantiation or inheritance, and they also serve as typing brackets that indicate a type.

In addition, brackets can be used within a path expression in order to group expressions and to control precedence.

For example:

  ($association_class.instance(1))->($role_class.instance(1)).bn

The above returns the baseNames of a player in the association. The type of the associations is any instance of $association_class. The role of the player is any instance of $role_class.

2.7.1 Level

One can add a level to the path expressions for inheritance and instantiation. An overview is given in the following table.

Expression       | Input | Output                                                                | Result
-----------------+-------+-----------------------------------------------------------------------+---------------------------
.type(LEVEL)     | topic | The topic that is the type of the input at the chosen level(s)        | The topic id of the output
.instance(LEVEL) | topic | The topic that is the instance of the input at the chosen level(s)    | The topic id of the output
.super(LEVEL)    | topic | The topic that is the super-class of the input at the chosen level(s) | The topic id of the output
.sub(LEVEL)      | topic | The topic that is the sub-class of the input at the chosen level(s)   | The topic id of the output

For more explanation of the path expressions displayed above, see Instantiation and Inheritance.

The LEVEL parameter can be any non-negative number, an asterisk or a range. Level zero means the actual topic, level one means the type / instance / super / sub of the topic, level two indicates the type / instance / super / sub of the type / instance / super / sub of the topic and so on. An asterisk is used to refer to any level (including zero). A range, for example 1..*, is defined by a non-negative number, two dots and a greater number or an asterisk. Example:

  $topic.type(1)    # the direct type of $topic (equivalent to $topic.type)
  $topic.type(2)    # the type of the type of $topic
  $topic.type(*)    # the types at any level of $topic including $topic itself
  $topic.type(2..*) # the types at any level of $topic without the direct type and $topic itself
  $topic.type(0)    # gives $topic itself

Similar examples apply to the other expressions.

Another example:

  $topic.type(1).super(*) # any direct type of the topic, or any
                          # parent (through superclass-subclass
                          # association) of that direct type.
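
The LEVEL parameter can be used in the same way inside a complete statement. As a sketch, assuming a hypothetical topic 'device' that sits somewhere in a superclass-subclass hierarchy, the following query selects every topic whose direct type is device or any sub-class of device:

  select $topic
    where $topic.type(1).super(*) = 'device';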

2.7.2 Type

Apart from indicating the level of the inheritance and instantiation path expressions as explained above, brackets are also used to indicate a type. Brackets can follow several path expressions or precede the association arrow in the association expression. In such cases they specify the types of the elements. The following table gives an overview.

Element     | Syntax
------------+---------------------------
baseName    | .bn(type)
occurrence  | .oc(type)
association | association_id(type)->role

The type itself can be specified as an expression that is resolved to a topic, that is, a topic literal (including a naked identifier), a variable containing a topic or a path expression that is resolved to a topic.

Examples:

  bn(abbreviation)       # the basename has to be of type abbreviation
  $a(part-whole)->part   # the association $a has to be of type 'part-whole'
  $topic.oc(description) # the occurrence has to be of type 'description'

Remark: Variants have no types according to the XTM standard, but have types according to the TMDM. We have decided not to include the types of variants in this version of Toma as their use was not clear.
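
A typed accessor can also appear directly in the SELECT clause. The sketch below assumes that the topic cpu has occurrences of the type 'description' (the type used in the examples above) and lists their resource data:

  select $topic.oc(description).rd where $topic = 'cpu';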

2.8 The @ scope

The @ symbol is used for specifying the scope of a baseName, an occurrence or an association. It can also specify the parameters of a variant (which are described as scopes in XTM 1.1), and therefore the scope of an alias as well. The scope sign should be followed by an expression that is resolved to a topic and is written as shown in the following table:

Element     | Syntax
------------+---------------------------------------------
baseName    | .bn@scope
            | .bn(type)@scope
variant     | .var@scope
alias       | .al@scope
occurrence  | .oc@scope
            | .oc(type)@scope
association | association_id(association_type)@scope->role

Note that when typing brackets are present, the @ sign always comes right after them.

For example:

  $topic.bn@en           # the base name of the topic in the English scope
  $topic.oc(size)@metric # the occurrence of type 'size' in the metric scope
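
For associations, the scope follows the typing brackets and precedes the arrow, as in the last row of the table. As a sketch, the query below restricts the part-whole example from the Associations section to the scope 'functional', which that section expresses with the condition $association.sc = 'functional':

  select $a
    where $a(part-whole)@functional->part = 'cpu';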

2.9 Precedence

Toma path expressions are evaluated from left to right.
For example, in the expression $topic.oc(description)@en.rd, $topic is evaluated first. Then its occurrence of type description in the scope 'en' is evaluated. Finally the resourceData of this occurrence is evaluated.

Expressions in brackets (including those in typing brackets) have higher precedence.
For example, in the expression $topic.oc($a.type)@en.rd, the type of the occurrence is evaluated as a whole (the type of the variable $a) before the occurrence is evaluated.

3. Toma statements

This chapter presents the statements that can be used in Toma.

In the notation presenting the syntax of the different statements, any clause surrounded by square brackets is optional. In addition, curly brackets are used to group possible options and a vertical bar symbol (“|”) is used as disjunction between those options.

3.1 The USE statement

The USE statement lets the user declare which Topic Map is to be queried or manipulated. The USE statement syntax is defined as follows:

 USE topic_map_path;
In this definition topic_map_path is a quoted string or a URI that indicates the path to the location where the Topic Map is kept.

For example:

  use './db/computers.db';
  use file:///usr/local/topiengine/db/computers.db;

The declaration made by a USE statement remains in effect until another USE statement is issued.

3.2 The SELECT statement

The SELECT statement is used in order to define queries over Topic Maps. The SELECT statement syntax is as follows:
  SELECT [ ALL | DISTINCT ] navigation_list
  [ WHERE formula ]
  [ { UNION | INTERSECT | EXCEPT } [ ALL ] other_select ]
  [ ORDER BY expr1 [ ASC | DESC ] [, expr2 [ ASC | DESC ] ...] ]
  [ LIMIT integer ]
  [ OFFSET integer ];
In general, variables can be introduced in the WHERE clause but also in the SELECT clause. In the next chapters we describe each of these clauses of the SELECT statement.
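
As a sketch of how the optional clauses combine, the following statement lists distinct base names in ascending order and returns the third page of ten rows. The type id 'device' is hypothetical, and the paging semantics of LIMIT and OFFSET are assumed to follow SQL, since they are not detailed here.

  select distinct $topic.bn
    where $topic.type = 'device'
    order by $topic.bn asc
    limit 10
    offset 20;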

3.2.1 The SELECT clause

The SELECT clause of a query is used to define projections over the values of the variables found in the WHERE clause. It defines what the rows in the result look like, in a similar manner to the SELECT clause in SQL.

The navigation_list controls which values will be presented in the result set. It is possible to introduce new variables in the navigation list and to use any path expression.

The DISTINCT keyword ensures that no duplicate rows are returned.

The ALL keyword is the opposite of DISTINCT and is the default behavior.

The result is arranged as rows in a similar manner to the results that are returned from relational databases. Any returned variable will be represented by the id of its value as shown in the following example.

  select $topic, $topic.id where $topic.al = 'CPU';
  
  $topic     | $topic.id
  -----------+----------
  processor  | processor
  (1 row)

Note that the SELECT clause returns elements of the values that are already chosen in the WHERE clause.

For example:

  select $topic.bn where $topic.bn = 'lung';

   $topic.bn
  ----------
   lung
   long
  (2 rows)

In this example, we select all the topics that have the base name lung. Then we ask to show the base names of the topics we found. We find one topic that has the two base names lung and long. So the $topic.bn in the SELECT clause means that we want to see all the base names of the topic objects that can be $topic according to the WHERE clause.

This behavior is a useful feature. For example, if we need to “translate” base names between scopes we can write:

  select $topic.bn@dutch where $topic.bn@english = 'lung';

  $topic.bn@dutch
  -----------------
  long
  (1 row)

The ability to include new variables in the selection part makes it possible to generate a column whose value depends entirely on the value of another column. For example, it is possible to retrieve the scope of the base name that is shown in the result set as follows:

  select $topic.bn@$scope, $scope.id where $topic.bn = 'lung';

  $topic.bn  | $scope.id
  -----------+----------
  lung       | english
  long       | dutch
  (2 rows)

In this example, $topic is set in the WHERE clause, and then its base names are listed in the SELECT clause. However, the $scope variable is introduced in the selection clause as any scope of the listed base names of $topic. Therefore, $scope gets a value for each row in the result set.

3.2.2 The WHERE clause

The WHERE clause may contain the following sub-clauses: EXISTS, comparison, NOT, AND, OR and IN.

These sub-clauses are explained in the next chapters.

3.2.2.1 The EXISTS sub-clause

The syntax for the EXISTS sub-clause is as follows:

  EXISTS path_expression
where path_expression is a Toma path expression.

If the result of evaluating the path expression is not empty, the sub-clause is evaluated to true.

Example:

    select $topic
      where exists $topic.si;

3.2.2.2 The comparison sub-clauses

A comparison sub-clause consists of two expressions and a comparison operator between them. The syntax is described as follows:

  expression1 { = | != | ~ | ~* | !~ | !~* } expression2
In this definition an expression might be any path expression, a variable or a quoted string.

When using the regular expression comparison operators, both expressions must be evaluated to a string.

A comparison operator can be one of the following:

Operator    | Meaning
------------+---------------------------------------------------------------------------------
=           | Equality operator. The two expressions around the equal sign should be equal to each other.
!=          | Inequality operator. Negation is used in Toma as filtering (unlike in SQL). Therefore, the inequality operator is used to filter out any equality between the two expressions.
~           | Case sensitive regular expression match operator. The regular expressions that can be used are Perl-like regular expressions. See Perl Compatible Regular Expressions (PCRE)[2] for details.
~*          | Case insensitive regular expression match operator.
!~          | Negation of ~.
!~*         | Negation of ~*.
IS NULL     | Has the same meaning as NOT EXISTS.
IS NOT NULL | Has the same meaning as EXISTS.
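
A short sketch combining two of the operators: the statement selects the topics that have a base name matching 'processor' in any letter case, and filters out the topic with the id 'cpu'. The pattern and the ids are illustrative only.

  select $topic
    where $topic.bn ~* 'processor'
    and $topic.id != 'cpu';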

3.2.2.3 The NOT sub-clause

The NOT sub-clause consists of one search condition. Its syntax is described as follows:

  NOT formula
In the notation above formula can be any of the WHERE sub-clauses.

In Toma negation is used for filtering. Thus the negation adds constraints on the value of the variables in the WHERE clause.
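
For example, NOT can be combined with EXISTS to filter out topics that have a given element. The sketch below assumes a hypothetical type 'device' and reuses the occurrence type mass that appears elsewhere in this document. It selects the devices for which no mass occurrence is defined:

  select $topic
    where $topic.type = 'device'
    and not exists $topic.oc(mass);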

3.2.2.4 The AND sub-clause

The AND sub-clause consists of two search conditions. Its syntax is described as follows:

  formula1 AND formula2
In order for the AND sub-clause to be evaluated to true, both formulas in the above notation should be evaluated to true - that is, the variables in those formulas should get values so that both formulas are true. If there are no such values, the AND sub-clause is evaluated to false.

3.2.2.5 The OR sub-clause

The OR sub-clause consists of two search conditions. Its syntax is described as follows:

  formula1 OR formula2
In order for the OR sub-clause to be evaluated to true, at least one of the formulas in the above notation should be evaluated to be true - that is the variables in the formulas will get all the possible values so that at least one of the formulas is true. If there are no such values, the OR sub-clause is evaluated to false.
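
As an illustration, the following sketch selects the topics that have either of two base names. The base name 'heart' is hypothetical.

  select $topic
    where $topic.bn = 'lung'
    or $topic.bn = 'heart';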

3.2.2.6 The IN sub-clause

The IN sub-clause consists of an expression, the IN keyword and a list of comma separated expressions within brackets. Its syntax is described as follows:

  expression IN ( expression1, expression2 ... )

The IN sub-clause is evaluated as: ( expression = expression1 OR expression = expression2 OR ... )

Instead of the list of expressions you can also insert a sub select statement:

  expression IN (sub-select-statement)
The selection part of the sub select must consist of exactly one expression. Also, it is forbidden to use variables from the main SELECT statement in the sub select statement.

Example:

  # all the base names of the topics of type
  # 'mechanical device' which have an occurrence of type
  # 'mass' which is equivalent to one of the occurrences
  # of the same type of topics of type 'pc card'
  select $topic1.bn
    where $topic1.type.bn = 'mechanical device'
    and $topic1.oc(mass) in (select $topic2.oc(mass)
                               where $topic2.type.bn = 'pc card');
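
The expansion of IN into a chain of equality tests can be sketched in Python as follows. This is only an illustration of the evaluation rule given above; the function toma_in and the sample values are hypothetical.

```python
def toma_in(value, candidates):
    """Evaluate 'value IN (c1, c2, ...)' as value = c1 OR value = c2 OR ..."""
    return any(value == candidate for candidate in candidates)


# Hypothetical values standing in for the result of the sub select.
pc_card_masses = ["30 g", "45 g"]

print(toma_in("45 g", pc_card_masses))
print(toma_in("2 kg", pc_card_masses))
```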

3.2.3 The UNION clause

The syntax of the UNION clause is as follows:

 select1 UNION [ ALL ] select2

UNION combines the result set of the first SELECT statement (select1) with the result set of the second SELECT statement (select2). It also eliminates all duplicates (so it runs DISTINCT on the result) unless the ALL keyword is used.

In order for UNION to work, both selects must have similar selection clauses: they have to have the same number of expressions, and each expression in one select has to resolve to the same type (topic, association, base name etc.) as the expression in the same position in the other select.

3.2.4 The INTERSECT clause

The syntax of the INTERSECT sub-clause is as follows:

  select1 INTERSECT select2

INTERSECT returns all the results that are in both result sets.

Both selection clauses must be similar: they have to have the same number of expressions, and each expression in one select has to resolve to the same type (topic, association, base name etc.) as the expression in the same position in the other select.

3.2.5 The EXCEPT clause

The syntax of the EXCEPT sub-clause is as follows:

  select1 EXCEPT select2

EXCEPT returns all the results that are in the first SELECT statement (select1) but not in the second SELECT statement (select2).

Both selection clauses must be similar: they have to have the same number of expressions, and each expression in one select has to resolve to the same type (topic, association, base name etc.) as the expression in the same position in the other select.
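
The behaviour of UNION, INTERSECT and EXCEPT can be sketched in Python by treating each result set as a list of row tuples. The function names are hypothetical. The text above only states that UNION removes duplicates; that INTERSECT and EXCEPT also do so, and the order of the returned rows, are assumptions of this sketch.

```python
def union(rows1, rows2, keep_all=False):
    """select1 UNION [ALL] select2."""
    combined = list(rows1) + list(rows2)
    if keep_all:
        return combined
    # Remove duplicates while keeping the first-seen order.
    return list(dict.fromkeys(combined))


def intersect(rows1, rows2):
    """Rows that appear in both result sets."""
    second = set(rows2)
    return list(dict.fromkeys(row for row in rows1 if row in second))


def except_(rows1, rows2):
    """Rows of the first result set that are not in the second."""
    second = set(rows2)
    return list(dict.fromkeys(row for row in rows1 if row not in second))


first = [("cpu",), ("ram",)]
second = [("ram",), ("disk",)]
print(union(first, second))
print(union(first, second, keep_all=True))
print(intersect(first, second))
print(except_(first, second))
```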

3.2.6 The ORDER BY clause

The ORDER BY clause controls the way the result is ordered. The syntax of the ORDER BY sub-clause is as follows:

  ORDER BY column_number [ ASC | DESC | NASC | NDESC ] 
           [, column_number [ ASC | DESC | NASC | NDESC ] ...]
The list of the column numbers which follows the ORDER BY keywords defines ordering constraints over the variables used in the SELECT clause. The first column can be referred to as number one, the second, as number two etc.

Each column number can be followed by one of the keywords ASC (ascending), DESC (descending), NASC (numerical ascending) or NDESC (numerical descending). ASC is the default. ASC and DESC order the values alphabetically (so 10 comes before 2). NASC and NDESC order the values numerically; in that case, any value that is not a number is resolved to 0.
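
The difference between alphabetical and numerical ordering can be sketched in Python as follows. The functions order_by and to_number are hypothetical, the sketch handles a single ordering column only, and it assumes that all values are returned as strings.

```python
def to_number(value):
    """Resolve a value to a number; anything that is not a number becomes 0."""
    try:
        return float(value)
    except ValueError:
        return 0.0


def order_by(rows, column_number, direction="ASC"):
    """Order rows by one column, following ASC, DESC, NASC or NDESC."""
    index = column_number - 1  # the first column is number one
    numeric = direction in ("NASC", "NDESC")
    descending = direction in ("DESC", "NDESC")
    if numeric:
        key = lambda row: to_number(row[index])
    else:
        key = lambda row: row[index]
    return sorted(rows, key=key, reverse=descending)


rows = [("2",), ("abc",), ("10",)]
print(order_by(rows, 1))           # alphabetical: '10' comes before '2'
print(order_by(rows, 1, "NASC"))   # numerical: 'abc' counts as 0
```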

3.2.7 The LIMIT clause

The LIMIT clause provides a way to retrieve only a portion of the result. The syntax of the LIMIT sub-clause is as follows:
  LIMIT integer
In the above notation integer specifies the maximum number of rows to be retrieved.

3.2.8 The OFFSET clause

The OFFSET clause provides a way to retrieve only a portion of the result. The syntax of the OFFSET sub-clause is as follows:

  OFFSET integer
In the above notation integer controls the row to start from.
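
How LIMIT and OFFSET select a portion of the result can be sketched in Python with list slicing. The function apply_limit_offset is hypothetical. The text above does not say how rows are counted, so the sketch assumes that OFFSET n skips the first n rows, as in SQL.

```python
def apply_limit_offset(rows, limit=None, offset=0):
    """Return at most 'limit' rows, after skipping the first 'offset' rows."""
    # Assumption: OFFSET counts the rows to skip (as in SQL).
    end = None if limit is None else offset + limit
    return rows[offset:end]


rows = list(range(1, 11))  # ten result rows, numbered 1 to 10
print(apply_limit_offset(rows, limit=3, offset=2))
```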

3.3 The INSERT statement

The INSERT statement is used in order to insert topics, topic elements and associations into the Topic Map. The INSERT statement syntax is as follows:

  INSERT value INTO simple_path_expression
     [ , value INTO simple_path_expression2 [ ... ] ];

The simple path expression is indeed simple because of the following reasons:

The simple path expression should be resolved to a not yet existing value (if the value already exists, the INSERT fails), and value is inserted as that value. The simple path expression can refer to other values that do not exist yet (such as not yet existing scopes or types). Those will spring into existence.

Pairs of value and simple path expression can be written in one INSERT separated by commas. Those pairs are inserted one after the other. For example, the INSERT:

  insert 'http://www.the.site.of.cpu.com/'
            into id('cpu').si.rr,
         'CPU'
            into id('cpu').(bn@long)['central processing unit'].var@short,
         'The processor processes all the instructions.'
            into $topic.oc(description)@textual;
creates the topic:
<topic id="cpu">
  <instanceOf>
    <topicRef xlink:href="#processing-part"/>
  </instanceOf>
  <subjectIdentity>
    <resourceRef xlink:href="http://www.the.site.of.cpu.com/"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>central processing unit</baseNameString>
    <scope>
      <topicRef xlink:href="#long"/>
    </scope>
    <variant>
      <parameters>
        <topicRef xlink:href="#short"/>
      </parameters>
      <resourceData>CPU</resourceData>
    </variant>
  </baseName>
  <occurrence>
    <resourceData>
      The processor processes all the instructions.
    </resourceData>
    <instanceOf>
      <topicRef xlink:href="#description"/>
    </instanceOf>
    <scope>
      <topicRef xlink:href="#textual"/>
    </scope>
  </occurrence>
</topic>
In the example above, some elements are inserted although they are not explicitly mentioned, for example the topic cpu itself, its long base name and the scopes long and short. Those elements implicitly sprang into existence so that the values explicitly listed in the statement could be inserted.

If the INSERT statement contains association path expressions, all of them are assumed to belong to the very same association. This makes it possible to insert a new association by listing its players. Thus, the INSERT:

  insert 'adapter'
            into (provider-provided-receiver)->provider,
         'laptop'
            into (provider-provided-receiver)->receiver,
         'electricity220'
            into (provider-provided-receiver)->provided;
creates the association:
  <association id="_a1">
    <instanceOf>
      <topicRef xlink:href="#provider-provided-receiver"/>
    </instanceOf>
    <member id="_mem1">
      <roleSpec>
        <topicRef xlink:href="#provider"/>
      </roleSpec>
      <topicRef xlink:href="#adapter"/>
    </member>
    <member id="_mem2">
      <roleSpec>
        <topicRef xlink:href="#receiver"/>
      </roleSpec>
      <topicRef xlink:href="#laptop"/>
    </member>
    <member id="_mem3">
      <roleSpec>
        <topicRef xlink:href="#provided"/>
      </roleSpec>
      <topicRef xlink:href="#electricity220"/>
    </member>
  </association>

3.4 The UPDATE statement

The UPDATE statement is used in order to update the values of topic elements or associations in the Topic Map. The UPDATE statement syntax is as follows:

  UPDATE expression1 = string [ WHERE formula ];
where expression1 is a path expression and string is a quoted string.

In the UPDATE statement only one variable is allowed. For example, in the UPDATE statement below, we change the base name 'processor' to the base name 'Processor':

  update id('processor').bn['processor'] = 'Processor';

3.5 The DELETE statement

The DELETE statement is used to delete topics, topic elements or associations. Its syntax is as follows:

  DELETE expression [ WHERE formula ];

In the DELETE statement only one variable is allowed. In the following example, we delete all the occurrences of type mass in the scope textual of all the topics of type device:

  delete $topic.oc(mass).sc['textual']
    where $topic.type.id = 'device';

3.6 The MERGE statement

The MERGE statement provides the ability to merge Topic Maps:

  MERGE [XTM] WITH uri
    [ MARK topic_literal1 [ , topic_literal2 ... ]];
  
  or:
  
  MERGE XTM <<content_separator
  content
  content_separator
    [ MARK topic_literal1 [ , topic_literal2 ... ]];

The currently used Topic Map (defined by the last USE statement) will be merged with the other Topic Map. The uri is the URI of a file which contains the Topic Map. This file can be another Topic Map storage, or any supported definition of a Topic Map. If that file is not another Topic Map storage, the XTM keyword should be given.

If no uri is given, then a content block must be provided. When a content block is provided, note that a new line must follow the content separator. Note also that a content block cannot represent a Topic Map storage, and therefore the XTM keyword must be present.

If the MARK clause is used, the topics resolved from the topic literals will be added as scopes to all the characteristics of the merged Topic Map.

Example 1:

  use 't/db/columbus-epds.db';
  merge with file:://.db/columbus-msm.db
    mark columbus-msm;

Example 2:

  use columbus-epds;
  merge XTM <<EOF
    <topicMap id="only-mlu">
      <topic id="mlu">
        <instanceOf>
          <topicRef xlink:href="#device"/>
        </instanceOf>
        <baseName>
          <baseNameString>module lighting unit</baseNameString>
        </baseName>
      </topic>
    </topicMap>
  EOF;

3.7 The EXPORT statement

The EXPORT statement exports the Topic Map as a Topic Map representation (such as XTM).

  EXPORT [ TO file_path ] [ AS XTM ];
In the above notation file_path can be a quoted path or a URI. It refers to the location of the file to which the Topic Map is exported. If no file path is given, the EXPORT is written to the standard output.

The AS clause is optional and makes it possible to select an alternative output format. XTM is the default format.

3.8 Locking statements

The locking statements are provided to be able to merge a Topic Map but prevent changes to certain parts of the Topic Map.

There are three locking statements as described in the following chapters.

3.8.1 The LOCK BY statement

The LOCK statement provides the ability to lock topics and associations of certain scopes. This feature is provided to be able to merge a Topic Map but prevent changes to the merged parts that are marked by a given scope. Its syntax is as follows:

  LOCK BY topic_literal;

More than one LOCK statement can be issued. Each locking scope will contribute to the set of locked topics and associations.

3.8.2 The UNLOCK statement

The UNLOCK statement provides the ability to unlock topics and associations of certain scopes. Its syntax is as follows:

  UNLOCK [ BY topic_literal ];
If no BY clause is provided, all the locks are removed.

3.8.3 The SHOW LOCKS statement

The SHOW LOCKS statement returns the scopes by which topics and associations are locked. Its syntax is as follows:

  SHOW LOCKS;
Each scope is returned in a row. The first column of the row is the id of the scope and the second column is its base name.

3.9 Constraint statements

There are six constraint statements in Toma, used to define, manage and check constraints, as described in the following chapters.

3.9.1 The DEFINE CONSTRAINT statement

The syntax of the DEFINE CONSTRAINT statement is as follows:

  DEFINE CONSTRAINT identifier
    EACH TOPIC | ASSOCIATION variable
      [ WHERE formula1 ]
      SATISFIES formula2;

Each constraint has a unique name - the identifier. A constraint cannot be redefined. In order to change a constraint, it has to be deleted first.

The two formula blocks (formula1 and formula2) are similar to the formula explained in the SELECT statement.

If the WHERE clause is omitted, the value of variable is all the topics or associations (according to the variable definition - TOPIC or ASSOCIATION in the beginning of the statement).

The constraint is broken when formula2 is false.

Example 1:

  # each topic must have a base name
  define constraint basename_constraint
    each topic $topic
    satisfies exists $topic.bn;

Example 2:

  # each topic of type 'device' must have occurrences of
  # type 'mass' and 'description' and it must play the
  # role 'host' in a 'host-location' association.
  define constraint device_constraint
    each topic $topic
      where $topic.type.id = 'device'
    satisfies exists $topic.oc(mass) and
              exists $topic.oc(description) and
              (host-location)->host = $topic;

3.9.2 Managing constraints statements

The DROP CONSTRAINT statement removes a constraint.

  DROP CONSTRAINT identifier;

Constraints can be disabled and re-enabled by using the statements

  DISABLE CONSTRAINT identifier;
and
  ENABLE CONSTRAINT identifier;

The SHOW CONSTRAINT statement is provided in order to list constraints:

  SHOW CONSTRAINT [ identifier ];
If no identifier is defined, all the constraints of the Topic Map are returned. Each constraint is returned in a row with three columns: the identifier of the constraint, the Toma definition of that constraint and “enabled” or “disabled” value. If identifier is defined, only that constraint is shown.

3.9.3 The CHECK CONSTRAINT statement

An enabled constraint can be checked by running the following statement:

  CHECK CONSTRAINT [ identifier ];
If identifier is not provided, all the constraints are checked.

For each broken constraint, the statement returns one or more rows, each containing three columns: the identifier of the broken constraint, 't' or 'a' to indicate that the constraint is broken by a topic or by an association, and the id of the topic or association that breaks the constraint.

4. Toma Functions

4.1 String Functions

The operator and functions in this section can be used only on strings.

4.1.1 The double pipe operator

The double pipe operator || concatenates two strings.

Example:

  select $topic
    where $topic.bn = 'cp' || 'u';

4.1.2 LOWERCASE( string )

This function converts all characters in the string to lower case characters.

Example:

  select $topic.oc(description), lowercase($topic.oc(description))
    where $topic.id = 'cpu';

   $topic.oc(description) | lowercase($topic.oc(description))
  ------------------------+----------------------------------
   The CPU is the brains  | the cpu is the brains
   of the computer.       | of the computer.
  (1 rows)

4.1.3 UPPERCASE( string )

This function converts all characters in the string to upper case characters.

Example:

  select $topic.oc(description), uppercase($topic.oc(description))
    where $topic.id = 'cpu';

   $topic.oc(description) | uppercase($topic.oc(description))
  ------------------------+----------------------------------
   The CPU is the brains  | THE CPU IS THE BRAINS
   of the computer.       | OF THE COMPUTER.
  (1 rows)

4.1.4 TITLECASE( string )

This function converts all characters to lower case except for the initial characters which are converted to upper case characters.

Example:

  select $topic.oc(description), titlecase($topic.oc(description))
    where $topic.id = 'cpu';

   $topic.oc(description) | titlecase($topic.oc(description))
  ------------------------+----------------------------------
   The CPU is the brains  | The cpu is the brains
   of the computer.       | of the computer.
  (1 rows)

4.1.5 LENGTH( string )

This function returns the length of a string.

Example:

  select $topic.oc(description), length($topic.oc(description))
    where $topic.id = 'cpu';

   $topic.oc(description) | length($topic.oc(description))
  ------------------------+----------------------------------
   The CPU is the brains  | 38
   of the computer.       | 
  (1 rows)

4.1.6 SUBSTR( string, from, [ length ] )

Provides the ability to retrieve a specific part of the string. It returns a sub-string of string starting from the from character (the first character is at index 1). If length is provided, the returned string will be of that length.

Example:

  select $topic.oc(description), substr($topic.oc(description),7,11)
    where $topic.id = 'cpu';

   $topic.oc(description) | substr($topic.oc(description),7,11)
  ------------------------+-------------------------------------
   The CPU is the brains  | U is the br
   of the computer.       | 
  (1 rows)

4.1.7 TRIM( string, [ LEADING | TRAILING | BOTH ], [ characters ] )

This function trims the string: it removes any of the given characters from the start and/or the end of string. If none of LEADING, TRAILING or BOTH is provided, BOTH is taken as the default. If characters is not provided, only spaces are trimmed. If characters is provided, space is automatically included in the set of trimmed characters. The characters are matched case sensitively and can also include symbols (like a dot).

Example 1:

  select $topic.oc(description), trim($topic.oc(description), 'hrTc.e')
    where $topic.id = 'cpu';

   $topic.oc(description) | trim($topic.oc(description), 'hrTc.e')
  ------------------------+---------------------------------------
   The CPU is the brains  | CPU is the brains
   of the computer.       | of the comput
  (1 rows)
Notice that the t in computer is not trimmed: it is lower case, so it does not match the upper case T in the list of characters.

Example 2:

  select $topic.oc(description), trim($topic.oc(description),leading,'hrTc.e')
    where $topic.id = 'cpu';

   $topic.oc(description) | trim($topic.oc(description), leading, 'hrTc.e')
  ------------------------+------------------------------------------------
   The CPU is the brains  | CPU is the brains
   of the computer.       | of the computer.
  (1 rows)

Example 3:

  select $topic.oc(description), trim($topic.oc(description), trailing, 'hrTc.e')
    where $topic.id = 'cpu';

   $topic.oc(description) | trim($topic.oc(description), trailing, 'hrTc.e')
  ------------------------+-------------------------------------------------
   The CPU is the brains  | The CPU is the brains
   of the computer.       | of the comput
  (1 rows)
Notice that here too the t in computer is not trimmed: it is lower case, so it does not match the upper case T in the list of characters.
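
The trimming rules described above (a set of characters, space always included, case sensitive matching) can be sketched in Python with the standard strip methods. The function trim below is hypothetical and only mirrors the argument order of the Toma function.

```python
def trim(s, where="BOTH", characters=""):
    """Remove the given characters (and spaces) from the ends of s."""
    chars = characters + " "  # space is always part of the set
    where = where.upper()
    if where == "LEADING":
        return s.lstrip(chars)
    if where == "TRAILING":
        return s.rstrip(chars)
    return s.strip(chars)


sample = "  xxhello worldXx "
print(repr(trim(sample, "BOTH", "x")))      # the upper case X is kept
print(repr(trim(sample, "LEADING", "x")))
print(repr(trim(sample, "TRAILING", "x")))
```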

4.2 Conversion Functions

4.2.1 TO_NUM( text )

This function converts text to a number if possible. If not possible it will be converted to NULL.

Example:

  select $topic.oc(mass), to_num($topic.oc(mass))
    where $topic.id = 'computer';

   $topic.oc(mass) | to_num($topic.oc(mass))
  -----------------+---------------------------------------
   3.4 kg          | 3.4
  (1 rows)
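
A conversion of this kind can be sketched in Python as follows. The exact parsing rules of TO_NUM are not given here; the sketch assumes, based on the '3.4 kg' example, that a number at the beginning of the text is extracted and that anything else yields NULL (None in Python). The function name and the regular expression are illustrative only.

```python
import re

# Optional sign, then digits with an optional fractional part.
_NUMBER = re.compile(r"\s*([-+]?(?:\d+\.?\d*|\.\d+))")


def to_num(text):
    """Return the leading number of text as a float, or None (NULL)."""
    match = _NUMBER.match(text)
    return float(match.group(1)) if match else None


print(to_num("3.4 kg"))
print(to_num("heavy"))
```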

4.2.2 TO_UNIT( text, target_unit )

This function converts between units. The function assumes that the text contains a number and a unit indicator. It is implemented using the Units Conversion Library by Mayo Foundation. A list of all possible units can be found here.

Example:

  select $topic.oc(mass), to_unit($topic.oc(mass), 'pound')
    where $topic.id = 'computer';

   $topic.oc(mass) | to_unit($topic.oc(mass), 'pound')
  -----------------+---------------------------------------
   3.4 kg          | 7.49571641852906
  (1 rows)

4.3 Aggregation Functions

Aggregation functions can be used only in the selection clause of a SELECT statement. If an aggregation function is present in a selection clause, all the expressions of that selection clause must be aggregation functions (this is due to the fact that currently there is no grouping in Toma; if grouping turns out to be needed, it will be added to Toma in the future).

4.3.1 COUNT( expression )

This function counts the number of values in the result denoted by the expression.

Example:

  select count($topic.oc(description))
    where $topic = 'cpu';

4.3.2 SUM( expression )

This function sums the TO_NUM conversions of the result set represented by the expression. If one of the values is converted to NULL, it is evaluated as zero by the SUM function.

Example:

  select sum(to_num($topic.oc(mass)))
    where $topic.type = 'device';
In this example the result will be the sum of the masses (occurrences of type mass) of all the topics of type device.

4.3.3 MAX( expression )

This function returns the maximum value among the TO_NUM conversions of the result set represented by the expression. NULL is evaluated as zero by the MAX function.

Example:

  select max(to_num($topic.oc(mass)))
    where $topic.type = 'device';

Here the result will be the maximum of all the masses of the occurrences of topics with type device.

4.3.4 MIN( expression )

This function returns the minimum value among the TO_NUM conversions of the result set represented by the expression. NULL is evaluated as zero by the MIN function.

Example:

  select min(to_num($topic.oc(mass)))
    where $topic.type = 'device';

Here the result will be the minimum of all the masses of the occurrences of topics with type device.

4.3.5 AVG ( expression )

This function calculates the average of the TO_NUM conversions of the result set represented by the expression. NULL is evaluated as zero by the AVG function.

Example:

  select avg(to_num($topic.oc(mass)))
    where $topic.type = 'device';

Here the result will be the average of all the masses of the occurrences of topics with type device.

4.3.6 CONCAT ( expression, [ string ] )

This function concatenates the values of the result specified by the expression. If string is defined, it is placed as a separator between the values.

Example:

  select concat($topic, ', ')
    where $topic.type = 'device';
Here the result will be a comma-separated list of all the topics with type device.
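
The treatment of NULL by the numerical aggregation functions, and the separator of CONCAT, can be sketched in Python as follows. The function names are hypothetical, NULL is represented by None, and the result set is assumed to be non-empty. That AVG counts NULL values in the divisor is an assumption derived from the statement that NULL is evaluated as zero.

```python
def _zero_if_null(values):
    """Replace NULL (None) by zero, as SUM, MAX, MIN and AVG do."""
    return [0 if value is None else value for value in values]


def toma_sum(values):
    return sum(_zero_if_null(values))


def toma_max(values):
    return max(_zero_if_null(values))


def toma_min(values):
    return min(_zero_if_null(values))


def toma_avg(values):
    numbers = _zero_if_null(values)
    return sum(numbers) / len(numbers)


def toma_concat(values, separator=""):
    return separator.join(values)


masses = [2.0, None, 4.0]  # None stands for a failed TO_NUM conversion
print(toma_sum(masses), toma_max(masses), toma_min(masses), toma_avg(masses))
print(toma_concat(["cpu", "ram"], ", "))
```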