Expectations of a type system

This forum is for any questions about the language Tutorial D or the Rel implementation of it.
Post Reply
Chris Walton
Posts: 76
Joined: Sat Aug 18, 2012 2:13 pm

Expectations of a type system

Post by Chris Walton »

The following is the note I put together when compiling Ex10 in the example forum. As I am having some difficulty in getting to grips with the type model in TTM/Tutorial D/Rel, I have listed some of the behaviour I expect from a type system. Some of this behaviour is present in Rel and some not.

This is not a full exposition of the nature of a type system, but rather describes some behaviour I expect of any type system I use.

Practical type system are often two tier systems consisting of base data types (BDTs) and user defined types (UDTs), defined in terms of the BDTs. UDTs form subtypes of BDTs, the subtypes being defined by constraint. BDT)s are defined as part of the type system. A type system should allow for expansion of types within a multi-tier system, to enable complex structures of UDTs to be defined in terms of other UDTs. It should also allow for the expansion of the system to allow multiple inheritance of data types. So far there is no need for multiple inheritance support.

A type system should provide facilities to define, implement and provide new data types and operators on these new data types for use by wider systems. These wider systems are defined (in part) by strict typing provided by the type system. While the type system should attempt to be complete, consistent, and, if possible, compatible with the principles set out in The Third Manifesto (TTM), there is no need for these aspects to be proven.

Every data item is associated with a data type, which can be a BDT or a UDT. A type defines the range of values permitted. A type is derived from a BDT, or another UDT, by constraint. The operators permitted on the type are defined separately from the data type definition, by means of a calling signature and a return value.. A data item of a particular type must not be used in a context where a different type is expected. To do so is considered to be a compile time error. The rule, "a data item of a particular type must not be used in a context where a different type is expected", is relaxed in that a base type can always be used in a context where a constrained version of that type is expected.

A type system must allow for different types, derived from different base types, to have the same name. It must also allow for there to be multiple name spaces, so that the same name can be defined in more than one name space (eg definition of identifier and sys.identifier).

Types and operators defined within the type system have the following behaviour:
If type A is defined as a supertype of a hierarchy of data types (it does not matter whether A is a UDT or a BDT).
B and C are subtypes of A
D is a subtype of B

In addition the following operators are defined (no return type is given, as it is irrelevant):
P ( A )
Q ( B )
R ( B )
R ( D )
S ( C )
T ( D )

then all the following are legitimate operator invocations in the language (and should be generated by the type system or language?):
P ( A ), P ( B ), P ( C ), P ( D )
Q ( B ), Q ( D )
R ( B ), R ( D ) - but gives different results to R ( B )
S ( C )
T ( D )

but the following are not legitimate and should generate an error in the language:
T ( C ), T ( B ), T ( A )
S ( B ), S ( A )
R ( C ), R ( A )
Q ( A )

The following operators in REL etc.; THE_, selector, TREAT_AS_, and IS_A are each defined at the appropriate level of the type hierarchy to give the following results:
legitimate:
THE_A ( A ), THE_A ( B ), THE_A ( C ), THE_A ( D ), THE_B ( B ), THE_B ( D ), THE_C ( C ), THE_D ( D )
A ( A ), A ( B ), A ( C ), A ( D ), B ( B ), B ( D ), C ( C ), D ( D )
TREAT_AS_A ( A ), TREAT_AS_A ( B ),, TREAT_AS_A ( C ),, TREAT_AS_A ( D ), TREAT_AS_B ( B ), TREAT_AS_B ( D ), TREAT_AS_C ( C ), TREAT_AS_D ( D )
and
IS_A ( A ) returns TRUE, IS_A ( B ) returns TRUE, IS_A ( C ) returns TRUE, IS_A ( D ) returns TRUE,
IS_B ( A ) returns FALSE, IS_B ( B ) returns TRUE, IS_B ( C ) returns FALSE, IS_B ( D ) returns TRUE,
IS_C ( A ) returns FALSE, IS_C ( B ) returns FALSE, IS_C ( C ) returns TRUE, IS_C ( D ) returns FALSE,
IS_D ( A ) returns FALSE, IS_D ( B ) returns FALSE, IS_D ( C ) returns FALSE, IS_D ( D ) returns TRUE,

illegitimate:
THE_B ( A ), THE_B ( C ), THE_C ( A ), THE_C ( B ), THE_C ( D ), THE_D ( A ), THE_D ( B ), THE_D ( C )
B ( A ), B ( C ), C ( A ), C ( B ), C ( D ), D ( A ), D ( B ), D ( C ),
TREAT_AS_B ( A ), TREAT_AS_B ( C ), TREAT_AS_C ( A ), TREAT_AS_C ( B ), TREAT_AS_C ( D ), TREAT_AS_D ( A ), TREAT_AS_D ( B ), TREAT_AS_D ( C )

The type system must not unduly constrain the definition of data types. In particular there must be a mechanism that specifies that a default value is required (and provided?) for a data type. It must also be possible for the type system to define types that require a particular collate sequence (and provide the means to define that sequence); and to define types that specify case sensitivity of their values.
Dave
Site Admin
Posts: 372
Joined: Sun Nov 27, 2005 7:19 pm

Re: Expectations of a type system

Post by Dave »

You wrote, "A type is derived from a BDT, or another UDT, by constraint." Indeed, and types can also be defined from existing types using structural inheritance or composition.

I'm not sure the distinction between BDTs and UDTs is particularly significant, as the former differs only from the latter by being present in the system on start-up and having an internal structure (or "POSSREP", in TTM parlance) that is not visible. Neither of these impact materially on how types are used, so I'm inclined to just use "type" for both and "built-in type" to distinguish BDTs in the rare cases where it is necessary to do so.

Questions about the rationale behind the Rel type system are best asked of Hugh Darwen and/or Chris Date, because Rel simply implements Tutorial D and endeavours to do so faithfully based on the descriptions in Date and Darwen's books, "Databases, Types, and the Relational Model (The Third Manifesto 3rd ed.)" and "Database Explorations: Essays on The Third Manifesto and Related Topics." The current version of Rel is compliant with "Databases, Types, and the Relational Model", but includes some "Database Explorations ..." extensions. The best way to communicate with Hugh Darwen is via the TTM mailing list, which you can join via http://thethirdmanifesto.com Hugh participates here on matters related to Rel in particular, but the TTM mailing list is an excellent place to discuss TTM and Tutorial D in general.

Some aspects of Tutorial D are left up to the implementer (such as what operators are automatically created, or not, by a TYPE definition) so there may be areas where Rel has violated either the spirit or the letter of TTM. I'll endeavour to respond to those where I can, and I'm certainly happy to answer specific implementation questions.

Default values for types are specified in "Database Explorations: Essays on The Third Manifesto and Related Topics" (as I recall -- I was hoping to insert an appropriate page citation but can't find it at the moment) but are not yet implemented in Rel for the simple reason that the Tutorial D grammar does not define any place -- unless I've missed it -- where such defaults could be used.

As for specification of collation sequences, this is entirely appropriate for an "Industrial D". As Rel is an implementation of Tutorial D -- which is intended primarily for learning and research purposes -- implementation of such is a fairly low priority. Similarly, a shortcut for defining case sensitivity of string comparisons is a low priority, though a subtype of CHAR could likely be defined to be case insensitive. I haven't tried it, but it sounds like a worthwhile project.
Chris Walton
Posts: 76
Joined: Sat Aug 18, 2012 2:13 pm

Re: Expectations of a type system

Post by Chris Walton »

I note what you say about composition - I have not yet explored that.I would certainly agree with you about the lack of distinction between UDTs and BDTs, except for one thing. In approaching the built in types in Rel in the way I did - unsure of the capabilities of Rel, and the implications in terms of syntax of the grammar, I found it easy to generate infinite loops in Rel. This seemed particularly acute when I was using the inbuilt data types. As I have learned a (little) more, this apparent fragility based on the BDTs has largely gone away.

I have restrained my comments to Rel rather than Tutorial D for a variety of reasons. I am trying to use Rel as the catalogue for the system I am putting together. I have found Rel to be the best fitted of all the languages I have come across for expressing the things I want to say, and to define - certainly much better than the wilds of SQL! As I am trying to use it, I do need an implementation of a language, rather than a specification for one. Also, as stated above, I have been (and still am) learning an unfamiliar syntax; some concepts that are not entirely clear to me; and some capabilities that it is difficult to see merely from a description of the grammar. All my queries, comments and notes must be seen in that light - I have tried to give examples to illustrate points I have had difficulty with; this note arose from such exploration etc.

I was not asking for collation series, nor for a case insensitive way of comparing strings. I do appreciate that these are both likely to be very low priority as far as you are concerned. It is possible that the latter capability may emerge from the work I am doing so you can probably afford to make it an even lower priority.
Post Reply