When we get UDT support, teachers using Chris Date's "suppliers and parts" database would like to define types such as SNO (for supplier numbers of the form "Sn", where n is a sequence of digits).
With this in mind I essayed the following Tutorial D operator definition:
operator is_num (s CHAR ) returns BOOLEAN ;
begin ;
  var i integer init(0);
  var b BOOLEAN INIT(TRUE);
  if s = '' then return false; end if;
  while i < LENGTH(s);
  begin;
    if IS_EMPTY ( RELATION{ TUPLE { c SUBSTRING(s,i,i+1) } } JOIN
      RELATION{
        TUPLE { c '0' },
        TUPLE { c '1' },
        TUPLE { c '2' },
        TUPLE { c '3' },
        TUPLE { c '4' },
        TUPLE { c '5' },
        TUPLE { c '6' },
        TUPLE { c '7' },
        TUPLE { c '8' },
        TUPLE { c '9' }
      } )
    then b := FALSE ;
    END IF ;
    i := i + 1 ;
  END ;
  END WHILE;
  RETURN b ;
END;
END OPERATOR ;
This works fine but takes ages; try is_num('123')! This probably doesn't matter much, but if a FOREIGN Java implementation would be an easy one-liner, then it might be worth including in the set of string operators.
Hugh
Testing strings for numeric
Re: Testing strings for numeric
I'm not surprised your is_num operator is slow. The changes I made to improve the performance of JOINs on relatively high cardinality relations have caused a loss of performance on low cardinality relations. It's something I'll have to address in the future.
In the meantime, I have added the following operator to OperatorsChar.d:
OPERATOR IS_NUMERIC(s CHARACTER) RETURNS BOOLEAN Java FOREIGN
  try {
    Long.parseLong(s.stringValue());
    return ValueBoolean.getTrue();
  } catch (java.lang.NumberFormatException nfe) {
    return ValueBoolean.getFalse();
  }
END OPERATOR;
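For anyone who wants to see which strings a check of this kind accepts without running Rel, here is a minimal standalone Java sketch. The class and method names (ParseLongCheck, isNumeric) are made up for illustration and are not part of Rel; the sketch works on a plain String rather than a Rel CHARACTER value. It only assumes that the operator's behaviour is whatever Long.parseLong does with the string.

```java
class ParseLongCheck {
    // Hypothetical stand-in for IS_NUMERIC: true when Long.parseLong
    // accepts the whole string as a signed decimal long.
    static boolean isNumeric(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException nfe) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the verdict for a few interesting inputs.
        String[] samples = {"123", "-5", "5.5", "+5", ""};
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isNumeric(s));
        }
    }
}
```

A leading minus sign is accepted and a decimal point is not, because the target type is a long. As far as I know, whether a leading '+' is accepted depends on the Java release: the Long.parseLong documentation allows it from Java 7 onward, while earlier releases rejected it, so the result for '+5' can differ between installations.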
Re: Testing strings for numeric
Hugh,
I just tried your is_num() operator on a Linux machine and a Windows XP machine, and it wasn't noticeably slow. How long would you estimate it takes to evaluate is_num('123') on your system?
Re: Testing strings for numeric
Dave,
Sorry I never replied to your question about the performance of my attempted is_num operator. Maybe I'll get back to that one day but in the meantime I've been playing with IS_NUMERIC. I was worried that my definition of type SID (student identifiers consisting of the letter 'S' followed by digits) would allow things like SID('S-5'), which indeed it does. Well, I can hardly complain, as of course the string '-5' does represent a number. However, there seem to be some inconsistencies:
IS_NUMERIC('-5') = TRUE
IS_NUMERIC('5.5') = FALSE
IS_NUMERIC('+5') = FALSE
I hesitate to call this a bug but if it's working as designed, then I'm wondering about the rationale.
Of course I can fix my type constraint by including SUBSTRING(s,1,2) <> '-' in it, but will you make any change to IS_NUMERIC in the light of my findings?
Regards,
Hugh
Re: Testing strings for numeric
Indeed, it's working as designed, to the extent that my quick hack can be considered "designed". As implemented, IS_NUMERIC hands responsibility to a Java library function that attempts to convert the input to a signed long. If it succeeds, the value is considered numeric. On that basis, I would have expected IS_NUMERIC('+5') to return TRUE, but obviously not.
In light of the above, and the fact that the interpretation of "numeric" can be ambiguous (e.g., what if we're dealing with base-2 numbers, or base-8, or base-16? Or decimal numbers or not? Etc.), I'll remove IS_NUMERIC and replace it with IS_DIGITS, which will only return TRUE if the string consists strictly of the digits 0-9.
Here is IS_DIGITS:
OPERATOR IS_DIGITS(s CHARACTER) RETURNS BOOLEAN Java FOREIGN
  String sbuf = s.stringValue();
  for (int i=0; i<sbuf.length(); i++)
    if (!Character.isDigit(sbuf.charAt(i)))
      return ValueBoolean.getFalse();
  return ValueBoolean.getTrue();
END OPERATOR;
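Two edge cases of a loop like this may matter for type constraints such as SID: an empty string passes (the loop body never runs), and Character.isDigit accepts any Unicode decimal digit, not only ASCII 0-9. The standalone Java sketch below illustrates both. The names DigitCheck, isDigitsUnicode and isAsciiDigits are hypothetical and not part of Rel, and the stricter variant is only one possible reading of "strictly the digits 0-9".

```java
class DigitCheck {
    // Same loop as the IS_DIGITS body, applied to a plain String.
    // Returns true for "" and for non-ASCII Unicode digits as well.
    static boolean isDigitsUnicode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stricter variant: non-empty and ASCII '0'..'9' only.
    static boolean isAsciiDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "\u0663" is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
        String[] samples = {"123", "-5", "", "\u0663"};
        for (String s : samples) {
            System.out.println("'" + s + "' unicode=" + isDigitsUnicode(s)
                    + " ascii=" + isAsciiDigits(s));
        }
    }
}
```

Whether the empty string should count as "all digits" is a design choice; for an identifier such as 'S' followed by digits, rejecting it in the type constraint is probably what is wanted.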