Testing strings for numeric

HughDarwen
Posts:121
Joined:Sat May 24, 2008 4:49 pm

Post by HughDarwen » Wed Aug 20, 2008 3:09 pm

When we get UDT support, teachers using Chris Date's "suppliers and parts" database would like to define types such as SNO (for supplier numbers of the form "Sn", where n is a sequence of digits).

With this in mind I essayed the following Tutorial D operator definition:

operator is_num (s CHAR) returns BOOLEAN;
begin;
  var i integer init(0);
  var b BOOLEAN INIT(TRUE);
  if s = '' then return false; end if;
  while i < LENGTH(s);
    begin;
      if IS_EMPTY ( RELATION { TUPLE { c SUBSTRING(s, i, i+1) } } JOIN
                    RELATION {
                      TUPLE { c '0' },
                      TUPLE { c '1' },
                      TUPLE { c '2' },
                      TUPLE { c '3' },
                      TUPLE { c '4' },
                      TUPLE { c '5' },
                      TUPLE { c '6' },
                      TUPLE { c '7' },
                      TUPLE { c '8' },
                      TUPLE { c '9' }
                    } )
      then b := FALSE;
      END IF;
      i := i + 1;
    END;
  END WHILE;
  RETURN b;
END;
END OPERATOR;

This works fine but takes ages: try is_num('123')! This probably doesn't matter much, but if a FOREIGN Java implementation would be another easy one-liner, then it might be worth including with the set of string operators.

Hugh

Dave
Site Admin
Posts:368
Joined:Sun Nov 27, 2005 7:19 pm

Re: Testing strings for numeric

Post by Dave » Thu Aug 21, 2008 2:08 pm

I'm not surprised your is_num operator is slow. The changes I made to improve the performance of JOINs on relatively high cardinality relations have caused a loss of performance on low cardinality relations. It's something I'll have to address in the future.

In the meantime, I have added the following operator to OperatorsChar.d:

Code:

OPERATOR IS_NUMERIC(s CHARACTER) RETURNS BOOLEAN Java FOREIGN
  try {
     Long.parseLong(s.stringValue());
     return ValueBoolean.getTrue();
  } catch (java.lang.NumberFormatException nfe) {
     return ValueBoolean.getFalse();
  }
END OPERATOR;

Dave

Re: Testing strings for numeric

Post by Dave » Sat Aug 23, 2008 12:15 am

Hugh,

I just tried your is_num() operator on a Linux machine and a Windows XP machine, and it wasn't noticeably slow. How long would you estimate it takes to evaluate is_num('123') on your system?

HughDarwen

Re: Testing strings for numeric

Post by HughDarwen » Thu Apr 30, 2009 10:33 am

Dave,

Sorry I never replied to your question about the performance of my attempted is_num operator. Maybe I'll get back to that one day but in the meantime I've been playing with IS_NUMERIC. I was worried that my definition of type SID (student identifiers consisting of the letter 'S' followed by digits) would allow things like SID('S-5'), which indeed it does. Well, I can hardly complain, as of course the string '-5' does represent a number. However, there seem to be some inconsistencies:

IS_NUMERIC('-5') = TRUE
IS_NUMERIC('5.5') = FALSE
IS_NUMERIC('+5') = FALSE

I hesitate to call this a bug but if it's working as designed, then I'm wondering about the rationale.

Of course I can fix my type constraint by including SUBSTRING(s,1,2) <> '-' in it, but will you make any change to IS_NUMERIC in the light of my findings?

Regards,
Hugh

Dave

Re: Testing strings for numeric

Post by Dave » Sun May 03, 2009 6:01 pm

Indeed, it's working as designed, to the extent that my quick hack can be considered "designed". As implemented, IS_NUMERIC hands responsibility to a Java library function that attempts to convert the input to a signed long. If it succeeds, the value is considered numeric. On that basis, I would have expected IS_NUMERIC('+5') to return TRUE, but obviously not.
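
The parse-based behaviour described above can be reproduced outside Rel with a minimal standalone sketch. The class name ParseLongCheck and the method isNumeric are hypothetical names used only for illustration; Long.parseLong is the only real library call. As far as I know, Long.parseLong only began accepting a leading '+' in Java 7, so a FALSE result for '+5' is what an older Java runtime would give, while a current one should report it as numeric.

```java
final class ParseLongCheck {
    // Hypothetical stand-in for the IS_NUMERIC body: a string counts as
    // "numeric" exactly when Long.parseLong accepts it.
    static boolean isNumeric(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException nfe) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The three inputs from the report above, plus the empty string.
        for (String s : new String[] {"-5", "5.5", "+5", ""}) {
            System.out.println("'" + s + "' -> " + isNumeric(s));
        }
    }
}
```

Because the accepted syntax belongs to the library rather than to the operator, any check built this way inherits whatever the runtime considers a valid signed long.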

In light of the above, and the fact that the interpretation of "numeric" can be ambiguous (e.g., what if we're dealing with base-2 numbers, or base-8, or base-16? Or decimal numbers or not? Etc.), I'll remove IS_NUMERIC and replace it with IS_DIGITS, which will only return TRUE if the string consists strictly of the digits 0-9.

Here is IS_DIGITS:

Code:

OPERATOR IS_DIGITS(s CHARACTER) RETURNS BOOLEAN Java FOREIGN
	String sbuf = s.stringValue();
	for (int i=0; i<sbuf.length(); i++)
		if (!Character.isDigit(sbuf.charAt(i)))
			return ValueBoolean.getFalse();
	return ValueBoolean.getTrue();
END OPERATOR;
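
Two edge cases of the loop above may matter for type constraints such as SID: it returns TRUE for the empty string, and Character.isDigit also accepts decimal digits from other scripts (for example U+0663, ARABIC-INDIC DIGIT THREE), not only 0-9. The following is a minimal sketch, not the Rel implementation, contrasting that loop with a stricter variant. The class DigitCheck and both method names are hypothetical, and rejecting the empty string is an assumption about the desired behaviour.

```java
final class DigitCheck {
    // Mirrors the loop in IS_DIGITS: every character must satisfy
    // Character.isDigit. An empty string passes vacuously.
    static boolean isDigitsUnicode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Stricter variant (assumption): non-empty and only ASCII '0'..'9'.
    static boolean isAsciiDigits(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0123", "-5", "", "\u0663"}) {
            System.out.println("'" + s + "' unicode=" + isDigitsUnicode(s)
                    + " ascii=" + isAsciiDigits(s));
        }
    }
}
```

For an identifier of the form 'S' followed by digits, the stricter variant applied to the part after the 'S' would reject both 'S' on its own and 'S-5'.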
