Twitter: raymondcamden


Translating from Roman to Decimal Numbers with ColdFusion

02-02-2010 4,809 views ColdFusion 35 Comments

A few days ago I saw a request on Twitter for code that would convert Roman numerals to decimal. CFLib has a UDF for going from decimal to Roman, but not the other way. I did a bit of searching, and while I found a bunch of code libraries, I didn't find one that explained the logic behind the translation. Finally I came across this page: Roman Numerals, which I thought explained the issue very nicely. The basic process to convert from Roman to decimal is:

1) Read the numerals from left to right.

2) Add each numeral's value to a running total...

3) Except when the next numeral is larger than the current one. Then you take the pair and do a subtraction: the pair is worth the larger value minus the smaller (IV is 5 - 1 = 4).

So with this logic in mind, I came up with the following UDF. It assumes valid Roman numerals for input. But it seems to work ok.

 1  function romantodec(input) {
 2      var romans = {};
 3      var result = 0;
 4      var pos = 1;
 5      var char = "";
 6      var thisSum = "";
 7      var nextchar = "";
 8
 9      romans["I"] = 1;
10      romans["V"] = 5;
11      romans["X"] = 10;
12      romans["L"] = 50;
13      romans["C"] = 100;
14      romans["D"] = 500;
15      romans["M"] = 1000;
16
17      while(pos lte len(input)) {
18          char = mid(input, pos, 1);
19          //are we NOT at the end?
20          if(pos != len(input)) {
21              //check my next character - if bigger, replace with a sub
22              nextchar = mid(input, pos+1, 1);
23              if(romans[char] < romans[nextchar]) {
24                  thisSum = romans[nextchar] - romans[char];
25                  result += thisSum;
26                  pos+=2;
27              } else {
28                  result += romans[char];
29                  pos++;
30              }
31          } else {
32              result += romans[char];
33              pos++;
34          }
35      }
36
37      return result;
38  }

You can see how it follows the basic, 'left to right, add the numbers together' process, and how it notices when the current character has a higher number to the right of it. I wrote up a quick test script for it like so:

<cfset inputs = "XX,XI,IV,VIII,MC,DL,XL">
<cfloop index="input" list="#inputs#">
    <cfoutput>
    #input#=#romantodec(input)#<br/>
    </cfoutput>
</cfloop>

Which produced:

XX=20
XI=11
IV=4
VIII=8
MC=1100
DL=550
XL=40
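
Since the UDF assumes valid input, one option is to check the string before converting it. The following is only a rough sketch and not part of the CFLib UDF: isValidRoman is a made-up name, and the pattern is the strict regular expression posted by a commenter below, so forms like IIX are rejected by design.

```cfml
<cfscript>
// Hypothetical helper (not in the CFLib UDF): strict check of a Roman numeral string.
function isValidRoman(input) {
    // Pattern taken from the comment thread: thousands, hundreds, tens, ones.
    var pattern = "^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$";
    var cleaned = ucase(trim(input));

    // Every part of the pattern is optional, so an empty string would match.
    if(len(cleaned) == 0) {
        return false;
    }

    // reFind returns the match position, or 0 when nothing matches.
    return reFind(pattern, cleaned) > 0;
}
</cfscript>
```

With this in place, isValidRoman("XIV") should come back true and isValidRoman("IIX") false, so a caller could skip romantodec() and report bad input instead of returning a misleading number.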

You can download this UDF at CFLib now: romanToDecimal

p.s. Sorry for those still waiting for UDF approval at CFLib. It is a volunteer process (myself, Scott Pinkston, Todd Sharp) so be patient!

35 Comments

  • Commented on 02-02-2010 at 8:33 AM
    In Railo you can do :
    #NumberFormat(1999, "roman")#

    Which gives you:
    MCMXCIX
  • Commented on 02-02-2010 at 8:34 AM
    Nice... but we're trying to go the other way. ;)
  • Commented on 02-02-2010 at 8:49 AM
    DOH! Should have read it better!
  • Commented on 02-02-2010 at 9:43 AM
    The very first ColdFusion code I wrote was for a distributor of supplies to fraternities and sororities. The Greek thing was of course very big with this client. The code we implemented to do this was lost many years ago, and it made me appreciate the value of unique things like this being shared with others. CFLib is still a valuable asset to developers in the know. Many nice features are built into CF, but if they built everything in we would have no value. :)
  • Leigh
    Commented on 02-02-2010 at 12:11 PM
    That is very cool! cflib really does have a function for everything :)
  • Commented on 02-02-2010 at 7:25 PM
    I see one tiny little problem. IIX comes up as 10 instead of 8.
  • Commented on 02-02-2010 at 10:53 PM
    Doh! I'll work on it tomorrow.
  • Commented on 02-03-2010 at 8:47 AM
    Wait a sec - I don't think IIX is valid. That should be VIII instead.
  • Commented on 02-03-2010 at 8:55 AM
    It can be either way. Two smaller numbers are allowed to the left, such as IIC.
  • Commented on 02-03-2010 at 8:57 AM
    I can see it with II, but not anything else. VVC should be XC.

    We could write a rule that loops for IIN and simply replaces it with Val(N)-2.
  • Commented on 02-03-2010 at 8:59 AM
    As an FYI, I think you may be wrong. I went to two online converters, and both failed to grok IIX. They both had no problem with VIII.

    Now I'm going to ask you to put up or shut up! ;) If you can find me proof that IIX (or IIC, etc) is valid, I'll support it. ;)
  • Commented on 02-03-2010 at 9:15 AM
    I will concede IIC is wrong for this reason.

    IIC is not even a valid Roman numeral (because you can't subtract 2 directly from 100; you would need to write it as XCIIX, for 10 less than 100, then 2 less than 10).

    Also...

    This form of notation closely follows Latin language usage, in which the number 18 is pronounced as duodeviginti, meaning two [deducted] from twenty (duo-de-viginti), and 19 is pronounced undeviginti, meaning one [deducted] from twenty (un-de-viginti).

    So, if you can have 2 from 20, IIXX would be valid and come out as 18.

    On a last note, it is clear that the rules are not really rules and have changed over the last 2000 years. If IIX is not valid, at the least it should not return 10.
  • Commented on 02-03-2010 at 9:15 AM
    IIX isn't a valid roman numeral. You can use a regular expression ^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$ to validate the input. The expr. matches roman numerals up to MMMMDCCCLXXXVIII which is the longest you can write without extended syntax.

    Good post BTW. Thank you for sharing.
  • Commented on 02-03-2010 at 3:12 PM
    IIX is a valid number and there is documented use of it.

    http://en.wikipedia.org/wiki/Roman_numerals
  • Commented on 02-03-2010 at 3:16 PM
    @Andreas: I am sorry, but just because some REGEX code does not validate IIX does not invalidate IIX. Please quote some old Roman text, any old Roman text that states IIX is invalid and I will concede the point.
  • Commented on 02-03-2010 at 3:18 PM
    I don't know - this section "IIII and IV" seems to imply that sometimes it was used. Considering that - practically - we have no "Roman Numerals Ruling Party", we have to make some concessions for standards and what we will accept.

    If you can come up with a mod to the UDF to make it support XXY where X < Y, then I'll put it in. Otherwise, I can live with it. ;)
  • Commented on 02-03-2010 at 4:14 PM
    Those IIX could have been typos. We all make thme.
  • Commented on 02-03-2010 at 4:48 PM
    And I bet there was no stoneout to correct thoses erros that were 'written in stone.'
  • Commented on 02-04-2010 at 7:52 AM
    @Gary: OK, point taken. The regex was created to match what is considered a valid Roman numeral today, not the other way round. The Wikipedia article also states that (usually) any symbol that appears more than once consecutively may not be followed by a symbol of larger value. That means IIV would not be valid, although such numerals were rarely used in the Middle Ages. Maybe some old Germans didn't know the rules well and introduced some typos, only to see some future programmers pulling their hair out 800 years later ;-).

    @Raymond +1 - seems as if all the converters out there use a similar approach.
  • Daniel Harvey
    Commented on 02-04-2010 at 8:51 AM
    Here is one way to allow IIX, or any XXY where X<Y. I have not tested this fully, but I believe the concept will work. This statement can just replace your if statement starting on line 23 and going to line 30.

              if((pos + 2) < len(input) ){
                nextchar2 = mid(input, pos+2, 1);
              } else {//set nextchar2 to one will not allow anything to be smaller than it.
                nextchar2 = 'I';
              }
              if(romans[char] == romans[nextchar] && romans[nexchar] < romans[nextchar2] ){
                 thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
                result +=thisSum
                pos+=2;
              }else if(romans[char] < romans[nextchar]) {
    thisSum = romans[nextchar] - romans[char];
    result += thisSum;
    pos+=2;
    } else {
    result += romans[char];
    pos++;
    }
  • Daniel Harvey
    Commented on 02-04-2010 at 8:52 AM
    Sorry, it was better formatted when I posted it. I don't know what happened.
  • Daniel Harvey
    Commented on 02-04-2010 at 8:54 AM
    Let's try this again. Sorry for the quick posts, but I found a typo: it should be pos+=3. Here is the new code:



    if((pos + 2) <= len(input)) {
        nextchar2 = mid(input, pos+2, 1);
    } else {
        //no third character: use 'I' so nothing can be smaller than it
        nextchar2 = 'I';
    }
    if(romans[char] == romans[nextchar] && romans[nextchar] < romans[nextchar2]) {
        //two equal numerals before a larger one, e.g. IIX
        thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
        result += thisSum;
        pos+=3;
    } else if(romans[char] < romans[nextchar]) {
        thisSum = romans[nextchar] - romans[char];
        result += thisSum;
        pos+=2;
    } else {
        result += romans[char];
        pos++;
    }
  • Commented on 02-04-2010 at 10:44 AM
    @Andreas: I really can't state with any authority what is or is not allowed. I know what I was taught in college, but then again it was a left-wing liberal school.

    Either way, if IIX is not valid, it certainly should not return 10. It should return INVALID.
  • Commented on 02-04-2010 at 10:46 AM
    Little did I know my little post here would cause such controversy. ;)
  • Commented on 02-04-2010 at 10:58 AM
    @Gary I fully agree with you. It shouldn't return 10. Therefore I thought, the regex would be useful ...
  • Commented on 02-04-2010 at 11:07 AM
    @Ray: Many of your posts cause controversy. We love you for that. It gets us to think.
  • Commented on 02-04-2010 at 1:00 PM
    I find this post fascinating, not so much for the roman numeral aspect, but because you are parsing out "tokens" from the string that might be more than one character in length / meaning. This might seem random, but this is a concept that has me very interested as I think it applies, in general, to parsers.
  • Commented on 02-04-2010 at 1:27 PM
    Thanks Ben. My second language (not counting AppleSoft Basic) was Perl, so I've got a lot of love for string parsing. It can be incredibly fun - and frustrating.
  • Commented on 02-04-2010 at 1:34 PM
    Yeah, definitely frustrating. I think it's one of those things that is very easy conceptually; but then when you go to implement that concept, you realize it's a ridiculous amount of code.

    This post makes me want to play with a very simple tokenizer. I don't know why, this is just really an interesting problem. Take the "comment" tag as an example. It is only meaningful in the following combination:

    <!---

    This means the parser has to read in 5 characters to build it... but if it can't (say it's an HTML comment, not a CFML one), then suddenly it has to take the 4 preceding characters and treat them as individual tokens.

    Maybe this is only interesting to me :)
  • Commented on 02-04-2010 at 2:09 PM
    Perhaps someone could create a "loose" standard converter and use the strict standard converter for the rest of us. Maybe someone could even write an isValidRomanNumber() that does strict testing before it gets put in. LOL ... this thread is pitiful.
  • Commented on 02-04-2010 at 5:46 PM
    Okay, I put up. I modified Ray's code to work with any Roman number. Please don't laugh too hard at the code. I know I am not the programmer Ray is.

    http://www.jacfb.com/index.cfm/2010/2/4/Translatin...

    Hmm, it keeps telling me my comment is spam.
  • Commented on 02-04-2010 at 6:12 PM
    Hey now - I hope I don't come off as a super programmer. I'm just a guy who likes to talk and write a lot. There are plenty of programmers out there better than I. :)

    That being said - your mod looks perfect! It works. But my ego forbids me from truly accepting that so I'm going to delete your comment and remove your BlogCFC from the Internet. Thanks for playing!

    (No, instead, I'm going to update the CFLib version. Thanks!)
  • Commented on 02-04-2010 at 7:06 PM
    Hmm, I don't think anyone mentioned 'super' but hey, you are a Jedi.
  • Charles
    Commented on 02-25-2010 at 8:10 AM
    This UDF works great in 9 but does not work in CF 6.1 (MX) - here is the error I get:

    Invalid CFML construct found on line 33 at column 22.
    ColdFusion was looking at the following text:

    {

    The CFML compiler was processing:

    a script statement beginning with "var" on line 33, column 9.
    a script statement beginning with "function" on line 32, column 1.
    * a cfscript tag beginning on line 22, column 2.

    The error occurred in D:\Inetpub\serv\roman.cfm: line 33

    31 : */
    32 : function romantodec(input) {
    33 :    var romans = {};
    34 :    var result = 0;
    35 :    var pos = 1;
  • Commented on 02-25-2010 at 10:21 AM
    CF6 doesn't support struct literals {}, just change to structNew().
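
For anyone stuck on MX, the struct literal may not be the only thing to change: the ++, +=, != and < operators in CFScript are, as far as I know, also ColdFusion 8 additions. Below is a sketch of the same logic using only the older operators, under that assumption; romanToDecMX is a hypothetical name, not the CFLib function, and like the original it assumes valid input.

```cfml
<cfscript>
// Hypothetical MX-friendly variant: same algorithm, pre-CF8 syntax only.
function romanToDecMX(input) {
    var romans = structNew();   // replaces the {} struct literal
    var result = 0;
    var pos = 1;
    var char = "";
    var nextchar = "";

    romans["I"] = 1;
    romans["V"] = 5;
    romans["X"] = 10;
    romans["L"] = 50;
    romans["C"] = 100;
    romans["D"] = 500;
    romans["M"] = 1000;

    while(pos lte len(input)) {
        char = mid(input, pos, 1);
        // not at the last character yet: peek at the next one
        if(pos lt len(input)) {
            nextchar = mid(input, pos + 1, 1);
            if(romans[char] lt romans[nextchar]) {
                // subtractive pair such as IV or XL: consume both characters
                result = result + romans[nextchar] - romans[char];
                pos = pos + 2;
            } else {
                result = result + romans[char];
                pos = pos + 1;
            }
        } else {
            result = result + romans[char];
            pos = pos + 1;
        }
    }

    return result;
}
</cfscript>
```

The word operators (lt, lte) and explicit assignments stand in for <, <=, += and ++; nothing else about the left-to-right walk changes.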
