A few days ago I saw a request on Twitter for code that would convert Roman numerals to decimal. CFLib has a UDF for going from decimal to Roman, but not the other way. I did a bit of searching, and while I found a bunch of code libraries, I didn't find one that explained the logic behind the translation. Finally I came across this page: Roman Numerals, which I thought explained the issue very nicely. The basic process to convert from Roman to decimal is:
- Read the numbers from left to right.
- Each number is added to the next...
- Except when the next number is larger than the current number. Then you take the pair and do a subtraction.
So with this logic in mind, I came up with the following UDF. It assumes valid Roman numerals for input. But it seems to work ok.
function romantodec(input) {
    var romans = {};
    var result = 0;
    var pos = 1;
    var char = "";
    var thisSum = "";
    var nextchar = "";

    romans["I"] = 1;
    romans["V"] = 5;
    romans["X"] = 10;
    romans["L"] = 50;
    romans["C"] = 100;
    romans["D"] = 500;
    romans["M"] = 1000;

    while(pos lte len(input)) {
        char = mid(input, pos, 1);
        //are we NOT at the end?
        if(pos != len(input)) {
            //check my next character - if bigger, replace with a sub
            nextchar = mid(input, pos+1, 1);
            if(romans[char] < romans[nextchar]) {
                thisSum = romans[nextchar] - romans[char];
                result += thisSum;
                pos+=2;
            } else {
                result += romans[char];
                pos++;
            }
        } else {
            result += romans[char];
            pos++;
        }
    }

    return result;
}
You can see how it follows the basic, 'left to right, add the numbers together' process, and how it notices when the current character has a higher number to the right of it. I wrote up a quick test script for it like so:
<cfset inputs = "XX,XI,IV,VIII,MC,DL,XL">
<cfloop index="input" list="#inputs#">
<cfoutput>
#input#=#romantodec(input)#<br/>
</cfoutput>
</cfloop>
Which produced:
XX=20
XI=11
IV=4
VIII=8
MC=1100
DL=550
XL=40
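The inputs above contain at most one subtractive pair each. As a further check, here is a small sketch in the same style that feeds in numerals with several pairs in a row. It assumes the romantodec UDF shown earlier is already defined on the page, and the input values are picked purely for illustration; they are not part of the original test.

```cfml
<!--- Assumes romantodec() from the UDF above is already defined on this page. --->
<!--- Illustrative inputs: each one contains at least one subtractive pair after or between other symbols. --->
<cfset inputs = "XIV,CDXLIV,MCMXCIX">
<cfloop index="input" list="#inputs#">
<cfoutput>
#input#=#romantodec(input)#<br/>
</cfoutput>
</cfloop>
```

Walking MCMXCIX by hand with the rules above: M is not smaller than C, so add 1000; C is smaller than M, so add 900; X is smaller than C, so add 90; I is smaller than X, so add 9, for a total of 1999. By the same steps, XIV should come out as 14 and CDXLIV as 444.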
You can download this UDF at CFLib now: romanToDecimal
p.s. Sorry for those still waiting for UDF approval at CFLib. It is a volunteer process (myself, Scott Pinkston, Todd Sharp) so be patient!
Archived Comments
In Railo you can do :
#NumberFormat(1999, "roman")#
Which gives you:
MCMXCIX
Nice... but we're trying to go the _other_ way. ;)
DOH! Should have read it better!
The very first ColdFusion code I wrote was for a distributor of supplies to fraternities and sororities. The Greek thing was of course very big with this client. The code we implemented to do this was lost many years ago, and it made me appreciate the value of unique things like this being shared with others. CFLib is still a value and asset to developers in the know. Many nice features are built into CF, but if they built everything in we would have no value. :)
That is very cool! cflib really does have a function for everything :)
I see one tiny little problem. IIX comes up as 10 instead of 8.
Doh! I'll work on it tomorrow.
Wait a sec - I don't think IIX is valid. That should be VIII instead.
It can be either way. Two smaller numbers are allowed to the left, such as IIC.
I can see it with II, but not anything else. VVC should be XC.
We could write a rule that loops for IIN and simply replaces it with Val(N)-2.
As an FYI, I think you may be wrong. I went to two online converters, and both failed to grok IIX. They both had no problem with VIII.
Now I'm going to ask you to put up or shut up! ;) If you can find me proof that IIX (or IIC, etc) is valid, I'll support it. ;)
I will concede IIC is wrong for this reason.
IIC is not even a valid Roman numeral (because you can't subtract 2 directly from 100; you would need to write it as XCIIX, for 10 less than 100, then 2 less than 10).
Also...
This form of notation closely follows Latin language usage, in which the number 18 is pronounced as duodeviginti, meaning two [deducted] from twenty (duo-de-viginti), and 19 is pronounced undeviginti, meaning one [deducted] from twenty (un-de-viginti).
So, if you can have 2 from 20, IIXX would be valid and come up with 18.
On a last note, it is clear that the rules are not really rules and have been changed over the last 2000 years. If IIX is not valid, at least it should not return 10.
IIX isn't a valid roman numeral. You can use a regular expression ^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$ to validate the input. The expr. matches roman numerals up to MMMMDCCCLXXXVIII which is the longest you can write without extended syntax.
Good post BTW. Thank you for sharing.
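For anyone who wants to try the validation idea, here is a minimal sketch that wraps the regular expression quoted above in a UDF. The function name isValidRoman is hypothetical (it is not an existing CFLib function), and the check is case sensitive because it uses reFind rather than reFindNoCase. Since every group in the pattern is optional, the pattern also matches an empty string, so the sketch rejects that case explicitly.

```cfml
<cfscript>
// Hypothetical helper: returns true only when input matches the strict
// pattern quoted in the comment above (uppercase letters only).
function isValidRoman(input) {
    var pattern = "^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$";
    // Every group in the pattern is optional, so an empty string would match.
    if (len(input) eq 0) {
        return false;
    }
    // reFind returns the position of the match, or 0 when there is none.
    return reFind(pattern, input) gt 0;
}
</cfscript>
```

With this in place, isValidRoman("VIII") should return true and isValidRoman("IIX") should return false, so a caller could run the check first and report an error rather than letting the converter silently return 10.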
IIX is a valid number and there is documented use of it.
http://en.wikipedia.org/wik...
@Andreas: I am sorry, but just because some REGEX code does not validate IIX does not invalidate IIX. Please quote some old Roman text, any old Roman text that states IIX is invalid and I will concede the point.
I don't know - this section "IIII and IV" seems to imply that _sometimes_ it was used. Considering that - practically - we have no "Roman Numerals Ruling Party", we have to make some concessions for standards and what we will accept.
If you can come up with a mod to the UDF to make it support XXY where X < Y, then I'll put it in. Otherwise, I can live with it. ;)
Those IIX could have been typos. We all make thme.
And I bet there was no stoneout to correct thoses erros that were 'written in stone.'
@Gary: OK, point taken. The regex was created to match what *today* is considered to be a valid Roman numeral, not the other way round. The Wikipedia article also states that (usually) any symbol that appears more than once consecutively may not be followed by a symbol of larger value. That means IIV would not be valid, although such numerals were rarely used in the Middle Ages. Maybe some old Germans didn't know the rules well and introduced some typos, only to see some future programmers pulling their hair out 800 years later ;-).
@Raymond +1 - seems as if all the converters out there use a similar approach.
Here is one way to allow IIX or any XXY where X<Y. I have not tested this fully, but I believe the concept will work. This statement can just replace your if statement starting on line 23 and going to line 30.
if((pos + 2) <= len(input)) {
    nextchar2 = mid(input, pos+2, 1);
} else {
    // no third character: default nextchar2 to 'I' so nothing can be smaller than it
    nextchar2 = 'I';
}
if(romans[char] == romans[nextchar] && romans[nextchar] < romans[nextchar2]) {
    thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
    result += thisSum;
    pos+=2;
} else if(romans[char] < romans[nextchar]) {
    thisSum = romans[nextchar] - romans[char];
    result += thisSum;
    pos+=2;
} else {
    result += romans[char];
    pos++;
}
Sorry, it was better formatted when I posted it. I don't know what happened.
Let's try this again. Sorry for the quick posts, but I found a typo: it should be pos+=3. Here is the new code:
if((pos + 2) <= len(input)) {
    nextchar2 = mid(input, pos+2, 1);
} else {
    // no third character: default nextchar2 to 'I' so nothing can be smaller than it
    nextchar2 = 'I';
}
if(romans[char] == romans[nextchar] && romans[nextchar] < romans[nextchar2]) {
    // two equal symbols before a larger one (e.g. IIX): subtract both, consume all three
    thisSum = romans[nextchar2] - romans[nextchar] - romans[char];
    result += thisSum;
    pos+=3;
} else if(romans[char] < romans[nextchar]) {
    thisSum = romans[nextchar] - romans[char];
    result += thisSum;
    pos+=2;
} else {
    result += romans[char];
    pos++;
}
@Andreas: II really can't state with any authority what is or is not allowed. I know what I was taught in college, but then again it was a left-wing liberal school.
Either way, if IIX is not valid, it certainly should not return 10. It should return INVALID.
Little did I know my little post here would cause such controversy. ;)
@Gary I fully agree with you. It shouldn't return 10. Therefore I thought, the regex would be useful ...
@Ray: Many of your posts cause controversy. We love you for that. It gets us to think.
I find this post fascinating, not so much for the roman numeral aspect, but because you are parsing out "tokens" from the string that might be more than one character in length / meaning. This might seem random, but this is a concept that has me very interested as I think it applies, in general, to parsers.
Thanks Ben. My second language (not counting AppleSoft Basic) was Perl, so I've got a lot of love for string parsing. It can be incredibly fun - and frustrating.
Yeah, definitely frustrating. I think it's one of those things that is very easy conceptually; but then when you go to implement that concept, you realize it's a ridiculous amount of code.
This post makes me want to play with a very simple tokenizer. I don't know why, this is just really an interesting problem. Take the "comment" tag as an example. It is only meaningful in the following combination:
<!---
This means the parser has to read in 5 characters to build it... but if it can't (say it's an HTML comment, not a CFML one), then suddenly it has to take the 4 preceding characters and treat them as individual tokens.
Maybe this is only interesting to me :)
Perhaps someone could create a "loose" standard converter and use the strict standard converter for the rest of us. Maybe someone could even write an isValidRomanNumber() that does strict testing before it gets put in. LOL ... this thread is pitiful.
Okay, I put up. I modified Ray's code to work with any Roman Number. Please don't laugh too hard at the code. I know I am not the programmer Ray is.
http://www.jacfb.com/index....
Hmm, it keeps telling me my comment is spam.
Hey now - I hope I don't come off as a super programmer. I'm just a guy who likes to talk and write a lot. There are _plenty_ of programmers out there better than I. :)
That being said - your mod looks perfect! It works. But my ego forbids me from truly accepting that so I'm going to delete your comment and remove your BlogCFC from the Internet. Thanks for playing!
(No, instead, I'm going to update the CFLib version. Thanks!)
Hmm, I don't think anyone mentioned 'super' but hey, you are a Jedi.
This UDF works great in 9 but does not work in CF 6.1 (MX) - here is the error I get:
Invalid CFML construct found on line 33 at column 22.
ColdFusion was looking at the following text:
{
The CFML compiler was processing:
* a script statement beginning with "var" on line 33, column 9.
* a script statement beginning with "function" on line 32, column 1.
* a cfscript tag beginning on line 22, column 2.
The error occurred in D:\Inetpub\serv\roman.cfm: line 33
31 : */
32 : function romantodec(input) {
33 : var romans = {};
34 : var result = 0;
35 : var pos = 1;
CF6 doesn't support struct literals {}, just change to structNew().
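One caveat worth flagging: besides the struct literal, the posted UDF also uses script operators such as !=, <, ++ and +=, which appear to have arrived in ColdFusion 8 along with struct literals, so swapping in structNew() alone may not be enough on 6.1. Below is an untested sketch of a variant that sticks to the older operator style (lt, lte, explicit assignments). It is meant to behave the same as the UDF in the post and has not been run on CF 6.1 itself.

```cfml
<cfscript>
// Sketch of a variant for older servers: structNew() instead of {},
// and lt/lte plus explicit assignments instead of <, !=, ++ and +=.
function romantodec(input) {
    var romans = structNew();
    var result = 0;
    var pos = 1;
    var char = "";
    var nextchar = "";

    romans["I"] = 1;
    romans["V"] = 5;
    romans["X"] = 10;
    romans["L"] = 50;
    romans["C"] = 100;
    romans["D"] = 500;
    romans["M"] = 1000;

    while (pos lte len(input)) {
        char = mid(input, pos, 1);
        // Look ahead only when a next character exists.
        if (pos lt len(input)) {
            nextchar = mid(input, pos + 1, 1);
            if (romans[char] lt romans[nextchar]) {
                // Smaller before larger: treat the pair as one subtraction.
                result = result + romans[nextchar] - romans[char];
                pos = pos + 2;
            } else {
                result = result + romans[char];
                pos = pos + 1;
            }
        } else {
            result = result + romans[char];
            pos = pos + 1;
        }
    }
    return result;
}
</cfscript>
```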