A few days ago a user made a comment on my ColdFusion 8/CAPTCHA blog post. He reminded us (and it is a good reminder) that CAPTCHA has some serious accessibility issues. This got me thinking about converting the CAPTCHA image into spoken letters. I've seen a few sites that do this and, frankly, whether it helped with CAPTCHA or not I thought it would be cool to see if ColdFusion could generate speech. I did some digging and the primary library that folks seem to use in the Java world is FreeTTS (TTS is short for text to speech). There are probably many other alternatives out there, but that's the one I went with.
I began by downloading the compiled code for FreeTTS and confirmed the example application ran from the command line. I then began to dig into the docs a bit. I then began to cry a little bit as I realized that "documentation" was probably too strong a word for what I found at the project. The API is fully documented. Examples do exist. But I couldn't find anything close to what I'd consider to be good documentation. (Full disclosure time. I will admit to not always providing great docs with my own projects!) Specifically, it wasn't difficult to get code that would say something. I had that running within 15 minutes. What took forever was getting the audio saved to a file. The code that follows works, but please note that it could probably be done better.
Once you've downloaded the FreeTTS code, extract it to your file system. All you really need are the JAR files from the lib folder. I loaded all the JARs using the wonderful, super-awesome, JavaLoader from Mark Mandel. I really hope dynamic class loading comes to ColdFusion 9 because it's so darn useful. Here is how I used it to suck in all the JARs from the lib folder:
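<!--- Folder that holds the FreeTTS JAR files --->
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfset jars = []>
<!--- Only pick up .jar files; the lib folder may contain other files as well --->
<cfdirectory action="list" name="jarList" directory="#jardir#" filter="*.jar">
<cfloop query="jarList">
<cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>
<!--- Hand the full list of JAR paths to JavaLoader --->
<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>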
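One caveat, raised by a commenter below: creating a new JavaLoader on every request has been reported to leak memory, and the suggested workaround is to keep a single instance in the server scope. Here is a minimal sketch of that idea, assuming the jars array built above. The key name freettsJavaLoader is made up for illustration, and this has not been tested against the leak itself.

```cfml
<!--- Hypothetical key under which the shared JavaLoader is stored --->
<cfset loaderKey = "freettsJavaLoader">

<!--- Create the loader once per server, not once per request --->
<cfif not structKeyExists(server, loaderKey)>
    <cflock name="#loaderKey#" type="exclusive" timeout="30">
        <!--- Check again inside the lock in case another request got here first --->
        <cfif not structKeyExists(server, loaderKey)>
            <cfset server[loaderKey] = createObject("component", "javaloader.JavaLoader").init(jars)>
        </cfif>
    </cflock>
</cfif>

<!--- Every request reuses the same instance --->
<cfset loader = server[loaderKey]>
```

The rest of the code in this post can then use loader exactly as before.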
Now for the fun part. FreeTTS works by creating a voice and having the voice speak. So at a basic level, this code alone will work to create the speech.
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>
<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech. It was a real pain in the ass. Really.")>
On my system this resulted in the words being spoken out of my laptop speakers. Did this surprise me? Heck yes. Did I scream like a little girl? I'm not telling. So as I said, this was relatively simple. Getting it to save to the file system, though, was a royal pain in the rear. Sure, the code isn't that much different; it just took me forever to figure it out. The basic idea is to tell FreeTTS to use another audio player. FreeTTS has a 'player' called SingleFileAudioPlayer. As you can guess, this essentially turns a file into an audio player: instead of going to the speakers, the audio is written to a file. In this version of the code, I set up the player and pass it to the voice. When run, it generates this wav file:
http://www.coldfusionjedi.com/images/test1.wav
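Judging by the comments below, the first argument to SingleFileAudioPlayer is a base file name rather than a folder: the player adds the extension itself (so "test" becomes test.wav), and a value like "/dev" is resolved by the operating system, not relative to the web root. Here is a rough sketch of the player setup that avoids the guesswork by building an absolute base path with expandPath(). The audio folder and the captcha file name are hypothetical, the folder must already exist, and loader is assumed to be the JavaLoader instance created earlier.

```cfml
<!--- Hypothetical location: an existing "audio" folder next to this template --->
<cfset wavBase = expandPath("./audio/captcha")>

<!--- File type for the output (WAVE, extension "wav") --->
<cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE", "wav")>

<!--- Pass the base name WITHOUT an extension; the player appends ".wav" --->
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init(wavBase, audioFileFormatType)>

<!--- Fetch a fresh voice and point it at the file player before allocating it --->
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset voice = voiceManager.getInstance().getVoice("kevin16")>
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<cfset voice.speak("Hello World.")>

<!--- As far as I can tell, the file is only written when the player is closed --->
<cfset sfAudio.close()>

<cfoutput>Expecting the file at #wavBase#.wav</cfoutput>
```

Printing the expected path at the end makes it much easier to find the file if it lands somewhere unexpected.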
I then switched the text to be something close to a CAPTCHA. The result was a bit too quick to understand. Looking at the API, I saw that there was a WPM (words per minute) setting. You would think this would simply reduce the number of words spoken per minute by adding pauses between them. Instead it stretched out the speech itself. So instead of hearing "Something ..... something ....", it was more like "Sooooommmmmeeeething." I played with it a bit and got it to be a bit slower, but it's not perfect. Here is the final template I ended up with:
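<!--- Load every FreeTTS JAR from the lib folder through JavaLoader --->
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfset jars = []>
<cfdirectory action="list" name="jarList" directory="#jardir#" filter="*.jar">
<cfloop query="jarList">
<cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>
<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>
<!--- Output type: WAVE, with the file extension "wav" --->
<cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE","wav")>
<!--- First argument is an absolute path plus the start of the file name; this produces test.wav. Change it for your system. --->
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init("/Library/WebServer/Documents/tts/test",audioFileFormatType)>
<!--- Get the voice --->
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>
<cfset lex = loader.create("com.sun.speech.freetts.en.us.CMULexicon").init()>
<cfset voice.setLexicon(lex)>
<!--- Words per minute; lower values stretch the speech out --->
<cfset voice.setRate(110)>
<!--- Send the audio to the file player instead of the speakers --->
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<!--- The doubled pound sign is an escaped literal pound sign --->
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<!--- Closing the player finishes writing the file --->
<cfset sfAudio.close()>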
<p>
done
</p>
FreeTTS comes with more voices, and if I spent more time on it I could probably make it a bit nicer, but for now I'll stop and let folks comment. In the next blog entry, I'll show this in use with CAPTCHA.
As a reminder, in order for the template to work, you will need both JavaLoader and FreeTTS copied to your file system.
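Since the rate setting stretches the words rather than adding gaps between them, another option is to put a period after each character before handing the string to voice.speak(). This was suggested in the comments below, where it was reported to work well. Here is a minimal sketch, assuming a hypothetical helper named spaceOutForSpeech. How long FreeTTS actually pauses on a period has not been verified here.

```cfml
<!--- Hypothetical helper: follow every character with a period and a space so the voice pauses --->
<cffunction name="spaceOutForSpeech" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfloop from="1" to="#len(arguments.text)#" index="i">
        <cfset result = result & mid(arguments.text, i, 1) & ". ">
    </cfloop>
    <!--- Drop the trailing space left by the last pass through the loop --->
    <cfreturn trim(result)>
</cffunction>

<cfset spoken = spaceOutForSpeech("A92LK80")>
<!--- spoken is now "A. 9. 2. L. K. 8. 0." --->
```

The resulting string would then replace the hard-coded text passed to voice.speak() in the template above.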
Archived Comments
on the mac from a terminal prompt: say ray is drop dead gorgeous
I played around with taking twitter rss piping it through say to speak tweets as they roll in.
This is totally badass @Ray! I've been working on an event notification system (remember all of the work I did to test the most efficient way to do polling via ajax?) - well it was so that I could figure out when an event happened and then "show" it to my users. Well now I'm going to include an option to "tell" it to my users using your awesome TTS mojo. This is frickin just plain cool. I've done this with fat client programs a number of years back but I had forgotten all about it and really didn't know how to go about implementing it on the web. Well done!
Hi Ray,
I tried to copy your example, but on my system (Linux Ubuntu with Apache and CF8) it won't work. It loads the Java classes just fine, but the page hangs on the voice.allocate() line. It's just 'waiting' there. Could this be because I am using Linux? Any ideas?
That's pretty darn cool, Ray: nice job! As Scott alluded to with his idea, this could have some pretty neat applications.
Erik: Yes. Because you use Linux, it will not work. Linux is evil because no one pays for it. ;)
Seriously - not sure. What I began with was the freetts.jar demo program. On the web site, they walk you through calling that at the command line. Can you try that and see if it works?
Erik, also try w/o the SingleFileAudioPlayer. In other words, simplify it.
Hmm, looks like Linux really IS evil. You know, linux is a poor man's mac!
I keep getting errors, even when I run the examples on the FreeTTS website. I already tried the most 'simplified' version of the code, but with no luck... Can't run the FreeTTS.jar demo either. I will have to look into it this weekend.
Erik-Jan
awesome!
If you ever get bored this is something kinda similar that can be fun to play with. http://www.jfugue.org/
@Erik, maybe a codec thing?
Just last night I was messing with the AT&T Text-to-speech demo (http://public.research.att.... app, for laffs.
With their TTS, it seems to work to have multiple spaces in a phrase. Also, a period seems to make it pause a bit. Not sure if FreeTTS uses similar logic, but you might get pauses with "A. B. C." or "A. B. C" instead of "A B C".
In my comment above, I had a few spaces between the "A." "B." and "C.". I think BlogCFC decided to eat those extra spaces up. Yum! Anyhow, toss in a bunch of spaces in between the words to TTS. I bet that adds sufficient pause.
@Alan -- The spaces were delivered. Your browser ate them! :-)
Haha Like it Ray, cool. What about combining it with JW-player (Railo's integrated player)?
Alan - I tried periods and it worked great. I'll use it in my followup post showing CAPTCHA.
@Ray I've been talking to Mark Mandel about an issue I have had with this. It's caused by a memory leak when using JavaLoader, due to a URLClassLoader bug in ColdFusion.
To avoid leaks you need to use server scope variables. More info is available at this link:
http://www.compoundtheory.c...
I'm still having memory leak issues currently, but working to resolve them.
Nice - thanks for sharing that Sam.
@Sam,
Try calling deallocate() at the very end. It is not in the API example (of course). But I noticed it in one of the demo examples. It closes files and releases resources, which seems to help the memory issue:
...
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<cfset voice.deallocate()>
<cfset sfAudio.close()>
-Leigh
Hi Leigh,
Unfortunately <cfset voice.deallocate()> does not work, it throws an error.
Sam.
Maybe you have the wrong code? It worked perfectly for me when added to the code snippet you posted in the javaLoader group. (Note, I used the latest version of FreeTTS)
Thanks Leigh, voice.deallocate does work and is definitely needed to avoid your memory being chomped away.
I'm looking at adding audio to a CAPTCHA form. I can't seem to find the follow up to this article where I understand you (Ray) might have finished where you were going with this. Can you help? This would be a great help.
Thanks in advance.
Sorry - I never got around to actually doing that.
No worries - just making sure I wasn't missing something as I poked around the May and June posts.
This task is a side note to something I'm working on, so I'm not sure I'll get through it, but if I do I will post something here for what it's worth.
Thanks again for sharing all your work/thoughts/experience etc.
I've made use of the API at ispeech.org. It would be a Flash solution, but would work well. (Not free though.)
Ouch, where's the volume control - lol. I'll take a look at that if my efforts don't pan out quickly enough AND this escalates to a requirement for this application.
:-)
Thanks again.
Where does the code above put the WAV file? I've gotten the page to produce the "done" statement so I'm assuming it created a WAV file, correct?
If I remember right, the init() statement was passed the path.
Well, I'm a little bit at a loss on this one. I still can't find where your original code is writing the WAV file. If I understand correctly where you left off with this, you stopped at the point of having the TTS library speak a set of characters that you (at the time) hard coded, which would subsequently be replaced with the CAPTCHA.
But if I see the "done" text, should I also hear the text or is there the WAV file somewhere on the server?
Thanks for your patience and your time.
Did you modify line 12 to specify a path on your system? Do you see a wave file there?
Yes and No.
:-(
Changed:
.init("/Library/WebServer/Documents/tts/test",audioFileFormatType)
to:
.init("/aud",audioFileFormatType)
but do not see a file created. Do not see anything recorded in the CF Administrator either.
You got me then - sorry. Is /aud a real path on your system? I'd maybe do something like /Users/foo/Desktop/test - I think the idea is you include a real path and the BEGINNING of a file (test), and the system then uses it to make test.wav.
Yeah, I saw that too, when I examined the code. It looked like test would be the file name since Line 11 defined the extension. I played with that and a couple other things, including adding a "." in to connect the extension (Line 11) to the file (Line 12). Even switching all the forward slashes to backward slashes didn't do anything.
I'll keep hacking away. Thanks again for your time - I'll post something should I get lucky!
:-)
@Derek,
It may be sending the file somewhere you are not expecting. For example, I used this on windows
.init("/dev",audioFileFormatType)>
And the file ended up as c:\dev.wav
-Leigh
@Leigh, Thanks for chiming in and WOW - there it is!!!! Very interesting - and its content is my last page load CAPTCHA. This is a step in the right direction, just a few other logistical things to maneuver but it looks promising.
That turns out to be a relative path (to the OS not the web site or CF). So to get the file where I wanted it, I was able to change "\aud" to "D:\website\aud\captcha" with success.
To round out my work, I'll substitute my application and session scope variables accordingly. The filename "captcha" will be replaced with the user's jsession ID and deleted when the form is posted - that way I do not have any collisions.
Thanks for the help guys, this is looking good.
:-)
@Derek,
Yes, I think that is what Raymond was suggesting before, i.e. substitute an absolute path (for whatever OS you are using). But I am glad you got things sorted out.
-Leigh