Generating Speech with ColdFusion and Java
A few days ago a user made a comment on my ColdFusion 8/CAPTCHA blog post. He reminded us (and it is a good reminder) that CAPTCHA has some serious accessibility issues. This got me thinking about converting the CAPTCHA image into spoken letters. I've seen a few sites that do this and, frankly, whether it helped with CAPTCHA or not I thought it would be cool to see if ColdFusion could generate speech. I did some digging and the primary library that folks seem to use in the Java world is FreeTTS (TTS is short for text to speech). There are probably many other alternatives out there, but that's the one I went with.
I began by downloading the compiled code for FreeTTS and confirmed the example application ran from the command line. I then began to dig into the docs a bit. I then began to cry a little bit as I realized that "documentation" was probably too strong of a word for what I found at the project. The API is fully documented. Examples do exist. But I couldn't find anything close to what I'd consider to be good documentation. (Full disclosure time. I will admit to not always providing great docs with my own projects!) Specifically, it wasn't difficult to get code that would say something. I had that running within 15 minutes. What took forever was getting the audio saved to a file. The code that follows works, but please note that it could probably be done better.
Once you've downloaded the FreeTTS code, extract it to your file system. All you really need are the JAR files from the lib folder. I loaded all the JARs using the wonderful, super-awesome, JavaLoader from Mark Mandel. I really hope dynamic class loading comes to ColdFusion 9 because it's so darn useful. Here is how I used it to suck in all the JARs from the lib folder:
<!--- Assumes jardir already holds the absolute path to the FreeTTS lib folder --->
<cfset jars = []>
<cfdirectory name="jarList" directory="#jardir#">
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>

<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>
Now for the fun part. FreeTTS works by creating a voice and having the voice speak. So at a basic level, this code alone will work to create the speech.
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech. It was a real pain in the ass. Really.")>
On my system this resulted in the words being spoken out of my laptop speakers. Did this surprise me? Heck yes. Did I scream like a little girl? I'm not telling. So as I said, this was relatively simple. Getting it to save to the file system, though, was a royal pain in the rear. Sure, the code isn't that much different, it just took me forever to figure it out. The basic idea is to tell FreeTTS to use another audio player. FreeTTS has a 'player' called SingleFileAudioPlayer. As you can guess, this essentially turns a file into an audio player. In this version of the code, I set up the player and pass it to the voice. When run, it generates this wav file:
http://www.coldfusionjedi.com/images/test1.wav
I then switched the text to be something close to a CAPTCHA. The result was a bit too quick to understand. Looking at the API, I saw that there was a WPM (words per minute) setting. You would think this would simply reduce the number of words spoken per minute by adding pauses between them. Instead it stretched out the speech itself. So instead of hearing: "Something ..... something ....", it was more like "Sooooommmmmeeeething." I played with it a bit and got it to be a bit slower, but it's not perfect. Here is the final template I ended up with:
<!--- Assumes jardir already holds the absolute path to the FreeTTS lib folder --->
<cfset jars = []>
<cfdirectory name="jarList" directory="#jardir#">
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>

<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>

<!--- Audio player that writes to a file; the first argument is the base name, without extension --->
<cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE","wav")>
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init("/Library/WebServer/Documents/tts/test",audioFileFormatType)>

<!--- Look up the voice --->
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<!--- Configure the voice, point it at the file player, and speak --->
<cfset lex = loader.create("com.sun.speech.freetts.en.us.CMULexicon").init()>
<cfset voice.setLexicon(lex)>
<cfset voice.setRate(110)>
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<cfset sfAudio.close()>

<p>
done
</p>
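A side note on the AudioFileFormat$Type line in the template: it builds its own type with init("WAVE","wav"), but the Java sound API also ships a predefined constant for the same format, AudioFileFormat.Type.WAVE. The small Java sketch below only prints what that constant holds. Whether handing FreeTTS the constant instead of a hand-built Type changes anything was not tested here, and the class name WaveTypeCheck is made up for illustration.

```java
import javax.sound.sampled.AudioFileFormat;

final class WaveTypeCheck {
    // File extension the JDK associates with its built-in WAVE type.
    static String waveExtension() {
        return AudioFileFormat.Type.WAVE.getExtension();
    }

    // Display name of the built-in WAVE type.
    static String waveName() {
        return AudioFileFormat.Type.WAVE.toString();
    }

    public static void main(String[] args) {
        System.out.println(waveName() + " -> ." + waveExtension());
    }
}
```

From CFML the same constant should be reachable as a static field on the object returned by createObject("java", "javax.sound.sampled.AudioFileFormat$Type"), though that variation is an assumption rather than something the template above does.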
FreeTTS comes with more voices, and if I spent more time on it I could probably make it a bit nicer, but for now I'll stop and let folks comment. In the next blog entry, I'll show this in use with CAPTCHA.
As a reminder, in order for the template to work, you will need both JavaLoader and FreeTTS copied to your file system.

I played around with taking a Twitter RSS feed and piping it through say to speak tweets as they roll in.
I tried to copy your example, but on my system (Linux Ubuntu with Apache and CF8) it won't work. It loads the Java classes just fine, but the page hangs on the voice.allocate() line. It's just 'waiting' there. Could this be because I am using Linux? Any ideas?
Seriously - not sure. What I began with was the freetts.jar demo program. On the web site, they walk you through calling that at the command line. Can you try that and see if it works?
I keep getting errors, even when I run the examples on the FreeTTS website. I already tried the most 'simplified' version of the code, but with no luck... Can't run the FreeTTS.jar demo either. I will have to look into it this weekend.
Erik-Jan
If you ever get bored this is something kinda similar that can be fun to play with. http://www.jfugue.org/
@Erik, maybe a codec thing?
With their TTS, it seems to work to have multiple spaces in a phrase. Also, a period seems to make it pause a bit. Not sure if FreeTTS uses similar logic, but you might get pauses with "A. B. C." or "A. B. C" instead of "A B C".
To avoid leaks you need to use server scope variables. More info is available at this link:
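If the period trick does work with FreeTTS (untested here, as the comment itself says), the CAPTCHA string could be rewritten before it is passed to speak(). A rough Java sketch of that rewrite follows; the class and method names are hypothetical, and it only does the string manipulation, not the speech.

```java
final class CaptchaSpeechText {
    // Puts a period after every non-blank character and separates the
    // results with single spaces, e.g. "A9L" becomes "A. 9. L."
    static String spellOut(String captcha) {
        StringBuilder out = new StringBuilder();
        for (char c : captcha.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // existing spacing is dropped and rebuilt
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(c).append('.');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(spellOut("A9 2LK80"));
    }
}
```

Keep in mind that a literal # still has to be written as ## when the resulting text is typed into a CFML string.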
http://www.compoundtheory.com/?ID=212&action=d...
I'm still having memory leak issues currently, but working to resolve them.
Try calling deallocate() at the very end. It is not in the API example (of course). But I noticed it in one of the demo examples. It closes files and releases resources, which seems to help the memory issue:
...
<cfset voice.speak("A 9 ## 2 L K 8 0")>
<cfset voice.deallocate()>
<cfset sfAudio.close()>
-Leigh
Unfortunately <cfset voice.deallocate()> does not work, it throws an error.
Sam.
Thanks in advance.
This task is a side note to something I'm working on, so I'm not sure I'll get through it, but if I do I will post something here for what it's worth.
Thanks again for sharing all your work/thoughts/experience etc.
:-)
Thanks again.
But if I see the "done" text, should I also hear the text, or is there a WAV file somewhere on the server?
Thanks for your patience and your time.
:-(
Changed:
.init("/Library/WebServer/Documents/tts/test",audioFileFormatType)
to:
.init("/aud",audioFileFormatType)
but do not see a file created. Do not see anything recorded in the CF Administrator either.
I'll keep hacking away. Thanks again for your time - I'll post something should I get lucky!
:-)
It may be sending the file somewhere you are not expecting. For example, I used this on Windows
.init("/dev",audioFileFormatType)>
And the file ended up as c:\dev.wav
-Leigh
That turns out to be a path relative to the OS, not to the web site or CF. So to get the file where I wanted it, I changed "/aud" to "D:\website\aud\captcha" with success.
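For anyone else hunting for a missing file, a quick way to see where a given base name would land is to ask Java to resolve it. The sketch below assumes, based on the /dev example above ending up as c:\dev.wav, that the player simply appends ".wav" to the base name; the class name WhereDoesItGo is made up for illustration.

```java
import java.io.File;

final class WhereDoesItGo {
    // Absolute location the JVM would use for the base name plus ".wav".
    static String resolvedWavPath(String baseName) {
        return new File(baseName + ".wav").getAbsolutePath();
    }

    public static void main(String[] args) {
        // A leading slash resolves against the file system (or drive) root.
        System.out.println(resolvedWavPath("/aud"));
        // No leading slash resolves against the JVM's working directory.
        System.out.println(resolvedWavPath("aud"));
    }
}
```

Neither result has anything to do with the web root, which is why an absolute path is the safer thing to pass in.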
To round out my work, I'll substitute my application and session scope variables accordingly. The filename "captcha" will be replaced with the user's jsession ID and deleted when the form is posted - that way I do not have any collisions.
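The one-file-per-session idea could look roughly like the Java sketch below. Everything in it is an assumption for illustration: the class and method names are hypothetical, the ".wav" suffix is assumed to be appended by the audio player, and the session ID is stripped of anything other than letters, digits, '-' and '_' so it cannot point outside the audio folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SessionAudioFile {
    // Base name (no extension) to hand to the audio player for this session.
    static Path baseNameFor(String audioDir, String sessionId) {
        String safeId = sessionId.replaceAll("[^A-Za-z0-9_-]", "");
        if (safeId.isEmpty()) {
            throw new IllegalArgumentException("session id has no usable characters");
        }
        return Paths.get(audioDir, safeId);
    }

    // The file expected on disk once ".wav" has been appended to the base name.
    static Path wavFileFor(String audioDir, String sessionId) {
        Path base = baseNameFor(audioDir, sessionId);
        return base.resolveSibling(base.getFileName() + ".wav");
    }

    // Removes the session's audio file; returns false if there was nothing to delete.
    static boolean deleteAudio(String audioDir, String sessionId) {
        try {
            return Files.deleteIfExists(wavFileFor(audioDir, sessionId));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wavFileFor("aud", "8430a1b2-c3d4"));
    }
}
```

Calling deleteAudio when the form is posted covers the happy path; files from abandoned sessions would still need some separate cleanup.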
Thanks for the help, guys, this is looking good.
:-)
Yes, I think that is what Raymond was suggesting before, i.e. substitute an absolute path (for whatever OS you are using). But I am glad you got things sorted out.
-Leigh