Twitter: raymondcamden

Generating Speech with ColdFusion and Java

05-29-2009 | ColdFusion

A few days ago a user made a comment on my ColdFusion 8/CAPTCHA blog post. He reminded us (and it is a good reminder) that CAPTCHA has some serious accessibility issues. This got me thinking about converting the CAPTCHA image into spoken letters. I've seen a few sites that do this and, frankly, whether it helped with CAPTCHA or not I thought it would be cool to see if ColdFusion could generate speech. I did some digging and the primary library that folks seem to use in the Java world is FreeTTS (TTS is short for text to speech). There are probably many other alternatives out there, but that's the one I went with.

I began by downloading the compiled code for FreeTTS and confirmed the example application ran from the command line. I then began to dig into the docs a bit. I then began to cry a little bit as I realized that "documentation" was probably too strong of a word for what I found at the project. The API is fully documented. Examples do exist. But I couldn't find anything close to what I'd consider to be good documentation. (Full disclosure time. I will admit to not always providing great docs with my own projects!) Specifically, it wasn't difficult to get code that would say something. I had that running within 15 minutes. What took forever was getting the audio saved to a file. The code that follows works, but please note that it could probably be done better.

Once you've downloaded the FreeTTS code, extract it to your file system. All you really need are the JAR files from the lib folder. I loaded all the JARs using the wonderful, super-awesome, JavaLoader from Mark Mandel. I really hope dynamic class loading comes to ColdFusion 9 because it's so darn useful. Here is how I used it to suck in all the JARs from the lib folder:

<!--- Collect the full path of every JAR in the FreeTTS lib folder --->
<cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
<cfset jars = []>
<cfdirectory name="jarList" directory="#jardir#" filter="*.jar">
<cfloop query="jarList">
    <cfset arrayAppend(jars, jardir & "/" & name)>
</cfloop>

<!--- Hand the list to JavaLoader so the FreeTTS classes can be created later --->
<cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>
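
A commenter below reports a memory leak when JavaLoader is used this way, and passes along the advice to keep it in the server scope. Here is a minimal sketch of that idea, assuming the same folder layout as above. The key name freettsLoader and the lock name are made up for illustration, so adjust to taste.

```cfml
<!--- "freettsLoader" is a hypothetical key name; pick whatever suits your application --->
<cfif not structKeyExists(server, "freettsLoader")>
    <cflock name="freettsLoaderInit" type="exclusive" timeout="10">
        <!--- Check again inside the lock in case another request got here first --->
        <cfif not structKeyExists(server, "freettsLoader")>
            <cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
            <cfset jars = []>
            <cfdirectory name="jarList" directory="#jardir#" filter="*.jar">
            <cfloop query="jarList">
                <cfset arrayAppend(jars, jardir & "/" & name)>
            </cfloop>
            <cfset server.freettsLoader = createObject("component", "javaloader.JavaLoader").init(jars)>
        </cfif>
    </cflock>
</cfif>

<!--- Every request reuses the same loader instead of building a new one --->
<cfset loader = server.freettsLoader>
```

The rest of the code can keep using the loader variable unchanged. The loader stays in the server scope until ColdFusion restarts or the key is removed.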

Now for the fun part. FreeTTS works by creating a voice and having the voice speak. So at a basic level, this code alone will work to create the speech.

<!--- VoiceManager is a singleton; ask it for one of the bundled voices --->
<cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
<cfset vm = voiceManager.getInstance()>
<cfset voice = vm.getVoice("kevin16")>

<!--- The voice must be allocated before it can speak --->
<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech. It was a real pain in the ass. Really.")>

On my system this resulted in the words being spoken out of my laptop speakers. Did this surprise me? Heck yes. Did I scream like a little girl? I'm not telling. So as I said, this was relatively simple. Getting it to save to the file system, though, was a royal pain in the rear. Sure, the code isn't that much different; it just took me forever to figure it out. The basic idea is to tell FreeTTS to use another audio player. FreeTTS has a 'player' called SingleFileAudioPlayer. As you can guess, this essentially turns a file into an audio player. In this version of the code, I set up the player and pass it to the voice. When run, it generates this wav file:

http://www.coldfusionjedi.com/images/test1.wav
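
A rough sketch of that setup, pieced together from the final template further down. It assumes loader is the JavaLoader instance created above. The test1 base name and the expandPath() call are illustrative choices, not necessarily what produced the file linked above.

```cfml
<!--- Assumes "loader" is the JavaLoader instance created earlier --->

<!--- Describe the output format: a WAVE file with a "wav" extension --->
<cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE", "wav")>

<!--- First argument is an absolute base path WITHOUT the extension (placeholder name) --->
<cfset basePath = expandPath("./test1")>
<cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init(basePath, audioFileFormatType)>

<cfset voice = loader.create("com.sun.speech.freetts.VoiceManager").getInstance().getVoice("kevin16")>

<!--- Swap the default (speaker) player for the file player before allocating --->
<cfset voice.setAudioPlayer(sfAudio)>
<cfset voice.allocate()>
<cfset voice.speak("Hello World. This is a test of text to speech.")>

<!--- Closing the player is the step that writes the file to disk --->
<cfset sfAudio.close()>
```

The path handed to the player is only a base name. FreeTTS appends the extension from the file type, so this sketch should leave a test1.wav next to the template.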

I then switched the text to be something close to a CAPTCHA. The result was a bit too quick to understand. Looking at the API, I saw that there was a WPM (words per minute) setting. You would think this would simply lower the number of words spoken per minute. Instead it simply slowed down the speech itself. So instead of hearing "Something ..... something ....", it was more like "Sooooommmmmeeeething." I played with it a bit and got it to be a bit slower, but it's not perfect. Here is the final template I ended up with:

1  <cfset jardir = expandPath("./freetts-1.2.2-bin/freetts-1.2/lib")>
2  <cfset jars = []>
3  <cfdirectory name="jarList" directory="#jardir#" filter="*.jar">
4  <cfloop query="jarList">
5      <cfset arrayAppend(jars, jardir & "/" & name)>
6  </cfloop>
7
8  <cfset loader = createObject("component", "javaloader.JavaLoader").init(jars)>
9
10
11  <cfset audioFileFormatType = createObject("java", "javax.sound.sampled.AudioFileFormat$Type").init("WAVE","wav")>
12  <cfset sfAudio = loader.create("com.sun.speech.freetts.audio.SingleFileAudioPlayer").init("/Library/WebServer/Documents/tts/test",audioFileFormatType)>
13
14
15  <cfset voiceManager = loader.create("com.sun.speech.freetts.VoiceManager")>
16  <cfset vm = voiceManager.getInstance()>
17  <cfset voice = vm.getVoice("kevin16")>
18
19  <cfset lex = loader.create("com.sun.speech.freetts.en.us.CMULexicon").init()>
20  <cfset voice.setLexicon(lex)>
21  <cfset voice.setRate(110)>
22  <cfset voice.setAudioPlayer(sfAudio)>
23  <cfset voice.allocate()>
24  <cfset voice.speak("A 9 ## 2 L K 8 0")>
25  <cfset sfAudio.close()>
26
27  <p>
28  done
29  </p>
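
Since the rate setting stretches the words rather than the gaps between them, another option (suggested in the comments below) is to put a period after each character so the voice pauses. A small sketch of building such a string follows. The variable names captchaText and spokenText are made up for the example.

```cfml
<!--- Hypothetical CAPTCHA text; in practice this would come from your CAPTCHA code --->
<cfset captchaText = "A92LK80">
<cfset spokenText = "">

<!--- Append each character followed by a period and a space --->
<cfloop from="1" to="#len(captchaText)#" index="i">
    <cfset spokenText = spokenText & mid(captchaText, i, 1) & ". ">
</cfloop>

<!--- Drop the trailing space; spokenText is now "A. 9. 2. L. K. 8. 0." --->
<cfset spokenText = trim(spokenText)>
```

The result can then be passed to voice.speak() in place of the hard-coded string.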

FreeTTS comes with more voices, and if I spent more time on it I could probably make it a bit nicer, but for now I'll stop and let folks comment. In the next blog entry, I'll show this in use with CAPTCHA.
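
To see which voices are available to your install, something like the following should work. Treat it as a sketch: it assumes loader is the JavaLoader instance from the template above, and which voices show up depends on the JARs you loaded.

```cfml
<!--- Assumes "loader" is the JavaLoader instance from the template above --->
<cfset vm = loader.create("com.sun.speech.freetts.VoiceManager").getInstance()>

<!--- getVoices() returns a Java array of Voice objects --->
<cfset voices = vm.getVoices()>

<ul>
<cfloop from="1" to="#arrayLen(voices)#" index="i">
    <!--- Print each voice's name (the value you pass to getVoice) and its description --->
    <cfoutput><li>#voices[i].getName()#: #voices[i].getDescription()#</li></cfoutput>
</cfloop>
</ul>
```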

As a reminder, in order for the template to work, you will need both JavaLoader and FreeTTS copied to your file system.

34 Comments

  • Commented on 05-29-2009 at 12:22 AM
    on the mac from a terminal prompt: say ray is drop dead gorgeous

    I played around with taking twitter rss piping it through say to speak tweets as they roll in.
  • Commented on 05-29-2009 at 12:35 AM
    This is totally badass @Ray! I've been working on an event notification system (remember all of the work I did to test the most efficient way to do polling via ajax?) - well it was so that I could figure out when an event happened and then "show" it to my users. Well now I'm going to include an option to "tell" it to my users using your awesome TTS mojo. This is frickin just plain cool. I've done this with fat client programs a number of years back but I had forgotten all about it and really didn't know how to go about implementing it on the web. Well done!
  • Commented on 05-29-2009 at 2:30 AM
    Hi Ray,

    I tried to copy your example, but on my system (Linux Ubuntu with Apache and CF8) it won't work. It loads the Java classes just fine, but the page hangs on the voice.allocate() line. It's just 'waiting' there. Could this be because I am using Linux? Any ideas?
  • Commented on 05-29-2009 at 6:49 AM
    That's pretty darn cool, Ray: nice job! As Scott alluded to with his idea, this could have some pretty neat applications.
  • Commented on 05-29-2009 at 7:39 AM
    Erik: Yes. Because you use Linux, it will not work. Linux is evil because no one pays for it. ;)

    Seriously - not sure. What I began with was the freetts.jar demo program. On the web site, they walk you through calling that at the command line. Can you try that and see if it works?
  • Commented on 05-29-2009 at 7:41 AM
    Erik, also try w/o the SingleFileAudioPlayer. In other words, simplify it.
  • Commented on 05-29-2009 at 9:52 AM
    Hmm, looks like Linux really IS evil. You know, linux is a poor man's mac!

    I keep getting errors, even when I run the examples on the FreeTTS website. I already tried the most 'simplified' version of the code, but with no luck... Can't run the FreeTTS.jar demo either. I will have to look into it this weekend.

    Erik-Jan
  • Garrett Johnson
    Commented on 05-29-2009 at 10:33 AM
    awesome!

    If you ever get bored this is something kinda similar that can be fun to play with. http://www.jfugue.org/

    @Erik, maybe a codec thing?
  • Alan McCollough
    Commented on 05-29-2009 at 11:55 AM
    Just last night I was messing with the AT&T Text-to-speech demo (http://public.research.att.com/~ttsweb/tts/demo.ph...) app, for laffs.

    With their TTS, it seems to work to have multiple spaces in a phrase. Also, a period seems to make it pause a bit. Not sure if FreeTTS uses similar logic, but you might get pauses with "A. B. C." or "A. B. C" instead of "A B C".
  • Alan McCollough
    Commented on 05-29-2009 at 11:57 AM
    In my comment above, I had a few spaces between the "A." "B." and "C.". I think BlogCFC decided to eat those extra spaces up. Yum! Anyhow, toss in a bunch of spaces in between the words to TTS. I bet that adds sufficient pause.
  • Ben
    Commented on 05-29-2009 at 3:21 PM
    @Alan -- The spaces were delivered. Your browser ate them! :-)
  • Commented on 05-29-2009 at 4:06 PM
    Haha Like it Ray, cool. What about combining it with JW-player (Railo's integrated player)?
  • Commented on 05-30-2009 at 8:31 PM
    Alan - I tried periods and it worked great. I'll use it in my followup post showing CAPTCHA.
  • Sam
    Commented on 11-16-2009 at 9:50 AM
    @Ray I've been talking to Mark Mandel about an issue I have had with this. It's caused by a memory leak using JavaLoader, due to a URLClassLoader bug in ColdFusion.

    To avoid leaks you need to use server scope variables. More info is available at this link:
    http://www.compoundtheory.com/?ID=212&action=d...

    I'm still having memory leak issues currently, but working to resolve them.
  • Commented on 11-16-2009 at 9:54 AM
    Nice - thanks for sharing that Sam.
  • Leigh
    Commented on 11-16-2009 at 10:34 AM
    @Sam,

    Try calling deallocate() at the very end. It is not in the API example (of course). But I noticed it in one of the demo examples. It closes files and releases resources, which seems to help the memory issue:

    ...
    <cfset voice.speak("A 9 ## 2 L K 8 0")>
    <cfset voice.deallocate()>
    <cfset sfAudio.close()>

    -Leigh
  • Sam
    Commented on 11-17-2009 at 9:33 AM
    Hi Leigh,

    Unfortunately <cfset voice.deallocate()> does not work, it throws an error.

    Sam.
  • Leigh
    Commented on 11-17-2009 at 9:44 AM
    Maybe you have the wrong code? It worked perfectly for me when added to the code snippet you posted in the javaLoader group. (Note, I used the latest version of FreeTTS)
  • Sam
    Commented on 12-22-2009 at 4:30 AM
    Thanks Leigh, voice.deallocate does work and is definitely needed to avoid your memory being chomped away.
  • Derek
    Commented on 08-02-2010 at 7:55 AM
    I'm looking at adding audio to a CAPTCHA form. I can't seem to find the follow up to this article where I understand you (Ray) might have finished where you were going with this. Can you help? This would be a great help.

    Thanks in advance.
  • Commented on 08-02-2010 at 8:06 AM
    Sorry - I never got around to actually doing that.
  • Derek
    Commented on 08-02-2010 at 9:03 AM
    No worries - just making sure I wasn't missing something as I poked around the May and June posts.

    This task is a side note to something I'm working on so I'm not sure I'll get through it, but if I do I will post something here for what it's worth.

    Thanks again for sharing all your work/thoughts/experience etc.
  • Commented on 08-02-2010 at 9:07 AM
    I've made use of the API at ispeech.org. It would be a Flash solution, but would work well. (Not free though.)
  • Derek
    Commented on 08-02-2010 at 9:22 AM
    Ouch, where's the volume control - lol. I'll take a look at that if my efforts don't pan out quickly enough AND this escalates to a requirement for this application.
    :-)
    Thanks again.
  • Derek
    Commented on 08-02-2010 at 9:41 AM
    Where does the code above put the WAV file? I've gotten the page to produce the "done" statement so I'm assuming it created a WAV file, correct?
  • Commented on 08-02-2010 at 9:43 AM
    If I remember right, the init() statement was passed the path.
  • Derek
    Commented on 08-02-2010 at 10:45 AM
    Well, I'm a little bit at a loss on this one. I still can't find where your original code is writing the WAV file. If I understand correctly, where you left off with this is that you stopped with the ability to pull from the TTS library a set of characters that you (at the time) hard coded, which would subsequently be replaced with the CAPTCHA.

    But if I see the "done" text, should I also hear the text or is there the WAV file somewhere on the server?

    Thanks for your patience and your time.
  • Commented on 08-02-2010 at 10:48 AM
    Did you modify line 12 to specify a path on your system? Do you see a wave file there?
  • Derek
    Commented on 08-02-2010 at 10:54 AM
    Yes and No.
    :-(
    Changed:
    .init("/Library/WebServer/Documents/tts/test",audioFileFormatType)

    to:
    .init("/aud",audioFileFormatType)

    but do not see a file created. Do not see anything recorded in the CF Administrator either.
  • Commented on 08-02-2010 at 10:57 AM
    You got me then - sorry. Is /aud a real path on your system? I'd maybe do something like /Users/foo/Desktop/test - I think the idea is you include a real path and the BEGINNING of a file (test), and the system then uses it to make test.wav.
  • Derek
    Commented on 08-02-2010 at 11:05 AM
    Yeah, I saw that too, when I examined the code. It looked like test would be the file name since Line 11 defined the extension. I played with that and a couple other things, including adding a "." in to connect the extension (Line 11) to the file (Line 12). Even switching all the forward slashes to backward slashes didn't do anything.

    I'll keep hacking away. Thanks again for your time - I'll post something should I get lucky!
    :-)
  • Leigh
    Commented on 08-02-2010 at 11:11 AM
    @Derek,

    It may be sending the file somewhere you are not expecting. For example, I used this on Windows:

    .init("/dev",audioFileFormatType)>

    And the file ended up as c:\dev.wav

    -Leigh
  • Derek
    Commented on 08-02-2010 at 11:33 AM
    @Leigh, Thanks for chiming in and WOW - there it is!!!! Very interesting - and its content is my last page load CAPTCHA. This is a step in the right direction, just a few other logistical things to maneuver but it looks promising.

    That turns out to be a relative path (to the OS not the web site or CF). So to get the file where I wanted it, I was able to change "\aud" to "D:\website\aud\captcha" with success.

    To round out my work, I'll substitute my application and session scope variables accordingly. The filename "captcha" will be replaced with the user's jsession ID and deleted when the form is posted - that way I do not have any collisions.

    Thanks for the help guys, this is looking good.
    :-)
  • Leigh
    Commented on 08-02-2010 at 12:17 PM
    @Derek,

    Yes, I think that is what Raymond was suggesting before, i.e. substitute an absolute path (for whatever o/s you are using). But I am glad you got things sorted out.

    -Leigh
