The current state and a little history of the project are as follows. The descriptions also provide a reference to the original system components that are available in the NeXT source archive in the project SVN repository, and help in understanding the scope of gnuspeech.
No database creation and manipulation components or interactive interfaces are provided for the TextToSpeech Server itself; those are only appropriate for Monet and other applications that use it. However, provision is made to set the parameters controlling static aspects of the synthesis (tube length, mean pitch, and so on: the so-called “utterance-rate parameters”). These static parameters are normally held in a system library as a “defaults database”. This refinement is not yet included in the ports, but is a function of ServerTest (see below).

The Text-to-Speech Server computes the event framework from the input text via the intermediate input syntax produced by the Parser. This pre-processing includes dictionary look-up to obtain the correct pronunciation. There is no significant parsing in terms of normal English grammar, and no attempt is made to determine meaning (which would allow different pronunciations of words with the same spelling to be disambiguated, and would allow slightly more accurate rhythm and intonation to be generated). Such abilities should eventually be added. The word-stress information from the dictionary is used to help determine the rhythmic framework according to the Jones/Abercrombie/Halliday (British) “tendency-towards-isochrony” theory of British English speech, by placing “foot” boundaries before the word stress in words having word-stressed syllables.
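The foot-boundary placement described above can be sketched as follows. This is a hypothetical illustration, not gnuspeech's actual code or markup: the stress flags stand in for the pronunciation data returned by dictionary look-up, and the "/" marker and the function name `place_foot_boundaries` are invented for the example.

```python
def place_foot_boundaries(words):
    """Insert a foot boundary before each word carrying word stress.

    words: list of (text, has_word_stress) pairs, as might come from
    dictionary look-up.  Returns a token list in which "/" marks a
    foot boundary, so that each stressed word heads a rhythmic foot
    (the tendency-towards-isochrony scheme described above).
    """
    tokens = []
    for text, stressed in words:
        if stressed:
            tokens.append("/")  # foot boundary precedes the stressed word
        tokens.append(text)
    return tokens

# Example: content words stressed, function words unstressed.
utterance = [("the", False), ("cat", True), ("sat", True),
             ("on", False), ("the", False), ("mat", True)]
print(" ".join(place_foot_boundaries(utterance)))
# → the / cat / sat on the / mat
```

Unstressed words before the first stressed word fall outside any foot, matching the theory's treatment of anacrusis.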
There's a diagram of the relationships between the various TTS components of the complete system on the project Home Page.
In summary, much of the core software has been ported, and some is still being ported, to the Mac under OS X and to GNU/Linux under GNUStep. All sources and builds for the current work are in the Git repository, with older material in the SVN repository under three branches (for the NeXT, Mac OS X, and GNU/Linux-under-GNUStep versions; see below). Speech may be produced from input text. The development facilities for managing and creating new language databases, or for modifying the existing English text-to-speech database, lack mainly the file-writing components. The gnuspeech facilities also provide the tools needed for psychophysical and linguistic experiments. TRAcT, which gives direct access to the tube model, is functional; a few of the logarithmic data displays remain to be finished, and some clean-up is needed. Some accessory tools are available. As well as those acknowledged above, Greg Casamento, Adam Fedor and the Savannah Hackers provided valuable support in getting the gnuspeech project established, as well as initial work that facilitated the port, including making ubiquitous and tedious changes to the entire NeXT source code to bring it up to OpenStep standards. This work and support are gratefully acknowledged; it involved a great deal of effort, was largely invisible to all but the developers involved, and made the actual port to OS X and GNUStep much less painful.