VOX Project

A neural network-based system for speech recognition

VOX is a speech recognition school project I worked on back in 1998. Here’s a brief description of the project, along with most of the artifacts we produced back then.

About VOX

In 1998, I had the pleasure of working on the development of the VOX project with my colleagues Francisco Socal and Rafael de Figueiredo, under the supervision of Prof. Daniel Hart. It was developed as the final project for our secondary-level technical course in electronics, at Fundação Liberato.

As the project title suggests, our goal was to develop a method based on neural networks capable of recognizing words spoken by the user. Using a vocabulary limited to four words, we achieved recognition rates of about 80%, with the system calibrated for a specific user. I consider this to be an excellent result, especially taking into account that we were high school students getting an education that was not focused on computer programming and, furthermore, working in times when the Internet was just starting to become available for the rest of us.

Most of what we created for this project is available below (though virtually all of it is in Portuguese). But before downloading anything, here is a warning: the source code of VOX is packed with bad programming practices and memory leaks. Don’t try to use it to learn how to program. Really. Twelve years of C++ programming later, I know what I am talking about!

Files

  • Report: The document (in Portuguese) that describes our system and the results we obtained. [PDF (1.5MB)].
  • Speech-controlled presentation: A project presentation that also serves as demonstration, since it is controlled by speech. In Portuguese. [ZIP (5.8MB), containing the sources (C++ Builder), executable binary (Win32) and the needed data files]
  • VOX: Generic program: Lets you record training samples, train the neural net and test the recognition. It is in Portuguese, but I believe it should be usable by resolute non-speakers of Portuguese. [ZIP (171kB), with the sources (C++ Builder) and executable (Win32)].
  • Flag recognizer: A little program that has nothing to do with speech recognition, but was created in project VOX as an experiment with neural nets. It lets the user draw a flag and identifies which country (Brazil, Argentina or Japan) it belongs to. [ZIP (153kB) with the sources (C++ Builder), the executable (Win32) and the data file with the neural net weights].
  • Main sources: Package with all classes and routines related to sound capture and processing and to neural networks. They are required to compile all the other programs. The code is C++ for Win32, but the classes for neural nets use the TFileStream from Borland C++ Builder’s library. It would be fairly easy to convert them to use the std::fstream class from C++’s standard library, but I don’t intend to do this. [ZIP (11kB)].
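
For anyone curious about the std::fstream conversion mentioned in the last item, here is a minimal sketch of what it could look like. It is not taken from the VOX sources: the function names saveWeights and loadWeights, the flat vector of doubles and the file layout (a 64-bit count followed by raw values) are all assumptions made for illustration, and the real classes may store their data quite differently.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helpers (not part of the VOX sources): store and restore a
// flat list of neural net weights using only the C++ standard library.
// Assumed file layout: a 64-bit element count followed by the raw doubles.

void saveWeights(const std::string& path, const std::vector<double>& weights) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open for writing: " + path);

    const std::uint64_t count = weights.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    out.write(reinterpret_cast<const char*>(weights.data()),
              static_cast<std::streamsize>(count * sizeof(double)));
    if (!out) throw std::runtime_error("write failed: " + path);
}

std::vector<double> loadWeights(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open for reading: " + path);

    std::uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    if (!in) throw std::runtime_error("missing weight count: " + path);

    std::vector<double> weights(count);
    in.read(reinterpret_cast<char*>(weights.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    if (!in) throw std::runtime_error("truncated weight data: " + path);
    return weights;
}
```

Calling saveWeights("net.bin", weights) and later loadWeights("net.bin") should give back the same values on the same machine. Raw binary like this is not portable across platforms with different byte order or floating-point formats, which is one reason to treat it only as a starting point.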

Unnecessary note about the date

The date on this post, October, 1998, is the approximate date we completed the project. I don’t remember when I first posted this content to the Internet.