@BrunoBerisso
Contributor

This branch has some general code improvements (fixed access rights, guard preferred over if, etc.) and three important changes:

  1. Add a new API to add words to the recognition dictionary at runtime. Be aware that new words can't be added while a recognition is in progress; add them before starting a recognition process.
    The API expects an array of String tuples of the form (word: "HELLO", phones: "HH EH L OW"). The first component is the word in plain English. The second is the pronunciation, written with the phones as they appear in the cmudict (more here: http://www.speech.cs.cmu.edu/tools/lextool.html). In the future the second component should be calculated automatically.

  2. The decode functions now throw errors where applicable.

  3. There is a new approach to the live decode logic using AVAudioConverter. The idea is to read the data in a format better suited to iOS (float 32, 16000 Hz) and convert it to the Sphinx format (int 16, 16000 Hz). AVAudioConverter is only available from iOS 9.0, so the deployment target needs to change. This should address "Device does not support required sample rate recording" (#24) and "ps_add_word" (#33).
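
To illustrate the tuple shape from item 1, here is a minimal sketch that checks the phones of each entry against the cmudict phone set before the entries are handed to the new API. The names WordEntry, cmuPhones and unknownPhones are hypothetical helpers for this example and are not part of the branch; the call into the decoder itself is left out on purpose.

```swift
import Foundation

// Hypothetical alias for the tuple shape described in this PR.
typealias WordEntry = (word: String, phones: String)

// The 39 phone symbols used by cmudict (no stress markers).
let cmuPhones: Set<String> = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH"
]

// Returns the space-separated symbols in `entry.phones` that are not
// cmudict phones. An empty phones string yields an empty result.
func unknownPhones(in entry: WordEntry) -> [String] {
    return entry.phones
        .split(separator: " ")
        .map(String.init)
        .filter { !cmuPhones.contains($0) }
}

let entries: [WordEntry] = [
    (word: "HELLO", phones: "HH EH L OW"),
    (word: "WORLD", phones: "W ER L D")
]

// Entries with at least one unrecognized phone symbol.
let invalid = entries.filter { !unknownPhones(in: $0).isEmpty }
```

A check like this would run before starting a recognition, since words can't be added while one is in progress.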
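
The conversion step from item 3 can be sketched roughly as follows. This is not the code in the branch, only an illustration of converting a buffer to int 16 at 16000 Hz with AVAudioConverter. It assumes a mono input buffer that is already at 16000 Hz, because convert(to:from:) does not perform sample rate conversion. SketchError and toSphinxFormat are hypothetical names.

```swift
import AVFoundation

// Hypothetical error type for this sketch; not part of the PR.
enum SketchError: Error {
    case setupFailed
}

// Converts `input` (for example float 32 at 16000 Hz, mono) to
// int 16 at 16000 Hz, mono. Assumes the input sample rate is 16000 Hz.
func toSphinxFormat(_ input: AVAudioPCMBuffer) throws -> AVAudioPCMBuffer {
    guard
        let target = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                   sampleRate: 16000,
                                   channels: 1,
                                   interleaved: false),
        let converter = AVAudioConverter(from: input.format, to: target),
        let output = AVAudioPCMBuffer(pcmFormat: target,
                                      frameCapacity: input.frameCapacity)
    else {
        throw SketchError.setupFailed
    }

    // Simple conversion without sample rate change; throws on failure.
    try converter.convert(to: output, from: input)
    return output
}
```

If the hardware delivers a different sample rate, the callback-based convert(to:error:withInputFrom:) variant would be needed instead.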

Please let everybody know your thoughts about these changes.
Thanks!

Bruno Berisso added 4 commits January 24, 2017 12:13
…nstead of open. The same goes for the functions
- Change some 'if' statements to 'guards', mostly in the tests
- Use STrue | SFalse instead of 1 | 0 to denote true | false when applicable
…gin in live decoding. The idea is to read the data in a format better suited to iOS (float 32, 16000 Hz) and convert it (with AVAudioConverter) to the Sphinx format (int 16, 16000 Hz). AVAudioConverter is only available from iOS 9.0, so the deployment target needs to change.
…Be aware that new words can't be added while a recognition is in progress; add them before starting a recognition process.

The API expects an array of String tuples of the form (word: 'HELLO', phones: 'HH EH L OW'). The first component is the word in plain English. The second is the pronunciation, written with the phones as they appear in the cmudict (more here: http://www.speech.cs.cmu.edu/tools/lextool.html). In the future the second component should be calculated automatically.
@BrunoBerisso BrunoBerisso merged commit c05fdef into development May 25, 2017