Todd edited this page Dec 7, 2016 · 34 revisions

Status: open for discussion

This page was created because the discussion in issue #19 got too long.

Related issues: #19, #54

PySoundCard is a wrapper for PortAudio and its API is more or less closely modeled after the PortAudio API. This is really useful for implementing applications which need some kind of audio input/output.

It is, however, not quite practical if audio I/O is needed in an interactive Python session (e.g. using IPython). In such a context it would be helpful to have a few simplified high-level commands which may cover only a subset of PortAudio's functionality, but which are much more concise and therefore easier to type.

Possible Functionality

  • play the contents of a given NumPy array
  • record a given length of audio input into a NumPy array
  • play and record at the same time (sample-accurately)
  • query the currently available devices
  • query current settings
  • change settings for future calls
  • en-queue an array for future playback
  • play a sound while another is already playing
  • loop whole / part of a sound
  • play at controllable speed (for scrubbing)
  • envelopes, other effects
  • recording with user-specified real-time analysis of the input signal (e.g. silence detection, segmentation, loudness estimation, FFT, ...)
  • use an array to fill the available space in a buffer and return a view of the remainder of the array

General API Properties

All calls should be non-blocking by default, i.e. playback/recording should happen in the background. If desired, calls can be made blocking with an argument, e.g. blocking=True.

We need separate settings for input and output, but perhaps we can also provide combined attributes.

Devices can be selected by their device ID, but also by a (case-insensitive) substring of the device name.

Channel masks should be supported to select a subset of hardware channels for playback/recording.

Possible APIs

There are of course many ways to do this; we should try to find a good trade-off between simplicity (easy to understand, little typing) and flexibility.

This is assumed in the following examples:

import pysoundcard as pa

A Single Stream, Global State

This is probably the easiest to use, but also has the most limitations.

pa.play(myarray)
myrec = pa.rec(4.5)  # record 4.5 seconds of audio
myrec2 = pa.playrec(myarray)

pa.play(myarray, blocking=True)

As there is only one stream instance, it's easy to stop playback:

pa.stop()

It also means that if we start a new playback, the old one is automatically stopped:

pa.play(my_long_array)
# wait some time
pa.play(another_array)  # playback of my_long_array is stopped here
# wait some time
pa.stop()

Playback can also be started in a non-blocking manner and after some time it can be made blocking:

pa.play(my_quite_long_array)
# here we can do something else
# once we're done, we can wait for playback to finish:
pa.wait()  # this blocks until playback is finished

There is a single state object which handles all settings. It can be used to query the current settings and to change settings. In the beginning it holds the PortAudio defaults. Alternatively all values could be None in the beginning.

Possible names: settings, config, options, preferences, init, ...?

pa.config.input_device
# returns information about the input device, or its device ID
pa.config.input_device = 5
# a device substring (case-insensitive) can be used
pa.config.input_device = "microphone"
pa.config.input_latency = 0.007

Combined settings:

pa.config.device = 6  # set both input_device and output_device to 6

Probably there should be a way to reset to the initial values:

pa.config.reset()

A problem with settings:

What happens to the rest of the settings if the device is changed? Do they stay the same (possibly not being allowed values for the new device)? Are they reset to the default values of the device?

Probably it's best to have None as the default value for all settings, so that all settings can be assigned independently (in any order). When a function which uses the settings (e.g. pa.play()) is called and the values are not meaningful, an exception should be raised.
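For illustration, a settings object with None defaults and call-time validation might look like this (the class and method names are hypothetical):

```python
class Config:
    """Hypothetical settings object; all values start out as None."""

    def __init__(self):
        self.input_device = None
        self.output_device = None
        self.samplerate = None

    def validate_output(self):
        # Something like this would be called by pa.play()
        # before opening a stream:
        missing = [name for name in ('output_device', 'samplerate')
                   if getattr(self, name) is None]
        if missing:
            raise ValueError('unset settings: ' + ', '.join(missing))

config = Config()
try:
    config.validate_output()          # nothing set yet
    assert False, 'expected a ValueError'
except ValueError:
    pass

config.output_device = 6
config.samplerate = 44100
config.validate_output()              # now passes silently
```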

Channel masks (available for input and output!):

pa.play(myarray, channelmask=[5, 27, 56])

A separate argument is probably not necessary; channels can either be a number (the number of channels) or an iterable (a channel mask):

pa.play(myarray, channels=[5, 27, 56])
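The int-or-iterable interpretation of the channels argument could be normalized in one place, e.g. with a small helper like this (purely illustrative):

```python
def normalize_channels(channels):
    """Interpret `channels` as either a channel count or a channel mask."""
    if isinstance(channels, int):
        return list(range(channels))  # a count means "the first n channels"
    return list(channels)             # an iterable is taken verbatim as a mask

assert normalize_channels(2) == [0, 1]
assert normalize_channels([5, 27, 56]) == [5, 27, 56]
```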

A Class that Does Everything

This API allows multiple streams, but it is also a little more complicated.

Possible names for the class: SoundCard, AudioIO, SoundIO, IO, ...

sc = pa.SoundCard()
sc.play(myarray)
sc2 = pa.SoundCard(input_device=..., samplerate=...)
sc2.play(another_array)
sc.stop()
sc2.wait()

Settings can be specified in the constructor or as object properties:

sc = pa.SoundCard(output_device=...)
sc.input_latency = 0.007
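A rough sketch of how constructor keywords and attribute access could share the same set of settings (the attribute names come from the examples above; the rest is made up):

```python
class SoundCard:
    """Sketch: each setting can be passed to the constructor or set later."""

    _defaults = {'input_device': None, 'output_device': None,
                 'input_latency': None, 'samplerate': None}

    def __init__(self, **settings):
        unknown = set(settings) - set(self._defaults)
        if unknown:
            raise TypeError('unknown settings: {}'.format(sorted(unknown)))
        for name, default in self._defaults.items():
            setattr(self, name, settings.get(name, default))

sc = SoundCard(output_device=6)
sc.input_latency = 0.007
assert sc.output_device == 6
assert sc.input_latency == 0.007
assert sc.samplerate is None   # unset settings stay at their None default
```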

Free Functions and a Settings Class

Probably not a good idea ...

conf = pa.Settings()
conf.device = "hdmi"

pa.play(myarray, conf)

A Submodule or a Module-level Object

From a user's perspective it looks more or less the same whether it's a submodule or an object.

Only one stream.

Possible names: player, recorder, soundcard, ...

This has similar behavior to "A Single Stream, Global State", but it is more complicated to type.

pa.soundcard.play(myarray)
pa.soundcard.config.output_latency = 0.007

A virtual sound card object

This would be a different abstraction than PortAudio provides. We would essentially create one pre-defined object per PortAudio device, with appropriate play/record functions. This would require some string magic to convert device names into valid variable names.

dir(pa.soundcard)
=> ['Internal_Microphone',
    'Internal_Loudspeaker',
    'UA25_Ex']
recording = pa.soundcard.Internal_Microphone.record(4.5)
pa.soundcard.Internal_Loudspeaker.play(recording)
recording = pa.soundcard.UA25_Ex.playrec(cool_song)
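The "string magic" could be as simple as a regex replace. This is just one possible scheme (note that it would yield UA_25_Ex rather than the UA25_Ex shown above):

```python
import re

def identifier_from_device_name(name):
    """Turn a device name into a valid Python identifier (one possible scheme)."""
    # Replace every run of non-alphanumeric characters with a single '_':
    ident = re.sub(r'[^0-9A-Za-z]+', '_', name).strip('_')
    # An identifier must not start with a digit:
    if ident and ident[0].isdigit():
        ident = '_' + ident
    return ident

assert identifier_from_device_name('Internal Microphone') == 'Internal_Microphone'
assert identifier_from_device_name('UA-25 Ex') == 'UA_25_Ex'
```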

Alternatively, this could all be saved in dictionaries:

pa.soundcards.keys()
=> dict_keys(['Internal Microphone', 'Internal Loudspeaker', 'UA-25 Ex'])
recording = pa.soundcards['Internal Microphone'].record(4.5)
pa.soundcards['Internal Loudspeaker'].play(recording)
recording = pa.soundcards['UA-25 EX'].playrec(cool_song)

There are probably some more complications once you take APIs into account. It might be easier to run with the default API instead of providing separate soundcard instances per API.

This does not allow playreccing across devices. However, cross-device streams are known to have timing issues anyway, so this might be a fitting abstraction (though PortAudio sometimes provides separate "devices" for the input and output of one physical sound card). Also, it could still allow something like:

output = pa.soundcard.InternalLoudspeaker
input = pa.soundcard.InternalMicrophone
output.play_blocks(input.record_blocks(4.5, 1024))

However, it is unclear if playback and recording are guaranteed to be operating on the same block. It could happen that playback happens one block later than recording.

Another possibility would be to use a function that takes a soundcard name as a string, as well as other optional arguments such as API and sampling rate, and returns a soundcard object. Multiple calls to the function with the same name (or perhaps the same name and the same API) would return the same object. This would work similarly to Python's logging.getLogger function.

>>> recorder1 = pa.getRecordingDevice('Internal Microphone', api='ASIO', fs=44100)
>>> recorder2 = pa.getRecordingDevice('Internal Microphone', api='ASIO', fs=44000)
>>> print(recorder1 is recorder2)
True
>>> recording = recorder1.record(4.5)
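A minimal sketch of the getLogger-style cache, keyed on (name, api) so that the sampling rate of the first call wins (the function name comes from the example above; the class body is a stub):

```python
_devices = {}  # module-level cache, like logging's manager

class RecordingDevice:
    """Stub standing in for a real per-device class."""
    def __init__(self, name, api, fs):
        self.name, self.api, self.fs = name, api, fs

def getRecordingDevice(name, api=None, fs=44100):
    # Same (name, api) -> same object; fs of the first call wins.
    key = (name, api)
    if key not in _devices:
        _devices[key] = RecordingDevice(name, api, fs)
    return _devices[key]

r1 = getRecordingDevice('Internal Microphone', api='ASIO', fs=44100)
r2 = getRecordingDevice('Internal Microphone', api='ASIO', fs=44000)
assert r1 is r2        # the differing fs is ignored on the second call
assert r1.fs == 44100
```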

With this approach it could be possible to provide a list or tuple of device names to allow playreccing across devices. However, having a single class that handles both real and combined devices would probably get too complicated, so the function could return an instance of a different class in this case.

Open questions:

How would the default device be selected?

Continuous playback/recording

All the API propositions have in common that each call opens and closes a Stream. Thus, continuous playback and recording is impossible. However, this could be circumvented in several ways:

  • Provide a callback to playrec:

    def callback(input, output):
        output[:] = input[:]*0.5
    recording = obj.playrec(cool_song, callback=callback)
  • Have block/generator-based play and record functions

    def filter(input):
        return input[:]*0.5
    player.play_blocks(recorder.record_blocks(4.5, effect=filter))
    ## or ##
    player.play_blocks(recorder.record_blocks(4.5), effect=filter)

Admittedly, this syntax might be a bit too complicated. An alternative:

    for input, output in zip(recorder.record_blocks(4.5), player.play_blocks(4.5)):
        output[:] = input[:]*0.5
    ## or ##
    for data in pa.playrec_blocks(4.5):
        data[:] = data[:]*0.5

    This has the disadvantage of introducing an additional block length of necessary delay though.
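The block/generator idea can be sketched with plain lists standing in for NumPy blocks; the record_blocks and play_blocks below are fakes that only demonstrate the data flow, not real audio I/O:

```python
def record_blocks(nblocks, blocksize=4, effect=None):
    # Fake recorder: yields constant blocks instead of real input.
    for i in range(nblocks):
        block = [float(i)] * blocksize
        yield effect(block) if effect else block

def play_blocks(blocks):
    # Fake player: collects samples instead of writing to a stream.
    played = []
    for block in blocks:
        played.extend(block)
    return played

halve = lambda block: [x * 0.5 for x in block]
out = play_blocks(record_blocks(2, blocksize=2, effect=halve))
assert out == [0.0, 0.0, 0.5, 0.5]
```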

Graph Based API (like webaudio)

Developers can connect different building blocks, such as sources, outputs and effects.

This proposal would use one of the other APIs for audio inputs and outputs and is mostly about the graph:

# This is a port of the example at  http://creativejs.com/resources/web-audio-api-getting-started/

import pa
import pa.audionodes as audionodes

# These next two lines would probably be more like one of the other APIs
source = audionodes.source("test.wav")
output = pa.SoundCard()

volumeNode = audionodes.Gain(gain=0.1)

# Create a lowpass filter to quieten sounds over 220 Hz
filterNode = audionodes.filters.LowPass(frequency=220)

# Join everything together
# source->volumeNode->filterNode->output

source.connect(volumeNode)
volumeNode.connect(filterNode)
filterNode.connect(output)

In this system one of the other APIs would be used as a base for streams, inputs and outputs; where appropriate these could be used as nodes in the graph.
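A minimal pull-model sketch of such a graph, with Gain and connect() mirroring the example above and everything else (Node, Source, pull, process) invented for illustration:

```python
class Node:
    """Base class: each node knows its upstream source."""
    def __init__(self):
        self._source = None

    def connect(self, downstream):
        downstream._source = self
        return downstream

    def pull(self):
        # Pull a block from upstream, then apply this node's processing.
        block = self._source.pull() if self._source else []
        return self.process(block)

    def process(self, block):
        return block  # pass-through by default

class Source(Node):
    def __init__(self, samples):
        super().__init__()
        self._samples = samples

    def pull(self):
        return list(self._samples)

class Gain(Node):
    def __init__(self, gain):
        super().__init__()
        self.gain = gain

    def process(self, block):
        return [x * self.gain for x in block]

source = Source([1.0, -1.0, 0.5])
volume = Gain(gain=0.1)
source.connect(volume)
assert volume.pull() == [0.1, -0.1, 0.05]
```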

Audio Clips: (playback with looping)

[Note - this section might be part of the Sequencer API suggestion]

Analogous to a wave as inserted into a channel in one of the sequence-based audio editors.

# Create audio clip from a wave
clip1=AudioClip('blah.wav')

# Create audio clip from 5 seconds of an input stream
s = Stream()
clip2 = AudioClip(s, length=5., loop=True)

Functionality would include start + end points:

         s     e
    [    |||||||    ]

Only audio between the start and end points would be played (they would default to the start / end of the whole wave).

(This part could be something like a "Selection"?)

clip1.start = 12.0 # set start to 12 seconds
clip1.end = 14.0   # set end to 14 seconds

Looping:

clip1.loop = True  # if set, playback will loop from the start marker to the end marker
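Start/end markers and looping could be sketched like this, with one fake sample per "second" to keep the numbers readable (the AudioClip attributes follow the text above; the rendering logic is invented):

```python
class AudioClip:
    """Sketch: a clip with start/end markers and optional looping."""

    def __init__(self, samples, loop=False):
        self.samples = samples
        self.start = 0              # defaults to the start of the whole wave
        self.end = len(samples)     # ... and to its end
        self.loop = loop

    def render(self, length):
        # Only the audio between the start and end markers is played:
        selection = self.samples[self.start:self.end]
        if not self.loop:
            return selection[:length]
        out = []
        while len(out) < length:    # repeat the selection until `length`
            out.extend(selection)
        return out[:length]

clip = AudioClip([0, 1, 2, 3, 4, 5], loop=True)
clip.start, clip.end = 2, 4         # only samples 2 and 3 are played
assert clip.render(5) == [2, 3, 2, 3, 2]
```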

Envelopes:

Lots of sequencing audio editors allow you to edit the envelope of a wave. Should this be part of an AudioClip, or can you just add any number of effects to it?

Envelopes could contain tuples of (time, amplitude); alternatively, you could pass in a function that receives a time and returns a float. In this way you could make simple effects, something like this:


# Create a triangle envelope
#     /\
#   /    \
# /        \
triangle = Envelope([0., 0.], [.5, 1.], [1., 0.])
clip1.envelope = triangle

# Using a function to set the envelope
sinewave = Envelope(math.sin)
clip2.envelope = sinewave
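An Envelope along these lines could accept either breakpoints (linearly interpolated) or a function of time. This sketch follows the two constructor shapes shown above; the interpolation details are made up:

```python
import math

class Envelope:
    """Either a set of (time, amplitude) breakpoints or a function of time."""

    def __init__(self, *points_or_func):
        if len(points_or_func) == 1 and callable(points_or_func[0]):
            self._func = points_or_func[0]
        else:
            points = sorted(points_or_func)

            def interp(t):
                # Linear interpolation between neighboring breakpoints;
                # 0.0 outside the defined range.
                for (t0, a0), (t1, a1) in zip(points, points[1:]):
                    if t0 <= t <= t1:
                        return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
                return 0.0

            self._func = interp

    def __call__(self, t):
        return self._func(t)

triangle = Envelope([0., 0.], [.5, 1.], [1., 0.])
assert triangle(0.25) == 0.5   # halfway up the rising slope
assert triangle(0.5) == 1.0    # the peak

sine = Envelope(math.sin)      # function-based envelope
assert sine(0.0) == 0.0
```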

If the graph API is adopted, then Envelope would be another kind of node:

source.connect(Envelope(math.sin))

Sequencer API

A higher-level API to build sequencer-type apps, from module players/trackers to 'Lego brick' style sequencers.

This could include a Piano Roll, containing named channels - which can have Audio Clips (the 'Lego Bricks') placed at particular times.

If this was built to enable module players, then the concept of patterns would need to be added (basically a list of Piano Rolls to play).

Integration with Aubio for Audio Analysis

Elsewhere the idea of having FFT available in pyaudio was mentioned; integration with Aubio could allow things like onset detection and much more.

It looks like Aubio exposes its interfaces as NumPy arrays, so further integration should be possible.

This might enable higher-level APIs along the lines of Minim (for Processing), which provides things like beat detection.