Musical Instrument Digital Interface

MIDI (Musical Instrument Digital Interface) is a specification for the interconnection of synthesis equipment. The original specification defined a physical connection, a transmission protocol, and a message format. Because MIDI was developed primarily for the purpose of linking keyboard synthesizers, the message format mimics the actions of a performer at a keyboard. Each synthesizer action results in a MIDI message. Messages are generated whenever a key is pressed or released, the volume or pitch control is changed, the patch is changed, and so on. MIDI is designed to allow use over a relatively slow data link (31.25 kbps), so message data is compressed to a minimal size. MIDI defines sixteen different "channels" of data, with each channel roughly equivalent to one synthesis instrument. Multiple notes may sound on a channel so long as all of the notes on the channel have the same patch. Furthermore, controller changes (e.g. volume pedal) apply to all notes on the channel, so only one message needs to be sent to change all notes on the instrument.

MIDI Messages

MIDI messages can be broken down into two basic categories: channel messages (note on/off, control change, etc.) and system common messages (global parameter changes). A MIDI message begins with a byte where the high order bit is set, followed by up to two data bytes, with the number and meaning of the data bytes defined by the message. The two exceptions to this rule are the SYSEX message and the META event, both of which can be followed by a variable number of data bytes. In a real-time data stream, SYSEX message data is terminated by a byte with the value 0xF7, but when the message is stored in a Standard MIDI File (SMF) an explicit byte count is included. META events have an explicit byte count following the message value.

The message byte is divided into two four-bit fields. For channel messages, the high nibble defines the message and the low nibble is the channel number. Since the high bit is always set, there are three bits left to define the actual message and thus only 8 possible message values in MIDI when the channel is specified. However, when the high nibble has all bits set, the message is a system common message and the low nibble defines the message rather than the channel. Thus, we have 7 channel voice messages and 16 possible system common messages for a total of 23 different real-time event messages. In an SMF, the reset message (0xFF) is redefined to indicate a META event. The META event is followed by a byte that defines the actual message, followed by a variable number of data values. This expands the available range to include 256 additional messages for use by a sequencer. Most of the META events are optional and informational (such as song title) and don't affect sound generation.
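The nibble split described above can be sketched in a few lines of Python. This is only an illustration: the function name split_status and the shape of the returned tuples are assumptions made for this example, not part of the MIDI specification.

```python
def split_status(status):
    """Split a message byte into its two four-bit fields.

    Returns ("channel", message, channel) for a channel message, or
    ("system", message) when the high nibble has all bits set.
    """
    if not 0x80 <= status <= 0xFF:
        raise ValueError("a message byte must have its high bit set")
    high = status >> 4   # 0x8 through 0xF, because the high bit is set
    low = status & 0x0F
    if high == 0xF:
        # System common message: the low nibble selects the message.
        return ("system", low)
    # Channel message: three message bits remain below the high bit.
    return ("channel", high & 0x07, low)
```

For instance, split_status(0x91) yields a channel message with message value 1 and channel value 1, while split_status(0xFF) is reported as system message 15.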

Whereas a MIDI message is defined by a byte with the high bit set, data values must be less than 0x80 (0-127). For most messages, this range is sufficient. For example, we can represent pitch over a ten-octave range with values 0-120. Likewise, we can represent volume level in dB with values in the range 0-120. In order to produce a data value greater than 127, the value must be split into multiple bytes with each byte containing 7 bits of information. The byte with the high order bits is shifted left by 7 and added to the byte containing the low order bits. This is one of the oddities of the MIDI protocol.
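As a small sketch of the 7-bit splitting described above, the two hypothetical helpers below (the names join_14bit and split_14bit are chosen for this example) combine two data bytes into one value and split a value back into two data bytes:

```python
def join_14bit(high, low):
    """Combine two 7-bit data bytes into a single 14-bit value."""
    if not (0 <= high <= 127 and 0 <= low <= 127):
        raise ValueError("data bytes must be in the range 0-127")
    # Shift the high order byte left by 7 and add the low order byte.
    return (high << 7) + low


def split_14bit(value):
    """Split a 14-bit value into (high, low) 7-bit data bytes."""
    if not 0 <= value <= 0x3FFF:
        raise ValueError("value must fit in 14 bits")
    return (value >> 7) & 0x7F, value & 0x7F
```

With these helpers, the two data bytes 0x40 and 0x00 join to 0x2000, and the largest value that two data bytes can carry is 0x3FFF (16,383).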

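Channel voice messages and the number of data bytes for each are shown in the following table. Remember that the message value is contained in the high nibble with the high bit set and the channel number in the low nibble. For example, a note-on message (message value 1, giving a high nibble of 0x9) for channel 1 would be a byte with a value of 0x91. Note that the channel nibble counts from 0, so a channel value of 1 is the second of the sixteen channels.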

Message  Data               Description
0        key, velocity      Note Off. This is rarely used.
1        key, velocity      Note On. If velocity is set to zero, this is treated as a note-off message.
2        key, value         Polyphonic Key Pressure (after-touch).
3        controller, value  Control change. MIDI supports 128 different controllers (0-127).
4        program            Program (patch) change.
5        value              Channel pressure (after-touch).
6        lsb, msb           Pitch wheel change. The value is a 14-bit value in two bytes with the low order 7 bits contained in the first byte and the high order 7 bits in the second byte. A value of 0x2000 is considered a center value indicating no pitch variation.
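To illustrate how the table maps onto bytes, the following Python sketch assembles a channel voice message from a message value, a channel value, and data bytes. The name channel_message and the DATA_BYTES lookup are assumptions made for this example; the data byte counts are taken from the table.

```python
# Number of data bytes for each channel voice message value (0-6).
DATA_BYTES = {0: 2, 1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 2}


def channel_message(message, channel, *data):
    """Build the bytes of one channel voice message."""
    if message not in DATA_BYTES:
        raise ValueError("message value must be 0-6")
    if not 0 <= channel <= 15:
        raise ValueError("channel value must be 0-15")
    if len(data) != DATA_BYTES[message]:
        raise ValueError("wrong number of data bytes for this message")
    if any(not 0 <= d <= 127 for d in data):
        raise ValueError("data bytes must be in the range 0-127")
    # High bit set, message value in the high nibble, channel in the low nibble.
    status = 0x80 | (message << 4) | channel
    return bytes([status, *data])
```

Calling channel_message(1, 1, 60, 100) produces the three bytes 0x91, 60, 100: a note-on for key 60 at velocity 100, matching the 0x91 example given before the table.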

One of the features of a MIDI data stream that we must also consider is called running status. For channel messages, the message byte only has to be sent when it differs from the message byte of the last channel message. For example, once a note-on message is sent, only the key and velocity values need to be sent until some other message (such as control change) needs to be sent. This is part of the data compression scheme for MIDI transmission.

The Note Off message is rarely used since a Note On message with a velocity of zero indicates a note should stop playing. Using running status allows a synthesizer (or sequencer) to send a single Note On message followed by key and velocity information only. When the velocity is non-zero, a note is initiated, and when the velocity is zero, the note is stopped.
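A receiver has to remember the last message byte in order to interpret data bytes that arrive without one. The Python sketch below shows one way this might be done for channel messages only; the function name decode_channel_stream and the tuple format of the events are assumptions for illustration, and system messages are deliberately not handled.

```python
# Data byte count, indexed by the high nibble of the message byte.
DATA_BYTES = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}


def decode_channel_stream(stream):
    """Decode channel messages from a byte stream, honoring running status.

    Returns a list of tuples: (message_byte, data1) or
    (message_byte, data1, data2).
    """
    events = []
    status = None   # last message byte seen (the running status)
    data = []
    for byte in stream:
        if byte & 0x80:
            if byte >= 0xF0:
                raise ValueError("system messages are outside this sketch")
            status = byte   # a new message byte replaces the running status
            data = []
            continue
        if status is None:
            continue        # data byte before any message byte: ignore it
        data.append(byte)
        if len(data) == DATA_BYTES[status >> 4]:
            events.append((status, *data))
            data = []       # keep status so the next data bytes reuse it
    return events
```

Given the seven bytes 0x90, 60, 100, 62, 100, 60, 0, this returns three note-on events on the same channel; the last one has a velocity of zero and therefore stops key 60.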

For Note On and Note Off messages, the key value is typically used to indicate pitch and the velocity value indicates volume. However, a synthesizer may interpret the values otherwise. For example, the key value for a drum kit instrument can select which drum sound is to be played. Likewise, velocity could be applied to envelope rate in addition to volume level.

Control change messages are used to dynamically alter the sound of the instrument. The most common control messages are for modulation (1), overall volume (7), and panning (10). Modulation is commonly used to indicate vibrato depth. Of the remaining controllers, many are not used or only vaguely defined. For example, controllers 16-19 are designated "General Purpose" controllers without any specifics of how they should be used. Others, such as 72 (release time) and 73 (attack time), do not indicate the range of operation, only the function that is to be modified. A large number of controller numbers are used to specify an "LSB" value. Since only one byte is provided for the controller value, a second message must be used to produce values greater than 127. For fine-grained control, the synthesizer can send a second message using the associated LSB control number. Together the messages allow for 14-bit resolution. If an instrument intends to use the LSB, it must be capable of splicing the two message values together when they occur and also be able to function properly if the LSB value is missing.
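One possible way to splice the two controller messages together is sketched below in Python. The class name ControllerState and its methods are hypothetical. The sketch assumes that controllers 0-31 carry the high order bits and that each is paired with the LSB controller numbered 32 higher (as with controls 0 and 32 for bank select). Clearing the LSB whenever a new high order value arrives is a design choice of this sketch, made so that a stale LSB is never combined with a new value.

```python
class ControllerState:
    """Tracks the controller values for one channel."""

    def __init__(self):
        self.values = [0] * 128

    def control_change(self, controller, value):
        if not (0 <= controller <= 127 and 0 <= value <= 127):
            raise ValueError("controller and value must be 0-127")
        self.values[controller] = value
        if controller < 32:
            # New high order value: clear the paired LSB so a missing
            # LSB message simply yields a coarse (7-bit) setting.
            self.values[controller + 32] = 0

    def value14(self, controller):
        """Return the 14-bit value of a controller in the range 0-31."""
        if not 0 <= controller < 32:
            raise ValueError("only controllers 0-31 have a paired LSB")
        return (self.values[controller] << 7) | self.values[controller + 32]
```

If only the volume message (7) arrives with a value of 100, value14(7) is 12,800; if the paired LSB message (39) then arrives with a value of 5, the result becomes 12,805.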

The aftertouch messages are used to indicate variations in key pressure while a key is held down. The aftertouch messages are often ignored by event recorders since they can result in a flood of messages that quickly fills up the sequencer memory.

Program change allows indication of up to 128 different instruments. However, control #0 can be used to select up to 128 different patch banks, or up to 16,384 if the LSB for control 0 (#32) is used as well. Both the program number and bank numbers are specific to the synthesizer. As a result, a sequence from one synthesizer might sound completely different on another synthesizer. For this reason, the MIDI specification was expanded to define the General MIDI (GM) patch numbers. GM defines a set of common instrument assignments on bank 0 so that a sequence recorded on one synthesizer will sound similar when played on another synthesizer. When implementing a software synthesizer, we should attempt to make the bank 0 programs sound similar to those defined for GM and use other banks for non-standard instruments.
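As an illustration of the message sequence involved, the Python sketch below builds the bytes for selecting a bank with controls 0 and 32 and then sending the program change. The function name select_patch is hypothetical, and the order shown (bank messages first, then program change) is an assumption of this example; how a given synthesizer responds to bank numbers is specific to that synthesizer.

```python
def select_patch(channel, bank, program):
    """Return the bytes that select a 14-bit bank and a program."""
    if not 0 <= channel <= 15:
        raise ValueError("channel value must be 0-15")
    if not 0 <= bank <= 0x3FFF:
        raise ValueError("bank must fit in 14 bits (0-16383)")
    if not 0 <= program <= 127:
        raise ValueError("program must be 0-127")
    control = 0xB0 | channel   # control change message for this channel
    return bytes([
        control, 0, bank >> 7,      # control 0: high order 7 bits of the bank
        control, 32, bank & 0x7F,   # control 32: low order 7 bits of the bank
        0xC0 | channel, program,    # program change
    ])
```

For bank 0 the two control values are both zero, so select_patch(0, 0, 5) selects program 5 on the bank that GM uses for its common instrument assignments.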

Looking at the events listed above, we can see that some of the events apply to individual notes while others apply to all notes on a channel. The note on/off and polyphonic after-touch apply to individual notes. The control change, program change, channel after-touch and pitch wheel change apply to all notes on a channel. Events that apply to an individual note need to be sent to the instrument that is generating sound. Events that apply to all sounding notes must be handled by a channel object that has knowledge of all active notes. In addition, when a new note is started, all current channel values must be applied to the new note.
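A minimal sketch of such a channel object is shown below in Python. The class names, the attribute names, and the default volume of 100 are assumptions made for illustration only; a real synthesizer would attach these values to its sound generators rather than to a simple record.

```python
class Note:
    """State of one sounding note."""

    def __init__(self, key, velocity, volume, bend):
        self.key = key
        self.velocity = velocity
        self.volume = volume   # copied from the channel when the note starts
        self.bend = bend       # copied from the channel when the note starts
        self.pressure = 0      # polyphonic after-touch, per note


class Channel:
    """Holds channel-wide values and knows every active note."""

    def __init__(self):
        self.volume = 100      # assumed default for this sketch
        self.bend = 0x2000     # center value: no pitch variation
        self.notes = {}        # active notes, indexed by key

    def note_on(self, key, velocity):
        if velocity == 0:      # note-on with zero velocity stops the note
            self.note_off(key)
            return
        # A new note starts with all current channel values applied.
        self.notes[key] = Note(key, velocity, self.volume, self.bend)

    def note_off(self, key):
        self.notes.pop(key, None)

    def key_pressure(self, key, value):
        # Applies to one individual note only.
        if key in self.notes:
            self.notes[key].pressure = value

    def volume_change(self, value):
        # Applies to all sounding notes and to notes started later.
        self.volume = value
        for note in self.notes.values():
            note.volume = value

    def pitch_wheel(self, value):
        # value is the combined 14-bit pitch wheel value.
        self.bend = value
        for note in self.notes.values():
            note.bend = value
```

The split between the two classes follows the distinction made above: per-note events touch a single Note, while channel-wide events update the Channel and are pushed to every active note.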