Custom music

The API generates custom audio output (also called an audio render) from an audio timeline, which specifies the length and style of the audio and when different instruments start and stop playing.

At minimum, a timeline must have these elements:

  • The timeline must have at least two spans: one metered span to represent the beginning of the music and one unmetered span to represent the end of the music.
  • The metered span must have at least one region. Regions must have a unique ID, a start beat relative to the span, a descriptor, and a region type, usually music.

Aside from those basic requirements, timelines can be as simple or as complex as you want to make them. For example, you can use different regions to change the descriptor of the audio at different points in time. Also, instead of letting the API choose instruments, you can select instruments from the current descriptor and use instrument groups to control which instruments are in the audio and when they play.

Applications without an API subscription can generate audio, but with these limitations:

  • They can use only the documentary_underscore_heartfelt and cinematic_minimal_tense descriptors.
  • They are restricted to 10 requests per minute to custom audio endpoints.
  • They can create renders no longer than 30 seconds.

If you are interested in a custom audio subscription, contact us.

Timelines

The timeline is the data from which the API generates rendered audio. A timeline consists of a series of sequential, non-overlapping spans. Each span contains one or more regions that determine the style of the audio and one or more instrument groups that specify the instruments and when they play during that span.

This diagram shows the elements of a timeline. The timeline contains spans, the spans contain regions and instrument groups, and the instrument groups contain statuses.

Here's an example of a simple timeline with one metered span, one unmetered span, and one region:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "my_custom_audio",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 76,
            "regions": [
              {
                "id": 222,
                "region": "music",
                "descriptor": "cinematic_minimal_tense",
                "beat": 0
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 30.0
          }
        ]
      }
    }
  ]
}

Here's a more complicated example of a timeline with two instrument groups:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "my_custom_audio",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 76,
            "regions": [
              {
                "id": 222,
                "region": "music",
                "descriptor": "cinematic_minimal_tense",
                "beat": 0
              }
            ],
            "instrument_groups": [
              {
                "instrument_group": "pop_fuel_yamaha_upright_acoustic_piano",
                "statuses": [
                  {
                    "beat": 0,
                    "status": "active"
                  }
                ]
              },
              {
                "instrument_group": "direct_destiny_mid_pad",
                "statuses": [
                  {
                    "beat": 0,
                    "status": "active"
                  }
                ]
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 30.0
          }
        ]
      }
    }
  ]
}

Spans

Spans indicate the beginning of a period of music or silence, but you can think of them as a period of time in the music.

There are two types of spans: metered and unmetered. Metered spans contain music. Unmetered spans do not contain music and are used to denote the end of the preceding metered span.

A span has a start time in seconds and a tempo in beats per minute. The events within the span are measured in beats, based on that number of beats per minute.
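For example, in a span with a tempo of 120 BPM, each beat lasts half a second, so an event at beat 20 starts 10 seconds after the beginning of the span (assuming the tempo does not change).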

Spans never overlap each other. For this reason, if an element within a span has a start beat that is after the end of the span, the API ignores that element.

Regions

Regions indicate the descriptor, or style, for a period of music within a span and the beat that it starts on, relative to the start of the span. The descriptor controls the instruments that are available and the genre or flavor for the music. A span can have multiple regions or use the same region for the entire span, but regions cannot overlap.
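
For example, this regions array sketches a span whose style changes partway through. Both descriptors appear elsewhere in this guide; the second region's ID and start beat are placeholder values:

"regions": [
  {
    "id": 222,
    "region": "music",
    "descriptor": "cinematic_minimal_tense",
    "beat": 0
  },
  {
    "id": 333,
    "region": "music",
    "descriptor": "documentary_underscore_heartfelt",
    "beat": 16
  }
]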

Descriptors

A descriptor is a style or genre for the music in a region. The descriptor that you choose has a huge impact on the sound of the music.

Descriptors limit the bands and instruments that are available. During a region of audio, you can use only instruments that are available in that descriptor. The API ignores instruments that are not in the active descriptor.

Instrument groups

Instrument groups include one instrument and a series of status objects that specify when that instrument plays and stops. Each status starts at a beat, relative to the start of the span.

Creating renders

To create a render, pass the timeline object and file format preset (MASTER_MP3 for an MP3 file, MASTER_WAV for a WAV file, or STEMS_WAV for individual WAV tracks for each instrument) to the POST /v2/ai/audio/renders endpoint. The response body includes the ID of the render, which you can use to download the output file.

DATA='{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "my_custom_audio",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 76,
            "regions": [
              {
                "id": 222,
                "region": "music",
                "descriptor": "cinematic_minimal_tense",
                "beat": 0
              }
            ],
            "instrument_groups": [
              {
                "instrument_group": "pop_fuel_yamaha_upright_acoustic_piano",
                "statuses": [
                  {
                    "beat": 0,
                    "status": "active"
                  }
                ]
              },
              {
                "instrument_group": "direct_destiny_mid_pad",
                "statuses": [
                  {
                    "beat": 0,
                    "status": "active"
                  }
                ]
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 30.0
          }
        ]
      }
    }
  ]
}'

curl -X POST "https://api.shutterstock.com/v2/ai/audio/renders" \
-H "Authorization: Bearer $SHUTTERSTOCK_API_TOKEN" \
-H 'Content-Type: application/json' \
-d "$DATA"

The response looks like this example:

{
  "audio_renders": [
    {
      "id": "njlpYoWWmb1AYs2nZyw7EcNWbAkZ",
      "timeline": {},
      "status": "WAITING_COMPOSE",
      "preset": "MASTER_MP3",
      "progress_percent": 0,
      "files": [],
      "created_date": "2021-01-26T12:10:22-05:00",
      "updated_date": "2021-01-26T13:10:22-05:00"
    }
  ]
}

Downloading renders

The time that the API takes to render the output depends on the length of the audio and the descriptors in the timeline. You can use the GET /v2/ai/audio/renders endpoint to track the status of the rendering process and download the audio when it is ready. You must call this endpoint once to trigger the rendering process and then call it again to see if the output file is ready. To avoid exceeding your application's rate limit, don't call this endpoint more than once in 10 seconds.

Renders that are not ready have a status such as WAITING_COMPOSE. Renders that are ready have a status of CREATED and include a link to the output file.

curl -X GET https://api.shutterstock.com/v2/ai/audio/renders \
-H "Accept: application/json" \
-H "Authorization: Bearer $SHUTTERSTOCK_API_TOKEN" \
-G \
--data-urlencode "id=njlpYoWWmb1AYs2nZyw7EcNWbAkZ"

Here's an example of a render in progress:

{
  "audio_renders": [
    {
      "id": "njlpYoWWmb1AYs2nZyw7EcNWbAkZ",
      "timeline": {},
      "status": "WAITING_COMPOSE",
      "preset": "MASTER_MP3",
      "progress_percent": 0,
      "files": [],
      "created_date": "2021-01-26T12:10:22-05:00",
      "updated_date": "2021-01-26T13:10:22-05:00"
    }
  ]
}

Here's an example of a completed render:

{
  "audio_renders": [
    {
      "id": "azQhRPBD9nh6vL8TM767yfGrygv5",
      "status": "CREATED",
      "preset": "MASTER_MP3",
      "progress_percent": 100,
      "files": [
        {
          "bits_sample": 16,
          "content_type": "audio/mp3",
          "download_url": "https://s3-us-west-2.amazonaws.com/amper-ephemeral/renders/2021/02/10/amper-api-azQhRPBD9nh6vL8TM767yfGrygv5/0.mp3",
          "frequency_hz": 44100,
          "kbits_second": 192,
          "size_bytes": 3601830,
          "tracks": [
            "master"
          ],
          "filename": "my_custom_audio"
        }
      ]
    }
  ]
}
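
When a render's status is CREATED, you can download the output file from the download_url field with any HTTP client. Here's a minimal sketch that checks the status and downloads the MP3 when it's ready; it assumes the jq command-line tool is installed, and the render ID and output filename are placeholders taken from the examples above:

RENDER_ID="azQhRPBD9nh6vL8TM767yfGrygv5"

# Get the current state of the render. Wait at least 10 seconds between checks
# to stay within the rate limit.
RESPONSE=$(curl -s -G "https://api.shutterstock.com/v2/ai/audio/renders" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $SHUTTERSTOCK_API_TOKEN" \
  --data-urlencode "id=$RENDER_ID")

STATUS=$(echo "$RESPONSE" | jq -r '.audio_renders[0].status')

if [ "$STATUS" = "CREATED" ]; then
  # Extract the download URL from the completed render and save the file locally.
  DOWNLOAD_URL=$(echo "$RESPONSE" | jq -r '.audio_renders[0].files[0].download_url')
  curl -o my_custom_audio.mp3 "$DOWNLOAD_URL"
fi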

For more information about the custom music endpoints, see Generating custom music in the API reference.

Advanced music options

You can create simple renders by specifying a descriptor and letting the API generate a piece of music by itself, or you can take greater control over the timeline by specifying advanced options such as instruments, transitions, endings, and tempo changes.

Instruments

By default, the API automatically selects instruments that are available in the descriptor. You can also specify instruments and set when they turn on or off by adding instrument groups to the timeline. You can add any number of instrument groups to the timeline, but the instruments must be available in the descriptor.

Each instrument group has the ID of one instrument and an array of status objects that specify when that instrument plays, measured in beats relative to the start of the span. Each instrument is inactive by default. When you set the instrument to active or inactive, it stays that way until another status object changes it or the audio ends.

For example, this timeline turns a drum on and off:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "My_audio_ai.mp3",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 120,
            "regions": [
              {
                "id": 222,
                "descriptor": "cinematic_percussion_primal_brewing",
                "beat": 0,
                "region": "music"
              }
            ],
            "instrument_groups": [
              {
                "instrument_group": "classic_taiko",
                "statuses": [
                  {
                    "beat": 5,
                    "status": "active"
                  },
                  {
                    "beat": 10,
                    "status": "inactive"
                  },
                  {
                    "beat": 15,
                    "status": "active"
                  },
                  {
                    "beat": 20,
                    "status": "inactive"
                  },
                  {
                    "beat": 25,
                    "status": "inactive"
                  }
                ]
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 20
          }
        ]
      }
    }
  ]
}

Tempo changes

By default, music plays at the tempo that you specify in the span in beats per minute (BPM). You can override this tempo and create a tempo curve by adding two or more tempo changes. The API creates the overall tempo of the span by linearly interpolating the tempo between each pair of tempo changes. In this case, the API ignores the tempo property of the span.

All tempos must be within the tempo range of the descriptor.

For example, this timeline starts at 160 BPM, slows to 120, and accelerates back to 160:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "My_audio_ai.mp3",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 160,
            "regions": [
              {
                "id": 222,
                "descriptor": "cinematic_percussion_primal_tense",
                "beat": 0,
                "region": "music"
              }
            ],
            "tempo_changes": [
              {
                "time": 0,
                "tempo": 160
              },
              {
                "time": 7,
                "tempo": 120
              },
              {
                "time": 14,
                "tempo": 160
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 20
          }
        ]
      }
    }
  ]
}
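
In this example, the tempo falls linearly from 160 BPM at 0 seconds to 120 BPM at 7 seconds, so at roughly 3.5 seconds the music plays at about 140 BPM, and the tempo rises back to 160 BPM by 14 seconds.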

To make abrupt tempo changes, put the tempo changes close together. This example changes the tempo quickly from 60 BPM to 120 BPM:

[
  {
      "time": 14.999,
      "tempo": 60
  },
  {
      "time": 15,
      "tempo": 120
  }
]

Endings and transitions

By default, the API adds natural endings to audio and transitions smoothly between changes in the timeline. You can change how the API handles endings and transitions by adding end_type objects to regions.

The API supports two end types: a ringout ending and a cut transition.

Endings

To add a ringout ending at a specific point in a span, which allows the last note to fade out, add an end_type object with the event field set to "ending" and the type field set to "ringout," as in this example:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "My_audio_ai.mp3",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 110,
            "regions": [
              {
                "id": 222,
                "descriptor": "documentary_idiophonic_happy",
                "beat": 0,
                "region": "music",
                "end_type": {
                  "beat": 21,
                  "event": "ending",
                  "type": "ringout"
                }
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 20
          }
        ]
      }
    }
  ]
}

Specifying an ending like this causes all instruments to stop playing, but the music continues to the unmetered span as usual. Because the ending does not change the length of the audio, there may be a period of silence after the instruments stop playing and the last note fades out.

Transitions

When two regions border each other, by default, the API applies an ending to the first region before starting the second region. You can make the transition more abrupt by adding an end_type object with the event field set to "transition" and the type field set to "cut." This example ends one region of music suddenly and cuts immediately to a new region:

{
  "audio_renders": [
    {
      "preset": "MASTER_MP3",
      "filename": "My_audio_ai.mp3",
      "timeline": {
        "spans": [
          {
            "id": 111,
            "span_type": "metered",
            "time": 0,
            "tempo": 120,
            "regions": [
              {
                "id": 222,
                "descriptor": "documentary_idiophonic_happy",
                "beat": 0,
                "region": "music",
                "end_type": {
                  "beat": 20,
                  "event": "transition",
                  "type": "cut"
                }
              },
              {
                "id": 333,
                "beat": 20,
                "descriptor": "cinematic_percussion_epic_tense",
                "region": "music"
              }
            ]
          },
          {
            "span_type": "unmetered",
            "time": 20
          }
        ]
      }
    }
  ]
}

Keys

By default, the API selects a key for the music automatically. You can specify information about the key by adding a key object to the region. The key object has these fields:

  • The tonic_note field (required) is the letter value of the key, including "c", "d", "e", "f", "g", "a", and "b".
  • The tonic_accidental field (required) is the accidental value of the key, including "double flat", "flat", "natural", "sharp", and "double sharp".
  • The tonic_quality field is the scale quality or mode, including "major", "natural_minor", "harmonic_minor", "melodic_minor", "ionian", "dorian", "phrygian", "lydian", "mixolydian", "aeolian", and "locrian".

For example, this region is set to the key of D flat:

"regions": [
  {
    "id": 222,
    "descriptor": "documentary_idiophonic_happy",
    "beat": 0,
    "region": "music",
    "key": {
      "tonic_note": "d",
      "tonic_accidental": "flat"
    }
  }
]
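
To specify the mode as well, add the tonic_quality field to the key object. For example, this sketch sets the key to D-flat major, using one of the quality values listed above:

"key": {
  "tonic_note": "d",
  "tonic_accidental": "flat",
  "tonic_quality": "major"
}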