File:AI-generated audio featuring bossa nova music with electric guitar.ogg

AI-generated_audio_featuring_bossa_nova_music_with_electric_guitar.ogg (Ogg Vorbis sound file, length 15 s, 141 kbps, file size: 262 KB)

This is a file from the Wikimedia Commons. Information from its description page there is shown below.

Summary

Description	Demonstration of an algorithmically generated audio track featuring bossa nova music accompanied by electric guitar, created using Riffusion. Riffusion is an open-source fine-tuned derivative of the Stable Diffusion image-generation diffusion model, retrained to generate images of audio spectrograms, which can then be converted into audio files.

An audio spectrogram is a visual representation of an audio clip's frequency content over time. A spectrogram image can be converted into audio via the inverse short-time Fourier transform; because the image stores only magnitudes, the Griffin-Lim algorithm is used to approximate the phase during audio reconstruction.

While the Stable Diffusion AI model was originally intended to generate visual images from a textual prompt, Riffusion has been retrained from Stable Diffusion v1.5 to instead generate spectrogram images from text prompts describing musical motifs. It was fine-tuned using Nvidia A10G enterprise datacenter GPUs.

Procedure/Methodology

The spectrograms were generated using the Riffusion Inference Server running the riffusion-model-v1 diffusion model, paired with the Riffusion App UI frontend. The following values were used:

Prompt: "bossa nova with electric guitar"
Seed Image: OG Beat
Denoising: 0.75

This resulted in the output spectrogram image:

[Spectrogram image]

The spectrograms were then converted to WAV audio using a Python script:

[Audio converted from spectrogram]

Riffusion generates 512×512 resolution images, each representing a 5-second chunk of looping audio. For the convenience of the reader, the three generated spectrogram images have been merged together in GIMP along the x-axis (which represents time), and the corresponding audio files have been merged together in Audacity and then converted to Ogg Vorbis.
Date	17 December 2022
Source	Own work
Author	Benlisquare
Permission (Reusing this file)	Output images

As the creator of the output images and audio, I release this file under the licence displayed within the template below.

Stable Diffusion AI model

The Stable Diffusion AI model is released under the CreativeML OpenRAIL-M License, which "does not impose any restrictions on reuse, distribution, commercialization, adaptation" as long as the model is not intentionally used to cause harm to individuals, for instance to deliberately mislead or deceive. The authors of the AI models claim no rights over any image outputs generated, as stipulated by the license.

Riffusion v1 model

The Riffusion v1 model, created by Seth Forsgren and Hayk Martiros, is released under the CreativeML OpenRAIL-M License and is a derivative model of the Stable Diffusion v1.5 model checkpoint.

Riffusion Inference Server

The Riffusion Inference Server is released under an MIT License.

Riffusion App

The Riffusion App is released under an MIT License.
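
The spectrogram-to-audio step mentioned in the Description can be illustrated with a minimal Griffin-Lim sketch. This is not the Python script used for this file, which is not reproduced on this page. It is a standard-library toy that assumes a tiny 16-sample Hann window, a hop of 4 samples, and a plain linear-frequency magnitude spectrogram; the names `FRAME`, `HOP`, `griffin_lim` and `spectral_error` are hypothetical. The idea is to start from random phases, then alternate between an inverse short-time Fourier transform and a forward transform, restoring the known magnitudes after each pass.

```python
import cmath
import math
import random

FRAME = 16  # samples per analysis window (tiny, for illustration only)
HOP = 4     # samples between successive windows
# Periodic Hann window, used for both analysis and synthesis.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * i / FRAME) for i in range(FRAME)]


def dft(x):
    """Naive discrete Fourier transform of a short sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Real part of the inverse DFT of one frame."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def stft(signal):
    """Short-time Fourier transform: one DFT per windowed frame."""
    return [dft([signal[s + i] * WINDOW[i] for i in range(FRAME)])
            for s in range(0, len(signal) - FRAME + 1, HOP)]


def istft(frames, length):
    """Least-squares inverse STFT by windowed overlap-add."""
    out, norm = [0.0] * length, [0.0] * length
    for j, spectrum in enumerate(frames):
        segment = idft_real(spectrum)
        for i in range(FRAME):
            out[j * HOP + i] += segment[i] * WINDOW[i]
            norm[j * HOP + i] += WINDOW[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def griffin_lim(magnitudes, length, iterations=50, seed=0):
    """Estimate a signal whose STFT magnitudes approximate `magnitudes`."""
    rng = random.Random(seed)
    # Start from random phases, since the spectrogram stores none.
    spec = [[m * cmath.exp(2j * math.pi * rng.random()) for m in frame]
            for frame in magnitudes]
    for _ in range(iterations):
        # Project onto valid STFTs, then restore the known magnitudes.
        estimate = stft(istft(spec, length))
        spec = [[m * c / abs(c) if abs(c) > 1e-12 else m
                 for m, c in zip(mags, est)]
                for mags, est in zip(magnitudes, estimate)]
    return istft(spec, length)


def spectral_error(signal, magnitudes):
    """Sum of squared differences between |STFT(signal)| and the target."""
    return sum((abs(c) - m) ** 2
               for est, mags in zip(stft(signal), magnitudes)
               for c, m in zip(est, mags))


if __name__ == "__main__":
    n = 64
    tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
    target = [[abs(c) for c in frame] for frame in stft(tone)]
    for k in (1, 10, 50):
        print(k, spectral_error(griffin_lim(target, n, iterations=k), target))
```

The printed error should not grow as the iteration count rises, which is the property that makes Griffin-Lim usable as a phase approximation. A practical pipeline would use a fast Fourier transform and a much larger window.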

Licensing

I, the copyright holder of this work, hereby publish it under the following licenses:

This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

You are free:

to share – to copy, distribute and transmit the work
to remix – to adapt the work

Under the following conditions:

attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.

You may select the license of your choice.

File history

	Date/Time	Thumbnail	Dimensions	User	Comment
current	22:22, 17 December 2022		15 s (262 KB)	Benlisquare	{{Information \|Description= Demonstration of an algorithmically-generated audio track featuring bossa nova music accompanied by electric guitar, created using [https://www.riffusion.com/about Riffusion], an open-source fine-tuned derivative of the Stable Diffusion image-generation diffusion model that has been retrained to generate images of audio spectrograms, which can then be converted into audio files. An audio spectrogram i...

File usage

3 pages use this file.

Global file usage

The following other wikis use this file:

Usage on el.wikipedia.org
- Παραγωγική τεχνητή νοημοσύνη
Usage on nl.wikipedia.org
- Generatieve kunstmatige intelligentie
Usage on ro.wikipedia.org
- Inteligența artificială generativă
Usage on uk.wikipedia.org
- Генеративний штучний інтелект
Usage on uz.wikipedia.org
- Generativ sunʼiy intellekt

Metadata

Author	Benlisquare
Short title	Riffusion, prompt "bossa nova with electric guitar"
Software used	Xiph.Org libVorbis I 20200704 (Reducing Environment)

Items portrayed in this file

depicts	
creator	some value
copyright status	copyrighted
copyright license	GNU Free Documentation License, version 1.2 or later; Creative Commons Attribution-ShareAlike 4.0 International
inception	17 December 2022
media type	application/ogg
source of file	original creation by uploader
checksum	a7005e27a53a8c2a562f4eb8da45eed8a2044a68
data size	268,288 bytes
duration	15.2686167800454 seconds
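
The size, duration and checksum values above can be cross-checked against the "262 KB" and "141 kbps" figures shown at the top of the page. The sketch below is illustrative only. It assumes that "KB" means 1024 bytes, that the bitrate is an average over the whole file, and that the 40-character checksum is a SHA-1 digest, none of which the page states explicitly. The function names are hypothetical.

```python
import hashlib

DATA_SIZE_BYTES = 268_288
DURATION_SECONDS = 15.2686167800454
LISTED_CHECKSUM = "a7005e27a53a8c2a562f4eb8da45eed8a2044a68"


def size_in_kib(size_bytes):
    """File size in units of 1024 bytes."""
    return size_bytes / 1024


def average_bitrate_kbps(size_bytes, duration_seconds):
    """Average bitrate in kilobits (1000 bits) per second."""
    return size_bytes * 8 / duration_seconds / 1000


def sha1_of_file(path, chunk_size=65536):
    """Hex SHA-1 digest of a file, read in chunks (SHA-1 is an assumption)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(size_in_kib(DATA_SIZE_BYTES))  # 262.0
    print(round(average_bitrate_kbps(DATA_SIZE_BYTES, DURATION_SECONDS)))  # 141
```

With a local copy of the file, comparing `sha1_of_file(path)` to `LISTED_CHECKSUM` would confirm that the download matches the listed checksum, provided the SHA-1 assumption holds.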
