whisper

Model ID: @cf/openai/whisper

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data.

Properties

Task Type: Automatic Speech Recognition

Code Examples

Workers - TypeScript

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Download a sample WAV file to transcribe.
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    // Pass the audio as an array of byte values (the object form of the input schema).
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await env.AI.run(
      "@cf/openai/whisper",
      input
    );

    // Return an empty audio array rather than echoing the raw bytes back.
    return Response.json({ input: { audio: [] }, response });
  },
} satisfies ExportedHandler<Env>;

curl

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/openai/whisper \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --data-binary "@talking-llama.mp3"
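
The curl call above can also be expressed in TypeScript. The following is a minimal sketch, not an official client: buildWhisperRequest and WhisperRequestParts are hypothetical names introduced here, and the function only assembles the URL, method, header, and binary body that the curl command sends.

```typescript
// Hypothetical container for the pieces of the HTTP request.
interface WhisperRequestParts {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: Uint8Array;
  };
}

// Hypothetical helper mirroring the curl command: same endpoint,
// same Authorization header, raw audio bytes as the request body.
function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array
): WhisperRequestParts {
  const url =
    "https://api.cloudflare.com/client/v4/accounts/" +
    accountId +
    "/ai/run/@cf/openai/whisper";
  return {
    url,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: audio,
    },
  };
}
```

The returned url and init values could then be handed to fetch, with the account ID and API token read from the same environment variables the curl example uses.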

Response

Automatic speech recognition responses return a text string containing the audio transcription and, if the model supports it, an optional words array in which each entry carries the word with its start and end timestamps.

{
  "text": "It is a good day",
  "word_count": 5,
  "words": [
    {
      "word": "It",
      "start": 0.5600000023841858,
      "end": 1
    },
    {
      "word": "is",
      "start": 1,
      "end": 1.100000023841858
    },
    {
      "word": "a",
      "start": 1.100000023841858,
      "end": 1.2200000286102295
    },
    {
      "word": "good",
      "start": 1.2200000286102295,
      "end": 1.3200000524520874
    },
    {
      "word": "day",
      "start": 1.3200000524520874,
      "end": 1.4600000381469727
    }
  ]
}
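
As an illustration of how a response like the one above might be consumed, here is a small TypeScript sketch that turns the words array into readable timing lines. The interface names and formatWordTimings are hypothetical, and the sketch assumes the timestamps are expressed in seconds, which the page does not state explicitly.

```typescript
// Hypothetical types modeled on the example response.
interface WhisperWord {
  word: string;
  start: number;
  end: number;
}

interface WhisperResponse {
  text: string;
  word_count?: number;
  words?: WhisperWord[];
}

// Formats each word as "start-end word", rounding to two decimals.
// Returns an empty list when the model did not supply word timings.
function formatWordTimings(response: WhisperResponse): string[] {
  return (response.words ?? []).map(
    (w) => `${w.start.toFixed(2)}-${w.end.toFixed(2)} ${w.word}`
  );
}
```

Because words is optional, the helper falls back to an empty array instead of throwing when only text is present.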

API Schema

The following schemas are based on JSON Schema.

Input JSON Schema

{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      },
      "required": [
        "audio"
      ]
    }
  ]
}
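
The object form of the input schema expects audio as an array of numbers. A minimal sketch of building that shape from raw bytes follows; WhisperObjectInput and toWhisperInput are hypothetical names, and the sketch assumes each array element is one byte value (0 to 255), as in the Workers example.

```typescript
// Hypothetical type for the object variant of the input schema.
interface WhisperObjectInput {
  audio: number[];
}

// Copies the bytes of an audio file into a plain number array.
function toWhisperInput(bytes: Uint8Array): WhisperObjectInput {
  return { audio: Array.from(bytes) };
}
```

The other variant in the schema, a binary string, corresponds to sending the raw file as the request body, as the curl example does.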

Output JSON Schema

{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    },
    "word_count": {
      "type": "number"
    },
    "words": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "word": {
            "type": "string"
          },
          "start": {
            "type": "number"
          },
          "end": {
            "type": "number"
          }
        }
      }
    },
    "vtt": {
      "type": "string"
    }
  },
  "required": [
    "text"
  ]
}
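
The output schema marks only text as required, so callers may want to check a parsed response before reading the optional fields. The following TypeScript sketch shows one way to do that; WhisperOutput and isWhisperOutput are hypothetical names, and the guard checks only top-level property types rather than performing full JSON Schema validation.

```typescript
// Hypothetical type mirroring the output schema: only text is required.
interface WhisperOutput {
  text: string;
  word_count?: number;
  words?: { word?: string; start?: number; end?: number }[];
  vtt?: string;
}

// Narrowing check on top-level fields; nested word entries are not inspected.
function isWhisperOutput(value: unknown): value is WhisperOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.text !== "string") return false;
  if (v.word_count !== undefined && typeof v.word_count !== "number") return false;
  if (v.vtt !== undefined && typeof v.vtt !== "string") return false;
  if (v.words !== undefined && !Array.isArray(v.words)) return false;
  return true;
}
```

A value such as { "text": "It is a good day" } passes the check, while an object without text does not.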