Commit Graph

548 Commits

Author SHA1 Message Date
Georgi Gerganov
fc45bb8625 talk-llama : sync llama.cpp
ggml-ci
2025-08-18 20:30:45 +03:00
Georgi Gerganov
7fd2fbde45 common : handle mxfp4 enum
ggml-ci
2025-08-18 20:30:45 +03:00
Daniel Bevenius
040510a132
node : add win platform check for require path (#3363)
This commit adds a check to the platform in use and adjust the path to
the addon.node shared library.

The motivation for this change is that on windows addon.node library is
built into build\bin\Release and on linux into build/Release.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3360
2025-08-15 14:54:23 +02:00
Georgi Gerganov
b02242d0ad
wasm : change ggml model host to HF (#3369) 2025-08-10 13:00:17 +03:00
Daniel Bevenius
0becabc8d6
stream.wasm : add language selection support (#3354)
* stream.wasm : add language selection support

This commit adds support for selecting the language in the stream.wasm
example. This is includes adding the model `base` which supports
multilingual transcription, and allowing the user to select a language
from a dropdown menu in the HTML interface.

The motivation for this is that it allows users to transcribe audio in
various languages.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3347

* squash! stream.wasm : add language selection support

Remove strdup() for language in stream.wasm and update butten text for
base (should not be "base.en" but just "base").
2025-08-02 07:03:04 +02:00
Georgi Gerganov
d0a9d8c7f8 talk-llama : sync llama.cpp 2025-07-28 13:02:32 +03:00
Daniel Bevenius
7de8dd783f
examples : add note about WHISPER_WASM_SINGLE_FILE [no ci] (#3332)
This commit adds a note to the README files of the WASM examples
about the `WHISPER_WASM_SINGLE_FILE` option.

The motivation for this is that currently this option is not documented
and might be surprising to users who expect a separate .wasm file to be
generated.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3290
2025-07-24 16:06:48 +02:00
Sacha Arbonel
1f5cf0b288
server : hide language probabilities option behind flag (#3328)
* examples/server: hide language probabilities option behind flag

* code review

* fix
2025-07-21 13:03:54 +02:00
Greg Sadetsky
a16da91365
examples : update links in wasm examples (#3318)
* fix 404 link

* update link in whisper.wasm example

* update example in command.wasm

* update link in bench.wasm example

* update link in stream.wasm example
2025-07-12 23:22:35 +02:00
Georgi Gerganov
6ddff4d96a talk-llama : sync llama.cpp
ggml-ci
2025-07-12 19:23:56 +03:00
accessiblepixel
869335f2d5
server : add dtw.params for v3-large-turbo (#3307)
* Add DTW model large-v3-turbo parameters to server.cpp example

DTW support is available in whispercpp and the large-v3-turbo model has already been added to the sources, but the large-v3-turbo model hasn't been added to the server.cpp file to make use of it. This commit hopefully corrects that issue.

* match original linebreak of original server.cpp file after adding large.v3.turbo dtw
2025-07-07 12:51:15 +03:00
Lin Xiaodong
d9999d54c8
feat: support vad for addon.node (#3301)
Co-authored-by: linxiaodong <calm.lin@wukongsch.com>
2025-07-02 13:14:29 +03:00
Georgi Gerganov
1f816de7da talk-llama : sync llama.cpp 2025-07-01 17:54:53 +03:00
Daniel Bevenius
32cf4e2aba
whisper : add version function (#3289)
* whisper : add version function

This commit adds a version function to the whisper API.

The motivation for this is that it might be convenient to have a way to
programmatically check the version.

Example usage:
```c++
printf("Using whisper version: %s\n", whisper_version());
```
Will output:
```console
Using whisper version: 1.7.6
```

* examples : add version to android example CMakeLists.txt
2025-06-26 18:09:42 +02:00
Georgi Gerganov
dc8dda60ee
bench : print system info before ctx check 2025-06-25 16:01:32 +03:00
Daniel Bevenius
1ad258ca31
stream : add nullptr check of whisper_context (#3283)
* stream : add nullptr check of whisper_context

This commit adds a check to ensure that the `whisper_context` is not
null after initialization.

The motivation for this is that currently, if the initialization fails,
the program continues to run leading to a segmentation fault. This sort
of check is performed by others examples like whisper-cli.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3280#issuecomment-3003778035

* examples : add nullptr check for whisper_context
2025-06-25 14:16:31 +02:00
Aaron Ang
4d6ae52ed3
command: output commands to text file (#3273)
This commit implements code for the command line argument `-f --file FNAME` which is currently missing.
2025-06-24 06:41:21 +02:00
Georgi Gerganov
e6c10cf3d5 talk-llama : sync llama.cpp
ggml-ci
2025-06-21 07:34:17 +03:00
Daniel Bevenius
3e65f518dd
android : update CMakeLists.txt to use FetchContent for ggml (#3268)
* android : update CMakeLists.txt to use FetchContent for ggml

This commit updates the CMakeLists.txt file for the Android Whisper
example to use FetchContent for managing the ggml library.

The motivation for this change is avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.

I've built and run the example locally to verify that it works as
expected.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3265#issuecomment-2986715717

* android.java : update cmake to use FetchContent for ggml

This commit updates the CMake configuration for the Android Java example
to use `FetchContent` for including the `ggml` library. Do be able to
use FetchContent we also update the `compileSdkVersion` and
`targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'.
This also required a an update to the Gradle plugin version to 7.4.0.

The motivation for this change is avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
2025-06-19 16:06:42 +02:00
Georgi Gerganov
17bece1885
cmake : fix android build (#3265)
* cmake : fix android build

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-06-19 08:24:41 +02:00
Daniel Bevenius
ecb8f3c2b4
examples : add stereo to mono conversion in read_audio_data (#3266)
This commit adds a conversion from stereo to mono in the
`read_audio_data` function of `common-whisper.cpp`.

The motivation for this change is prior to Commit
7d3da68f79 ("examples : use miniaudio for
direct decoding flac, mp3, ogg and wav (#2759)", there was a step that
read stereo int16 data -> pcm16 (448512 samples), and then converted to
mono (224256 samples), and then also convert to stereo in `pcmf32s.

The middle step here seems to have been missed when rewriting the code to
use Miniaudio and caused issues then transcribing stereo audio files.

For example, currently using the audio sample in the linked issue the
output is:
```console
[00:00:00.000 --> 00:00:03.000]  (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```

And with the change in this commit the output is:
```
[00:00:00.000 --> 00:00:01.500]  (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000]  (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500]  (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500]  (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500]  (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500]  (speaker 1) de préparer un courrier
```

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3092
2025-06-18 17:41:43 +02:00
Georgi Gerganov
2f60ebc3c2 talk-llama : sync llama.cpp
ggml-ci
2025-06-18 12:40:34 +03:00
Daniel Bevenius
f3ff80ea8d
examples : set the C++ standard to C++17 for server (#3261)
This commit updates the server example to use C++17 as the standard.

The motivation for this change is that currently the ci-run
`ggml-100-mac-m4` is failing when compiling the server example on
macOS. The `talk-llama` example also has this setting so it looks like
an alright change to make.

ggml-ci

Refs: https://github.com/ggml-org/ci/tree/results/whisper.cpp/2a/4d6db7d90899aff3d58d70996916968e4e0d27/ggml-100-mac-m4
2025-06-17 11:29:48 +02:00
w1redch4d
2a4d6db7d9
examples : update usage/help in yt-wsp.sh (#3251)
This commit updates the usage/help message to be more readable and include the environment variables available to set options.
2025-06-16 12:21:16 +02:00
Sacha Arbonel
107c303e69
server : graceful shutdown, atomic server state, and health endpoint Improvements (#3243)
* feat(server): implement graceful shutdown and server state management

* refactor(server): use lambda capture by reference in server.cpp
2025-06-16 10:14:26 +02:00
Daniel Bevenius
0a4d85cf8a
server : add Voice Activity Detection (VAD) support (#3246)
* server : add Voice Activity Detection (VAD) support

This commit adds support for Voice Activity Detection (VAD) in the
server example.

The motivation for this is to enable VAD processing when using
whisper-server.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3089

* server : add VAD parameters to usage in README.md [no ci]

This commit also adds a few missing parameters.

* server : fix conflicting short options [no ci]
2025-06-13 13:24:03 +02:00
Daniel Bevenius
9df8d54bcb
cli : fix short name conflict for vad options [no ci] (#3247)
This commit fixes a short name conflict whisper-cli for
`--vad-min-speech-duration-ms` and `--vad-min-silence-duration-ms` which
currently have the same short name `-vsd`.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3246#pullrequestreview-2923800114
2025-06-13 10:25:25 +02:00
Georgi Gerganov
962361bd79 android : fix builds (#0)
ggml-ci
2025-06-10 12:40:33 +03:00
Georgi Gerganov
db264d6220 talk-llama : sync llama.cpp
ggml-ci
2025-06-10 12:40:33 +03:00
Daniel Bevenius
b505539670
node : add language detection support (#3190)
This commit add support for language detection in the Whisper Node.js
addon example. It also updates the node addon to return an object
instead of an array as the results.

The motivation for this change is to enable the inclusion of the
detected language in the result, in addition to the transcription
segments.

For example, when using the `detect_language` option, the result will
now be:
```console
{ language: 'en' }
```

And if the `language` option is set to "auto", it will also return:
```console
{
  language: 'en',
  transcription: [
    [
      '00:00:00.000',
      '00:00:07.600',
      ' And so my fellow Americans, ask not what your country can do for you,'
    ],
    [
      '00:00:07.600',
      '00:00:10.600',
      ' ask what you can do for your country.'
    ]
  ]
}
```
2025-06-02 14:58:05 +02:00
Georgi Gerganov
7fd6fa8097 talk-llama : sync llama.cpp
ggml-ci
2025-06-01 15:14:44 +03:00
Daniel Bevenius
73a8c5fb94
whisper : remove whisper_load_backends function (#3196)
* whisper : remove whisper_load_backends function

This commit removes the `whisper_load_backends` function, which was used
to load all GGML backends.

The motivation for this change push the responsibility of loading
backends to user applications to give them more control over which
backends to load and when. See the references below for more context.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3182
Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801778733
Refs: https://github.com/ggml-org/whisper.cpp/pull/3042#issuecomment-2801928990

* ruby : add check for rwc is NULL

This commit adds a check to ensure that the `rwc` pointer is not NULL
before attempting to mark its members in the garbage collector.

The motivation for this is an attempt to see if this fixed the CI build
as I'm not able to reproduce the issue locally.

Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/15299612277/job/43036694928?pr=3196
2025-05-29 08:03:17 +02:00
Georgi Gerganov
26eb48cb08 talk-llama : sync llama.cpp
ggml-ci
2025-05-27 18:03:00 +03:00
Daniel Bevenius
450de0787e
node : enable no_prints to suppress all output (#3189)
This commit enable the node addon to suppress all output, even the
result of the transcription if the no_prints parameter is set to true.

The motivation for this is that for the node addon there is a
fullfilment handler/success callback to process the transcription
result. And it might be useful to be able to disable the printing of
the transcription result to the console, so that the user can handle
the result in their own way.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3176
2025-05-27 05:51:47 +02:00
matteng1
ea9f206f18
talk-llama : fix for swedish umlauts + expose model inference settings in talk-llama.cpp (#3187)
Quick fix for not removing swedish umlauts.

* Update talk-llama.cpp

Expose model inference settings to user instead of hard coding them. Same defaults as previous defaults.

* Update examples/talk-llama/talk-llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-26 07:57:39 +02:00
Sacha Arbonel
78b31ca782
server : Add k6 Load Testing Script (#3175)
* add load testing script and update README for k6 integration
2025-05-22 10:03:04 +02:00
Georgi Gerganov
6b6cf19c65 talk-llama : sync llama.cpp
ggml-ci
2025-05-19 14:58:39 +03:00
Daniel Bevenius
f389d7e3e5
examples : add --print-confidence option to cli (#3150)
* examples : add --print-confidence option to cli

This commit adds a new command-line option `--print-confidence` to the
whisper-cli. When enabled, this option prints the confidence level of each
token in the transcribed text using ANSI formatting codes.

The confidence levels are represented using different styles:
```console
main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence)
```

Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
2025-05-14 19:21:48 +02:00
Daniel Bevenius
3882a099e1
server : add --flash-attn usage output (#3152)
This commit adds the `--flash-attn` option to the usage output of the
server example.

The motivation for this change is that while it is possible to set this
option it is not printed in the usage output.
2025-05-14 15:22:05 +02:00
Georgi Gerganov
f890560575 talk-llama : sync llama.cpp
ggml-ci
2025-05-13 13:59:21 +03:00
Daniel Bevenius
fbad8058c4
examples : add VAD speech segments example (#3147)
This commit adds an example that demonstrates how to use a VAD (Voice
Activity Detection) model to segment an audio file into speech segments.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3144
2025-05-13 12:31:00 +02:00
Daniel Bevenius
b2513a6208
vad : remove shortform for --vad option in cli.cpp (#3145)
This commit removes the shortform for the --vad option in cli.cpp.

The motivation for this is that `-v` is often used for verbose or
version is many tools and this might cause confusion.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3065#issuecomment-2873243334
2025-05-13 06:04:05 +02:00
Tomer Schlesinger
587ea01f55
docs : update README.md for whisper.objc app (#2569) 2025-05-13 06:03:50 +02:00
Daniel Bevenius
e41bc5c61a
vad : add initial Voice Activity Detection (VAD) support (#3065)
* vad : add initial Voice Activity Detection (VAD) support

This commit add support for Voice Activity Detection (VAD). When enabled
this feature will process the audio input and detect speech segments.
This information is then used to reduce the number of samples that need
to be processed by whisper_full.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3003

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-12 16:10:11 +02:00
Daniel Bevenius
186855e38b
cli : print color scheme info for --print-colors (#3141)
This commit adds a description of the color scheme used in the CLI
when the --print-colors option is enabled.

The motivation for this is that it is not immediately clear what the
color scheme is when using the CLI with the --print-colors option.

Example output:
```console
$ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors
...

main: color scheme: red (low confidence), yellow (medium), green (high confidence)

[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
```
The description will not be dispayed if the `--no-prints` options is
set.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3135
2025-05-12 10:43:04 +02:00
Daniel Bevenius
4730950492
examples : update link to Paul Tol's color scheme [no ci] (#3140)
This commit updates the link to Paul Tol's color scheme in the
`examples/common.h` file. The previous link was outdated and
pointed to a non-existent page.
2025-05-12 09:02:06 +02:00
Enes Grahovac
5d4390d281
examples : add HEAPU8 to all of the exported runtime methods (#3134)
This commit adds HEAPU8 to the list of exported methods.

The motivation for this commit is that currently this is causing an error on Window systems where HEAPU8 in undefined, which results in the following error message in the web console:

main.js:1 Uncaught TypeError:
Cannot read properties of undefined (reading 'buffer') at __emval_get_property
(main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at
Object.full_default (eval at craftInvokerFunction (main.js:1:1347011),
<anonymous>:9:10) at whisper.cpp/:647:42

danbev originally fixed this for whisper.wasm, stream.wasm, and command.stream, but the issue still exists on the other examples which I patch in this code.

Resolves: #3059
2025-05-10 06:44:13 +02:00
Daniel Bevenius
9791647653
wasm : add note about worker.js file generation [no ci] (#3133)
This commit updates the documentation for the WASM examples to include a
note about the generation of the `worker.js` file. As of Emscripten
3.1.58 (April 2024), separate worker.js files are no longer generated
and the worker is embedded in the main JS file.

The motivation for this change is to inform users about the new behavior
of Emscripten and why the `worker.js` file may not be present.

Refs: https://github.com/ggml-org/whisper.cpp/issues/3123
2025-05-09 15:42:45 +02:00
Daniel Bevenius
b6f3fa4059
stream.wasm : add HEAPU8 to exported runtime methods (#3130)
* stream.wasm : add HEAPU8 to exported runtime methods

This commit adds HEAPU8 to the list of exported methods for stream.wasm.

The motivation for this is that without it HEAPUD8 will be undefined
and when its 'buffer' attribute is accessed this will cause error as
reported in the referenced issue.

Note that to test this make sure that the web browsers caches is cleared
first.

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3123

* command.wasm : add HEAPU8 to exported runtime methods
2025-05-08 16:58:34 +02:00
Georgi Gerganov
4a512cb153 cli : avoid std::exchange
ggml-ci
2025-05-07 15:39:32 +03:00