AzureSpeechCLI 1.1.0

There is a newer version of this package available.
See the version list below for details.
This package contains a .NET tool you can call from the shell/command line.

Install as a global tool:

dotnet tool install --global AzureSpeechCLI --version 1.1.0

Install as a local tool:

dotnet new tool-manifest # if you are setting up this repo
dotnet tool install --local AzureSpeechCLI --version 1.1.0

Cake:

#tool dotnet:?package=AzureSpeechCLI&version=1.1.0

NUKE:

nuke :add-package AzureSpeechCLI --version 1.1.0

The NuGet Team does not provide support for the Cake and NUKE clients. Please contact their maintainers for support.

Azure Speech CLI

Unofficial CLI tool for Microsoft Azure Speech Service management - datasets, models, tests, endpoints etc. Useful especially for automation.



This tool uses Speech Services API v2.0. The SDK was generated automatically from the Swagger definition using AutoRest, but a few adjustments had to be made to the generated code.

Until this is refactored, it's not safe to regenerate the SDK with AutoRest.


Go to Releases to download a compiled version for your operating system, or build directly from source.

The CLI is built with .NET Core, and builds are currently produced for both Windows and macOS.

A Windows Store version is planned.


Before using the tool, you need to set your Speech Service credentials.

speech config set --name Project1 --key ABCD12345 --region northeurope --select

Or the shorter version:

speech config set -n Project2 -k ABCD54321 -r westus -s

Both commands store your credentials as a configuration set and automatically select that set (via the --select parameter). You can keep multiple sets and switch between them:

speech config select --name Project1

This can be useful when you work with multiple subscriptions.
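As a rough sketch of how switching might look in a script, the function below selects each configuration set in turn and lists its datasets. It assumes a POSIX shell (for example on macOS), and it assumes that `speech config select` exits with a non-zero status when the set does not exist, which is not confirmed here. The function name is hypothetical.

```shell
# Run `speech dataset list` once per configuration set.
# Arguments: names of sets previously stored with `speech config set`.
list_datasets_for_each_config() {
    for name in "$@"; do
        echo "== $name =="
        # Assumption: a failed selection is reported through the exit status.
        if speech config select --name "$name"; then
            speech dataset list
        else
            echo "could not select configuration set: $name" >&2
        fi
    done
}

# Example with the set names used above:
# list_datasets_for_each_config Project1 Project2
```

Note that the last set in the list stays selected after the function returns.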


If you're not sure what commands and parameters are available, try adding --help to the command you want to use.

For example:

speech --help
speech dataset --help
speech dataset create --help

Entity operations

Every entity supports a basic set of operations:

  • create
  • list
  • show
  • delete

When working with a specific entity, its ID is usually required:

speech dataset show --id <GUID>


Every create command offers an optional --wait flag, which makes the CLI block until the create operation completes (dataset processed, model trained, endpoint provisioned, etc.). When a new entity is created, the CLI writes its ID to the console.

This is useful in automation pipelines when commands are run as individual steps in a complex process.

speech dataset create --name CLI --audio "C:\" --transcript "C:\test.txt" --wait
Uploading acoustic dataset...
Processing [..............]
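In a pipeline, the ID written to the console usually has to be passed on to the next step. The exact output format is not documented here, so the sketch below simply assumes the ID appears somewhere in the output as a standard 8-4-4-4-12 hexadecimal GUID and takes the last one. The helper name `extract_guid` is hypothetical.

```shell
# Print the last GUID-shaped token found on standard input.
# Assumption: the new entity's ID is printed as a standard GUID and is the
# last GUID in the output; adjust if the real output differs.
extract_guid() {
    grep -Eo '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' | tail -n 1
}

# Usage sketch: block until the dataset is processed, then keep its ID.
# dataset_id=$(speech dataset create --name CLI --audio <ZIP> --transcript <TXT> --wait | extract_guid)
```

If no GUID is found, the helper prints nothing, so callers should check for an empty result before continuing.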



After setting your subscription key and endpoint, you usually start by preparing data. The CLI helps here with the compile command.

speech compile --audio <source folder> --transcript <txt file> --output <target folder> --test-percentage 10

This command expects a folder with all audio samples as WAV files and a TXT file with the corresponding transcripts.

It creates the output folder, divides the data into two sets ("train" and "test") and compresses them into ZIP files. At the end you will get:

  • train.txt
  • test.txt
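Before running compile in an automated job, it can help to fail early when the inputs are obviously wrong. The following is a minimal sketch, not part of the CLI: the function name is hypothetical, it only looks at files directly inside the audio folder (whether compile also reads subfolders is not stated here), and it does not validate the transcript contents.

```shell
# Return 0 when the inputs look usable for `speech compile`:
# the audio folder holds at least one .wav file and the transcript is non-empty.
check_compile_inputs() {
    audio_dir=$1
    transcript=$2
    [ -d "$audio_dir" ] || { echo "not a folder: $audio_dir" >&2; return 1; }
    [ -s "$transcript" ] || { echo "missing or empty transcript: $transcript" >&2; return 1; }
    # Count WAV files directly inside the folder (case-insensitive extension).
    wav_count=$(find "$audio_dir" -maxdepth 1 -type f -iname '*.wav' | wc -l | tr -d ' ')
    [ "$wav_count" -gt 0 ] || { echo "no WAV files in: $audio_dir" >&2; return 1; }
}

# Usage sketch:
# check_compile_inputs ./audio ./transcripts.txt &&
#     speech compile --audio ./audio --transcript ./transcripts.txt --output ./out --test-percentage 10
```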


Datasets

There are two types of datasets in the Speech Service: acoustic and language.

To create an acoustic dataset, you need to provide a ZIP file with all audio samples and a TXT file with the corresponding transcriptions.

To create a language dataset, you need to provide a TXT file with language data.

To create an acoustic dataset use:

speech dataset create --name CLI --audio "C:\" --transcript "C:\train.txt" --wait

To create a language dataset use:

speech dataset create --name CLI-Lang --language "C:\language.txt" --wait

To list available datasets:

speech dataset list

To show details of a dataset:

speech dataset show --id 63f20d88-f531-4af0-bc85-58e0e9dAAACCDD


Models

As with datasets, there are two types of models in the Speech Service: acoustic and language. Both are created from previously uploaded datasets.

To create an acoustic model, you first need to get the GUID of a base model (referred to as a scenario):

speech model list-scenarios --locale en-us

en-us is the default locale, but you can choose a different one.


Getting scenarios...
d36f6c4b-8f75-41d1-b126-c38e46a059af    Unified V3 EMBR - ULM
c7a69da3-27de-4a4b-ab75-b6716f6321e5    V2.5 Conversational (AM/LM adapt)
a1f8db59-40ff-4f0e-b011-37629c3a1a53    V2.0 Conversational (AM/LM adapt) - Deprecated
cc7826ac-5355-471d-9bc6-a54673d06e45    V1.0 Conversational (AM/LM adapt) - Deprecated
a3d8aab9-6f36-44cd-9904-b37389ce2bfa    V1.0 Interactive (AM/LM adapt) - Deprecated

Then you can use the GUID of the selected scenario in the create command:

speech model create --name CLI --locale en-us --audio-dataset <GUID> --scenario c7a69da3-27de-4a4b-ab75-b6716f6321e5 --wait
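For scripted runs, the scenario GUID can be looked up by name instead of being copied by hand. The sketch below assumes the list-scenarios output keeps the layout shown above (GUID, whitespace, scenario name) and that retired scenarios are marked with the word "Deprecated". The function name is hypothetical.

```shell
# Print the GUID of the first non-deprecated scenario whose name contains $1.
# Reads the output of the list-scenarios command on standard input.
scenario_id_by_name() {
    awk -v want="$1" '
        /Deprecated/ { next }   # skip retired base models
        index($0, want) > 0 && $1 ~ /^[0-9a-f-]+$/ && length($1) == 36 { print $1; exit }
    '
}

# Usage sketch:
# scenario=$(speech model list-scenarios --locale en-us | scenario_id_by_name 'V2.5 Conversational')
```

The match is a plain substring test, so a more specific name gives a more predictable result.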

To create a language model, use the same scenario GUID and call:

speech model create --name CLI-Lang --locale en-us --language-dataset <GUID> --scenario c7a69da3-27de-4a4b-ab75-b6716f6321e5 --wait


Tests

To create an accuracy test, you need three GUIDs: the ID of the testing audio dataset, the ID of the acoustic model you are testing, and the ID of a language model:

speech test create --name CLI --audio-dataset <GUID> --model <GUID> --language-model <GUID> --wait

To see the details of a particular test, call:

speech test list
speech test show --id <GUID>


Endpoints

Finally, to be able to use the model, you need to create an endpoint:

speech endpoint create --name CLI --locale en-us --model <GUID> --language-model <GUID> --concurrent-recognitions 1 --wait
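Put together, the model, test and endpoint steps might be chained as in the sketch below. This is an illustration under assumptions, not a documented workflow: it assumes every create command prints the new entity's ID as a standard GUID and that this is the last GUID in its output. The function names are hypothetical, and the entity name `CLI` is taken from the examples above.

```shell
# Print the last GUID-shaped token read from standard input.
# Assumption: create commands print the new entity's ID as a standard GUID.
extract_guid() {
    grep -Eo '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' | tail -n 1
}

# Train an acoustic model, run an accuracy test against it, publish an endpoint.
# Arguments (all GUIDs): training audio dataset, testing audio dataset,
# existing language model, scenario. Prints the endpoint ID on success.
train_test_deploy() {
    train_ds=$1 test_ds=$2 lang_model=$3 scenario=$4

    model_id=$(speech model create --name CLI --locale en-us \
        --audio-dataset "$train_ds" --scenario "$scenario" --wait | extract_guid)
    [ -n "$model_id" ] || { echo "model ID not found in output" >&2; return 1; }

    # Test output goes to stderr so that stdout carries only the endpoint ID.
    speech test create --name CLI --audio-dataset "$test_ds" \
        --model "$model_id" --language-model "$lang_model" --wait >&2 || return 1

    speech endpoint create --name CLI --locale en-us --model "$model_id" \
        --language-model "$lang_model" --concurrent-recognitions 1 --wait | extract_guid
}

# Usage sketch:
# endpoint_id=$(train_test_deploy <GUID> <GUID> <GUID> <GUID>)
```

The sketch does not inspect the accuracy result before publishing; a real pipeline would probably check it with `speech test show` first.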

Batch transcriptions

A bonus command that doesn't revolve around entities. Batch transcription generates a transcript of a long audio file, with timestamps, using your custom model.

speech transcript create --name CLI --locale en-us --recording <URL> --model <GUID> --language <GUID> --wait

Once the batch is done, you can call:

speech transcript show --id <GUID>

The response JSON contains the result URLs.

Or you can call:

speech transcript download --id <GUID> --out-dir <PATH> --format VTT

This downloads the transcriptions and converts them to VTT (the default format is JSON).
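The create and download steps can be combined into one function for unattended use. The sketch below assumes that `transcript create` prints the new transcription's ID as a standard GUID (the last one in its output); that format is not confirmed here. The function name is hypothetical.

```shell
# Start a batch transcription, wait for it, then download the result as VTT.
# Arguments: recording URL, acoustic model GUID, language model GUID, output folder.
transcribe_to_vtt() {
    recording=$1 model=$2 language=$3 out_dir=$4

    # Assumption: the transcription ID is the last GUID printed by the command.
    id=$(speech transcript create --name CLI --locale en-us --recording "$recording" \
        --model "$model" --language "$language" --wait |
        grep -Eo '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' | tail -n 1)
    [ -n "$id" ] || { echo "transcription ID not found in output" >&2; return 1; }

    speech transcript download --id "$id" --out-dir "$out_dir" --format VTT
}

# Usage sketch:
# transcribe_to_vtt <URL> <GUID> <GUID> <PATH>
```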


Roadmap

  • Work with names too, not just GUIDs
  • Rework how configuration is initialized and checked on startup
  • Check if uploaded files are in the correct format (UTF-8 BOM text files)
  • Publish to Windows Store too
  • Add unit tests 😃
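Until the CLI performs the file format check listed above, a similar check can be done by hand before uploading. The following is a minimal sketch that only tests for the UTF-8 byte order mark (bytes EF BB BF) at the start of a file; it does not verify that the rest of the file is valid UTF-8. The function name is hypothetical.

```shell
# Return 0 when the file starts with the UTF-8 byte order mark (EF BB BF).
has_utf8_bom() {
    # Dump the first three bytes as hex and strip spaces and newlines.
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first_bytes" = "efbbbf" ]
}

# Usage sketch:
# has_utf8_bom train.txt || echo "train.txt has no UTF-8 BOM" >&2
```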

By participating in this project, you agree to abide by the Microsoft Open Source Code of Conduct.

This package has no dependencies.

Version Downloads Last updated
1.5.2 1,887 7/7/2019
1.5.0 653 6/14/2019
1.4.1 321 5/29/2019
1.4.0 336 4/25/2019
1.3.1 359 2/14/2019
1.2.2 353 2/12/2019
1.2.1 349 2/7/2019
1.1.0 392 1/17/2019