
ASR Provider Setup

This page covers how to sign up with each ASR provider, obtain credentials, and configure the provider in BiBi Keyboard (说点啥).

Before you start

  • Open Settings → ASR Settings and select your ASR provider.
  • Cloud providers usually require an API Key / Access Token.
  • Local models require downloading/importing model files (first load may take a few seconds).

Security

API keys and access tokens are sensitive. Do not share them publicly. If you suspect leakage, revoke the key/token immediately and create a new one.

Provider overview

| Provider | Type | Streaming | Best for |
| --- | --- | --- | --- |
| Volcengine | Cloud | | Low-latency, real-time streaming |
| SiliconFlow | Cloud | | Beginner-friendly, low cost |
| DashScope (Alibaba) | Cloud | | Balanced accuracy and cost |
| Soniox | Cloud | | Stable streaming, international usage |
| Gemini | Cloud | | Light usage / file-based recognition |
| ElevenLabs | Cloud | ✅/❌ | High accuracy (model-dependent) |
| OpenAI (compatible) | Cloud | ✅/❌ | OpenAI/compatible file or Realtime transcription |
| StepAudio | Cloud | | StepFun online ASR, Chinese/English and ITN |
| Zhipu GLM | Cloud | | Simple integration, lower cost |
| Local models (SenseVoice / FunASR Nano / Qwen3-ASR / Parakeet / FireRedASR V2 / Paraformer) | Local | Partial ✅ | Privacy-first, offline usage |

Volcengine

Volcengine (Doubao Voice) has strong Chinese recognition and supports both streaming and non-streaming.

1. Create an app and enable ASR services

  1. Open the console: https://console.volcengine.com/speech/app?opt=create
  2. Enable these capabilities:
    • Streaming Speech Recognition Large Model
    • Audio File Recognition Large Model (Express)

Create app and enable capabilities

2. Get APP ID and Access Token

  1. Open the service page: https://console.volcengine.com/speech/service/10011
  2. Copy APP ID and Access Token under the credential section

APP ID and Access Token

3. Configure in BiBi Keyboard

  1. Open Settings → ASR Settings
  2. Select Volcengine
  3. Paste APP ID into X-Api-App-Key
  4. Paste Access Token into X-Api-Access-Key
  5. If you want streaming, enable “Use Streaming (WebSocket)”

Configure Volcengine in app

Note

If you enabled both streaming and audio-file recognition when creating the app, they share the same credentials.
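
The two settings fields are named after HTTP headers. As a rough illustration of how the console credentials map onto them, here is a minimal Python sketch. It assumes the values are sent as request headers under exactly these names. The `volcengine_auth_headers` helper is hypothetical, and a real request would also need an endpoint and other details that this page does not cover.

```python
def volcengine_auth_headers(app_id: str, access_token: str) -> dict[str, str]:
    """Map console credentials onto the header names used by the settings fields.

    Hypothetical helper for illustration only; not part of BiBi Keyboard.
    """
    # Pasted values often carry stray whitespace or newlines.
    app_id = app_id.strip()
    access_token = access_token.strip()
    if not app_id or not access_token:
        raise ValueError("APP ID and Access Token must both be set")
    return {
        "X-Api-App-Key": app_id,             # APP ID from the console
        "X-Api-Access-Key": access_token,    # Access Token from the console
    }
```

Calling `volcengine_auth_headers("your-app-id", "your-access-token")` with placeholder values returns a two-entry dictionary. An empty value raises `ValueError`, so a missing credential is caught early.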

SiliconFlow

SiliconFlow provides a built-in free ASR option (no API key required) as well as paid models, which require your own API key.

Quick start (no API key required)

  1. In Settings → ASR Settings, select SiliconFlow
  2. Keep the “Free ASR” toggles enabled
  3. Switch between the free models (e.g. FunAudioLLM/SenseVoiceSmall, TeleAI/TeleSpeechASR) as needed

Use your own API key (optional)

  1. Sign up / log in: https://cloud.siliconflow.cn/
  2. Create an API key in the console
  3. Paste it into the SiliconFlow section in BiBi Keyboard

SiliconFlow API key

DashScope (Alibaba Bailian / Qwen)

DashScope offers good accuracy and cost efficiency, with partial streaming support.

1. Create an API key

  1. Open: https://bailian.console.aliyun.com/?tab=model#/api-key
  2. Create and copy an API key

DashScope API key

2. Configure in BiBi Keyboard

  1. Open Settings → ASR Settings and select DashScope
  2. Paste the API key and save
  3. Choose a model as needed: Qwen3-ASR-Flash, Qwen3.5-Omni-Flash, Qwen3.5-Omni-Plus, or a streaming model

Model choice

Qwen3.5-Omni is for non-streaming multimodal transcription. Streaming defaults to qwen3-asr-flash-realtime-2026-02-10, and you can also switch to fun-asr-realtime.
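
The model choice above can be sketched as a small lookup. This is a hypothetical helper, not the app's actual logic. The model names are copied from this page. The non-streaming default (the first model listed) is an assumption, since the page only states the streaming default.

```python
from typing import Optional

# Names as they appear on this page; the helper itself is hypothetical.
DEFAULT_STREAMING_MODEL = "qwen3-asr-flash-realtime-2026-02-10"
STREAMING_MODELS = (DEFAULT_STREAMING_MODEL, "fun-asr-realtime")
# Assumption: the first entry is treated as the non-streaming default.
NON_STREAMING_MODELS = ("Qwen3-ASR-Flash", "Qwen3.5-Omni-Flash", "Qwen3.5-Omni-Plus")


def resolve_dashscope_model(streaming: bool, requested: Optional[str] = None) -> str:
    """Return a model name that matches the chosen mode, or raise ValueError."""
    allowed = STREAMING_MODELS if streaming else NON_STREAMING_MODELS
    if requested is None:
        return allowed[0]
    if requested not in allowed:
        mode = "streaming" if streaming else "non-streaming"
        raise ValueError(f"{requested!r} is not a {mode} model; choose one of {allowed}")
    return requested
```

For example, `resolve_dashscope_model(streaming=True)` returns the streaming default. Requesting `Qwen3.5-Omni-Plus` with `streaming=True` raises an error, because the Omni models are non-streaming.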

Soniox

Soniox supports both streaming and non-streaming.

  1. Log in: https://console.soniox.com
  2. In your project, go to API keys
  3. Create and copy the API key, then paste it into BiBi Keyboard

Soniox API keys

Gemini

Gemini is commonly used for file-based recognition and light usage.

  1. Open: https://aistudio.google.com/api-keys
  2. Create and copy a key
  3. Paste it into the Gemini section in BiBi Keyboard

Gemini API key

ElevenLabs

ElevenLabs scribe_v1 is non-streaming only; scribe_v2 is streaming only.

  1. Open: https://elevenlabs.io/app/settings/api-keys
  2. Create an API key
  3. Enable Speech to Text permission for the key

Create ElevenLabs key

Enable Speech to Text permission

OpenAI (compatible endpoints)

The OpenAI provider supports OpenAI-format transcription endpoints, plus compatible third-party Audio Transcriptions or Realtime endpoints.

  1. In Settings → ASR Settings, select OpenAI
  2. Add one or more OpenAI ASR channels to separate official endpoints, proxy endpoints, or different models
  3. Fill in:
    • ASR Endpoint (e.g. https://api.openai.com/v1/audio/transcriptions or a compatible endpoint)
    • API Key (Bearer)
    • Model name (e.g. gpt-4o-mini-transcribe / whisper-1)
  4. If the endpoint supports the Realtime API, enable "Streaming (Realtime)" for live partial results

OpenAI settings example
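
To show what the three fields correspond to, here is a minimal Python sketch that builds (but does not send) a non-streaming transcription request in the OpenAI Audio Transcriptions format: a multipart POST with `model` and `file` parts and a Bearer token. The `build_transcription_request` helper is hypothetical, and a compatible third-party endpoint may expect additional fields.

```python
import urllib.request
import uuid


def build_transcription_request(
    endpoint: str,
    api_key: str,
    model: str,
    audio: bytes,
    filename: str = "audio.wav",
) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-format transcription endpoint."""
    boundary = uuid.uuid4().hex
    # Text parts of the multipart body: the model field, then the file part header.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # the "API Key (Bearer)" field
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send the request. With the official endpoint and default settings, the response is JSON with a `text` field. Building the request separately makes it easy to check an endpoint, key, and model combination before entering it in the app.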

StepAudio

StepAudio is StepFun's online ASR service. In BiBi Keyboard it is currently used in non-streaming mode.

  1. Create an API key in the StepFun console: https://platform.stepfun.com/
  2. In Settings → ASR Settings, select StepAudio
  3. Paste the StepFun API Key
  4. Choose language (Chinese / English / Auto) and enable ITN if needed

Zhipu GLM

Zhipu GLM is simple to integrate and is usually used in non-streaming mode.

  1. Get an API key: https://bigmodel.cn/usercenter/proj-mgmt/apikeys
  2. Paste it into the Zhipu section in BiBi Keyboard

Local model setup

Local models are ideal for offline usage and privacy. Each model trades off speed, quality, and streaming support.

Model selection tips

  • SenseVoice: non-streaming; fast and balanced; supports language settings
  • FunASR Nano: non-streaming; language selection, native ITN, and MLT Nano multilingual variant
  • Qwen3-ASR: non-streaming; local 0.6B model, good Chinese recognition, optional rule-based ITN
  • Parakeet: non-streaming; V3 for several European languages, V2 for English
  • FireRedASR V2: non-streaming / pseudo-streaming; replaces the old TeleSpeech local engine
  • Paraformer: streaming supported; decent quality

To download a model in the app:

  1. Select a local provider (e.g. SenseVoice / Paraformer)
  2. In the model manager, choose a variant and download
  3. If notification permission is granted, you can track download/unzip progress in notifications

Download local models in-app

Import from local files (optional)

If you prefer adding models from local files, download the ZIP first, then choose "Import from local" in the model manager.
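
A partial or corrupted download is a common reason an import fails. Before copying the ZIP to your device, you can check it on a computer. This is a minimal sketch using Python's standard `zipfile` module. The `is_importable_zip` helper is hypothetical and only verifies archive integrity, not that the contents are a valid model.

```python
import zipfile


def is_importable_zip(path: str) -> bool:
    """Return True if the file is a non-empty ZIP whose members all pass CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            if not archive.namelist():
                return False
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```

If the function returns `False`, download the file again before importing.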

Direct links

The links below point to BiBi-Keyboard model ZIPs. If a link returns a 404 error or the download is slow, use the models page (Releases: models) or a GitHub mirror site.

SenseVoice (non-streaming)

Paraformer (streaming)

FireRedASR V2 (non-streaming / pseudo-streaming)

FunASR Nano (non-streaming)

Qwen3-ASR (non-streaming)

Parakeet (non-streaming)

Universal punctuation model (optional)

Released under the Apache 2.0 License.