ASR Provider Setup
This page covers how to register, obtain credentials, and configure ASR providers in BiBi Keyboard (说点啥).
Before you start
- Open Settings → ASR Settings and select your ASR provider.
- Cloud providers usually require an API Key or Access Token.
- Local models require downloading or importing model files (the first load may take a few seconds).
Security
API keys and access tokens are sensitive. Do not share them publicly. If you suspect leakage, revoke the key/token immediately and create a new one.
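When you report a problem (for example in an issue or a chat), mask credentials in any logs or screenshots you share. A minimal sketch of such masking in Python; `mask_secret` is a hypothetical helper for illustration, not part of BiBi Keyboard.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a masked copy of an API key or access token.

    Hypothetical helper: keeps only the last `visible` characters so you can
    still tell keys apart without exposing them.
    """
    if visible < 0:
        raise ValueError("visible must be non-negative")
    # Short secrets are masked completely, so nothing meaningful leaks.
    if len(secret) <= visible * 2:
        return "*" * len(secret)
    hidden = len(secret) - visible
    return "*" * hidden + secret[hidden:]
```

If a key has already been posted somewhere public, masking is not enough: revoke it and create a new one.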
Provider overview
| Provider | Type | Streaming | Best for |
|---|---|---|---|
| Volcengine | Cloud | ✅ | Low-latency, real-time streaming |
| SiliconFlow | Cloud | ❌ | Beginner-friendly, low cost |
| DashScope (Alibaba) | Cloud | ✅ | Balanced accuracy and cost |
| Soniox | Cloud | ✅ | Stable streaming, international usage |
| Gemini | Cloud | ❌ | Low-volume usage / file-based recognition |
| ElevenLabs | Cloud | ✅/❌ | High accuracy (model-dependent) |
| OpenAI (compatible) | Cloud | ✅/❌ | OpenAI/compatible file or Realtime transcription |
| StepAudio | Cloud | ❌ | StepFun online ASR, Chinese/English and ITN |
| Zhipu GLM | Cloud | ❌ | Simple integration, lower cost |
| Local models (SenseVoice / FunASR Nano / Qwen3-ASR / Parakeet / FireRedASR V2 / Paraformer) | Local | Partial ✅ | Privacy-first, offline usage |
Volcengine
Volcengine (Doubao Voice) has strong Chinese recognition and supports both streaming and non-streaming.
1. Create an app and enable ASR services
- Open the console: https://console.volcengine.com/speech/app?opt=create
- Enable these capabilities:
  - Streaming Speech Recognition Large Model
  - Audio File Recognition Large Model (Express)

2. Get APP ID and Access Token
- Open the service page: https://console.volcengine.com/speech/service/10011
- Copy the APP ID and Access Token from the credential section

3. Configure in BiBi Keyboard
- Open Settings → ASR Settings
- Select Volcengine
- Paste the APP ID into X-Api-App-Key
- Paste the Access Token into X-Api-Access-Key
- If you want streaming, enable “Use Streaming (WebSocket)”

Note
If you enabled both streaming and audio-file recognition when creating the app, they share the same credentials.
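The two field names in BiBi Keyboard match the credential headers of Volcengine's large-model speech APIs, which is why APP ID goes into X-Api-App-Key and Access Token into X-Api-Access-Key. A minimal sketch of that mapping, assuming only these two headers matter here (real requests also need other headers, such as a resource ID, that this sketch does not cover); `volcengine_auth_headers` is a hypothetical name.

```python
def volcengine_auth_headers(app_id: str, access_token: str) -> dict:
    """Map the console credentials to the two credential fields.

    Hypothetical helper: builds only the credential headers, nothing else.
    """
    # Copy-paste from the console often picks up stray spaces or newlines.
    app_id = app_id.strip()
    access_token = access_token.strip()
    if not app_id or not access_token:
        raise ValueError("both APP ID and Access Token are required")
    return {
        "X-Api-App-Key": app_id,           # APP ID from the service page
        "X-Api-Access-Key": access_token,  # Access Token from the service page
    }
```

Because streaming and audio-file recognition share one app, the same two values work for both modes.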
SiliconFlow
SiliconFlow provides a built-in free ASR option (no key required) and paid models (own key).
Quick start (no API key required)
- In Settings → ASR Settings, select SiliconFlow
- Keep the “Free ASR” toggles enabled
- Switch between the free models (e.g. FunAudioLLM/SenseVoiceSmall, TeleAI/TeleSpeechASR) as needed
Use your own API key (optional)
- Sign up / log in: https://cloud.siliconflow.cn/
- Create an API key in the console
- Paste it into the SiliconFlow section in BiBi Keyboard
DashScope (Alibaba Bailian / Qwen)
DashScope offers good accuracy and cost efficiency; streaming is available through its realtime models.
1. Create an API key
- Open: https://bailian.console.aliyun.com/?tab=model#/api-key
- Create and copy an API key

2. Configure in BiBi Keyboard
- Open Settings → ASR Settings and select DashScope
- Paste the API key and save
- Choose a model as needed: Qwen3-ASR-Flash, Qwen3.5-Omni-Flash, Qwen3.5-Omni-Plus, or a streaming model
Model choice
Qwen3.5-Omni is for non-streaming multimodal transcription. Streaming defaults to qwen3-asr-flash-realtime-2026-02-10, and you can also switch to fun-asr-realtime.
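The rule above (non-streaming models vs. realtime models, with a default for streaming) can be expressed as a small check. A sketch under stated assumptions: the model names are copied from this guide and the exact IDs shown in the app may differ; `pick_dashscope_model` is hypothetical.

```python
from typing import Optional

# Model names as listed in this guide; exact IDs in the app may differ.
NON_STREAMING = ("Qwen3-ASR-Flash", "Qwen3.5-Omni-Flash", "Qwen3.5-Omni-Plus")
STREAMING = ("qwen3-asr-flash-realtime-2026-02-10", "fun-asr-realtime")


def pick_dashscope_model(streaming: bool, model: Optional[str] = None) -> str:
    """Hypothetical helper: return a model that matches the chosen mode."""
    allowed = STREAMING if streaming else NON_STREAMING
    if model is None:
        if streaming:
            return STREAMING[0]  # the documented streaming default
        raise ValueError("pick a non-streaming model explicitly")
    if model not in allowed:
        mode = "streaming" if streaming else "non-streaming"
        raise ValueError(f"{model!r} is not a {mode} model")
    return model
```

For example, selecting Qwen3.5-Omni-Flash while streaming is enabled is rejected, matching the note that Qwen3.5-Omni is for non-streaming transcription.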
Soniox
Soniox supports both streaming and non-streaming.
- Log in: https://console.soniox.com
- In your project, go to API keys
- Create and copy the API key, then paste it into BiBi Keyboard

Gemini
Gemini is commonly used for file-based recognition and low-volume usage.
- Open: https://aistudio.google.com/api-keys
- Create and copy a key
- Paste it into the Gemini section in BiBi Keyboard

ElevenLabs
ElevenLabs scribe_v1 is non-streaming only; scribe_v2 is streaming only.
- Open: https://elevenlabs.io/app/settings/api-keys
- Create an API key
- Enable the Speech to Text permission for the key
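Since each ElevenLabs model supports exactly one mode, the model follows directly from whether streaming is on. A minimal sketch of that rule; `elevenlabs_model_for` and `check_elevenlabs` are hypothetical helpers, and only the two models named above are covered.

```python
# Model -> streams?  Per this guide: scribe_v1 is non-streaming only,
# scribe_v2 is streaming only.
ELEVENLABS_STREAMING = {"scribe_v1": False, "scribe_v2": True}


def elevenlabs_model_for(streaming: bool) -> str:
    """Hypothetical helper: return the model that supports the requested mode."""
    for model, is_streaming in ELEVENLABS_STREAMING.items():
        if is_streaming == streaming:
            return model
    raise LookupError("no ElevenLabs model for this mode")


def check_elevenlabs(model: str, streaming: bool) -> None:
    """Raise ValueError if the model/mode combination cannot work."""
    if model not in ELEVENLABS_STREAMING:
        raise ValueError(f"unknown model {model!r}")
    if ELEVENLABS_STREAMING[model] != streaming:
        mode = "streaming" if streaming else "non-streaming"
        raise ValueError(f"{model} does not support {mode} mode")
```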


OpenAI (compatible endpoints)
The OpenAI provider supports OpenAI-format transcription endpoints, plus compatible third-party Audio Transcriptions or Realtime endpoints.
- In Settings → ASR Settings, select OpenAI
- Add one or more OpenAI ASR channels to separate official endpoints, proxy endpoints, or different models
- Fill in:
  - ASR Endpoint (e.g. https://api.openai.com/v1/audio/transcriptions or a compatible endpoint)
  - API Key (sent as a Bearer token)
  - Model name (e.g. gpt-4o-mini-transcribe or whisper-1)
- If the endpoint supports the Realtime API, enable "Streaming (Realtime)" for live partial results
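For the non-streaming case, each channel boils down to one multipart POST: the endpoint URL, a Bearer API key, and `model` plus `file` form fields. A minimal standard-library sketch that only builds the request; `AsrChannel` and `build_transcription_request` are hypothetical names, not BiBi Keyboard's code.

```python
import urllib.request
import uuid
from dataclasses import dataclass


@dataclass
class AsrChannel:
    """One OpenAI-compatible ASR channel (hypothetical model of the settings)."""
    name: str
    endpoint: str
    api_key: str
    model: str


def build_transcription_request(channel: AsrChannel, audio: bytes,
                                filename: str = "audio.wav") -> urllib.request.Request:
    """Build (but do not send) a multipart Audio Transcriptions request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field: which model to use.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{channel.model}\r\n").encode(),
        # File field: the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode() + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        channel.endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {channel.api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the request to `urllib.request.urlopen` would send it; OpenAI's own endpoint returns JSON with the transcript in a `text` field by default, while compatible endpoints may differ. Realtime streaming uses a separate protocol and is not covered by this sketch.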

StepAudio
StepAudio is StepFun's online ASR service. In BiBi Keyboard it is currently used in non-streaming mode.
- Create an API key in the StepFun console: https://platform.stepfun.com/
- In Settings → ASR Settings, select StepAudio
- Paste the StepFun API Key
- Choose language (Chinese / English / Auto) and enable ITN if needed
Zhipu GLM
Zhipu GLM is simple to integrate and is usually used in non-streaming mode.
- Get an API key: https://bigmodel.cn/usercenter/proj-mgmt/apikeys
- Paste it into the Zhipu section in BiBi Keyboard
Local model setup
Local models are ideal for offline usage and privacy. Each model trades off speed, quality, and streaming support.
Model selection tips
- SenseVoice: non-streaming; fast and balanced; supports language settings
- FunASR Nano: non-streaming; language selection, native ITN, and MLT Nano multilingual variant
- Qwen3-ASR: non-streaming; local 0.6B model, good Chinese recognition, optional rule-based ITN
- Parakeet: non-streaming; V3 for several European languages, V2 for English
- FireRedASR V2: non-streaming / pseudo-streaming; replaces the old TeleSpeech local engine
- Paraformer: streaming supported; decent quality
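If live partial results matter to you, the tips above narrow the choice quickly. A small sketch that encodes the recognition modes exactly as listed; `LOCAL_MODEL_MODES` and `models_with_mode` are hypothetical names for illustration.

```python
# Recognition modes per local model, copied from the selection tips above.
LOCAL_MODEL_MODES = {
    "SenseVoice": {"non-streaming"},
    "FunASR Nano": {"non-streaming"},
    "Qwen3-ASR": {"non-streaming"},
    "Parakeet": {"non-streaming"},
    "FireRedASR V2": {"non-streaming", "pseudo-streaming"},
    "Paraformer": {"streaming"},
}

VALID_MODES = {"non-streaming", "pseudo-streaming", "streaming"}


def models_with_mode(mode: str) -> list:
    """Hypothetical helper: list local models that support `mode`, in listed order."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    return [name for name, modes in LOCAL_MODEL_MODES.items() if mode in modes]
```

Only Paraformer offers true streaming; FireRedASR V2 approximates it with pseudo-streaming.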
Download in-app (recommended)
- Select a local provider (e.g. SenseVoice / Paraformer)
- In the model manager, choose a variant and download
- If notification permission is granted, you can track download/unzip progress in notifications

Import from local files (optional)
If you prefer adding models from local files, download the ZIP first, then choose "Import from local" in the model manager.
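A failed or interrupted download can leave a truncated file, or an error page saved under a .zip name, so a quick check before importing can save time. A minimal sketch, assuming each model ZIP contains at least one `.onnx` file (true for sherpa-onnx models, but an assumption here); `looks_like_model_zip` is hypothetical and the app's own import checks may differ.

```python
import zipfile


def looks_like_model_zip(source) -> bool:
    """Rough pre-import check for a downloaded model ZIP.

    `source` may be a path or a binary file object. Assumption: a usable
    model archive contains at least one .onnx file.
    """
    if not zipfile.is_zipfile(source):
        return False  # not a ZIP at all, e.g. a saved error page
    with zipfile.ZipFile(source) as zf:
        # testzip() reads every member and returns the first bad name, or None.
        if zf.testzip() is not None:
            return False
        return any(name.lower().endswith(".onnx") for name in zf.namelist())
```

Because `testzip()` reads the whole archive, expect this to take a while on files of several hundred MB.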
Direct download links (GitHub Releases)
Direct links
The links below point to BiBi-Keyboard model ZIPs. If you see 404 or slow downloads, use the models page (Releases: models) or a GitHub mirror site.
SenseVoice (non-streaming)
- small-int8 (~153MB): sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.zip
- small-fp32 (~980MB): sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.zip
Paraformer (streaming)
- Trilingual (ZH/Cantonese/EN, ~974MB): sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.zip
- Bilingual (ZH/EN, ~973MB): sherpa-onnx-streaming-paraformer-bilingual-zh-en.zip
FireRedASR V2 (non-streaming / pseudo-streaming)
- Zh + En CTC int8 (~740MB): sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.zip
FunASR Nano (non-streaming)
- int8 (~690MB): sherpa-onnx-funasr-nano-int8-2025-12-30.zip
- MLT Nano int8 (~690MB): sherpa-onnx-funasr-mlt-nano-int8-2026-03-21.zip
Qwen3-ASR (non-streaming)
- 0.6B int8 (~806MB): sherpa-onnx-qwen3-asr-0.6B-int8-2026-03-25.zip
Parakeet (non-streaming)
- 0.6B V3 int8 (~456MB): sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.zip
- 0.6B V2 int8 (~451MB): sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.zip
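To decide which ZIP to fetch and which local provider to import it into, the list above can be kept as a small table. The sketch copies file names and approximate sizes from this page; the 2× storage headroom (ZIP and unpacked files present at the same time) is an assumption, not a documented requirement. `CATALOG`, `family_of`, and `fits_in` are hypothetical names.

```python
from typing import Optional

# (local provider, ZIP file name, approximate size in MB), from the list above.
CATALOG = [
    ("SenseVoice", "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.zip", 153),
    ("SenseVoice", "sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.zip", 980),
    ("Paraformer", "sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.zip", 974),
    ("Paraformer", "sherpa-onnx-streaming-paraformer-bilingual-zh-en.zip", 973),
    ("FireRedASR V2", "sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25.zip", 740),
    ("FunASR Nano", "sherpa-onnx-funasr-nano-int8-2025-12-30.zip", 690),
    ("FunASR Nano", "sherpa-onnx-funasr-mlt-nano-int8-2026-03-21.zip", 690),
    ("Qwen3-ASR", "sherpa-onnx-qwen3-asr-0.6B-int8-2026-03-25.zip", 806),
    ("Parakeet", "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.zip", 456),
    ("Parakeet", "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.zip", 451),
]


def family_of(filename: str) -> Optional[str]:
    """Return the local provider a downloaded ZIP belongs to, or None if unknown."""
    for family, name, _ in CATALOG:
        if name == filename:
            return family
    return None


def fits_in(free_mb: float, headroom: float = 2.0) -> list:
    """List (provider, file) pairs whose size times `headroom` fits in free_mb.

    Assumption: about `headroom` x the ZIP size is needed during unpacking.
    """
    return [(fam, name) for fam, name, size in CATALOG if size * headroom <= free_mb]
```

For example, with about 1 GB free only the SenseVoice int8 and the two Parakeet ZIPs pass under this assumed headroom.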