ASR Provider Setup
This page covers how to sign up with ASR providers, obtain credentials, and configure them in BiBi Keyboard (说点啥).
Before you start
- Open Settings → ASR Settings and select your ASR provider.
- Cloud providers usually require an API Key/Access Token.
- Local models require downloading/importing model files (first load may take a few seconds).
Security
API keys and access tokens are sensitive. Do not share them publicly. If you suspect leakage, revoke the key/token immediately and create a new one.
Provider overview
| Provider | Type | Streaming | Best for |
|---|---|---|---|
| Volcengine | Cloud | ✅ | Low-latency, real-time streaming |
| SiliconFlow | Cloud | ❌ | Beginner-friendly, low cost |
| DashScope (Alibaba) | Cloud | ✅ | Balanced accuracy and cost |
| Soniox | Cloud | ✅ | Stable streaming, international usage |
| Gemini | Cloud | ❌ | Low-volume usage, file-based recognition |
| ElevenLabs | Cloud | ✅/❌ | High accuracy (model-dependent) |
| OpenAI (compatible) | Cloud | ❌ | OpenAI/compatible file transcription |
| Zhipu GLM | Cloud | ❌ | Simple integration, lower cost |
| Local models (SenseVoice / Paraformer / FunASR Nano / TeleSpeech) | Local | Partial ✅ | Privacy-first, offline usage |
Volcengine
Volcengine (Doubao Voice) has strong Chinese recognition and supports both streaming and non-streaming.
1. Create an app and enable ASR services
- Open the console: https://console.volcengine.com/speech/app?opt=create
- Enable these capabilities:
  - Streaming Speech Recognition Large Model
  - Audio File Recognition Large Model (Express)

2. Get APP ID and Access Token
- Open the service page: https://console.volcengine.com/speech/service/10011
- Copy APP ID and Access Token under the credential section

3. Configure in BiBi Keyboard
- Open Settings → ASR Settings
- Select Volcengine
- Paste APP ID into X-Api-App-Key
- Paste Access Token into X-Api-Access-Key
- If you want streaming, enable “Use Streaming (WebSocket)”

Note
If you enabled both streaming and audio-file recognition when creating the app, they share the same credentials.
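To make the field mapping in step 3 concrete, here is a minimal sketch of how the two credentials could be turned into request headers. It assumes the field labels X-Api-App-Key and X-Api-Access-Key are sent as HTTP headers of the same name; the function name and the placeholder values are hypothetical, and BiBi Keyboard's actual implementation may differ.

```python
def build_volcengine_headers(app_id: str, access_token: str) -> dict:
    """Map console credentials onto the two fields shown in the settings screen."""
    # Trim whitespace that often sneaks in when pasting from the console.
    app_id = app_id.strip()
    access_token = access_token.strip()
    if not app_id or not access_token:
        raise ValueError("Both APP ID and Access Token are required")
    return {
        "X-Api-App-Key": app_id,           # APP ID from the console
        "X-Api-Access-Key": access_token,  # Access Token from the console
    }


# Placeholder values, not real credentials.
headers = build_volcengine_headers(" 1234567890 ", "example-token")
```

Because streaming and audio-file recognition share credentials, the same pair of values would apply to both modes.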
SiliconFlow
SiliconFlow provides a built-in free ASR option (no key required) and paid models that require your own API key.
Quick start (no API key required)
- In Settings → ASR Settings, select SiliconFlow
- Keep the “Free ASR” toggles enabled
- Switch between the free models (e.g. FunAudioLLM/SenseVoiceSmall, TeleAI/TeleSpeechASR) as needed
Use your own API key (optional)
- Sign up / log in: https://cloud.siliconflow.cn/
- Create an API key in the console
- Paste it into the SiliconFlow section in BiBi Keyboard
DashScope (Alibaba Bailian / Qwen)
DashScope offers good accuracy and cost efficiency, with partial streaming support.
1. Create an API key
- Open: https://bailian.console.aliyun.com/?tab=model#/api-key
- Create and copy an API key

2. Configure in BiBi Keyboard
- Open Settings → ASR Settings and select DashScope
- Paste the API key and save
Soniox
Soniox supports both streaming and non-streaming.
- Log in: https://console.soniox.com
- In your project, go to API keys
- Create and copy the API key, then paste it into BiBi Keyboard

Gemini
Gemini is commonly used for file-based recognition and low-volume usage.
- Open: https://aistudio.google.com/api-keys
- Create and copy a key
- Paste it into the Gemini section in BiBi Keyboard

ElevenLabs
ElevenLabs scribe_v1 is non-streaming only; scribe_v2 is streaming only.
- Open: https://elevenlabs.io/app/settings/api-keys
- Create an API key
- Enable the Speech to Text permission for the key


OpenAI (compatible endpoints)
The OpenAI provider supports OpenAI-format transcription endpoints (and compatible third-party endpoints).
- In Settings → ASR Settings, select OpenAI
- Fill in:
  - ASR Endpoint (e.g. https://api.openai.com/v1/audio/transcriptions or a compatible endpoint)
  - API Key (Bearer)
  - Model name (e.g. gpt-4o-mini-transcribe / whisper-1)
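To show how the three fields fit together, the sketch below builds (but does not send) an OpenAI-format transcription request using only the Python standard library. It assumes the endpoint accepts a multipart form with `model` and `file` parts and a Bearer token, as the OpenAI transcription API does; the function name, the placeholder key, and the dummy audio bytes are illustrative only, and compatible third-party endpoints may expect additional fields.

```python
import io
import uuid
import urllib.request


def build_transcription_request(endpoint, api_key, model, filename, audio_bytes):
    """Build a multipart POST for an OpenAI-format transcription endpoint."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # "model" part: corresponds to the Model name field.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(f"{model}\r\n".encode())

    # "file" part: the audio to transcribe.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        endpoint,  # corresponds to the ASR Endpoint field
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # corresponds to the API Key field
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_transcription_request(
    endpoint="https://api.openai.com/v1/audio/transcriptions",
    api_key="sk-example",          # placeholder, not a real key
    model="whisper-1",
    filename="sample.wav",
    audio_bytes=b"placeholder",    # dummy bytes, not valid audio
)
# To actually send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the headers and body when a compatible endpoint rejects the call.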

Zhipu GLM
Zhipu GLM is simple to integrate and is typically used in non-streaming mode.
- Get an API key: https://bigmodel.cn/usercenter/proj-mgmt/apikeys
- Paste it into the Zhipu section in BiBi Keyboard
Local model setup
Local models are ideal for offline usage and privacy. Each model trades off speed, quality, and streaming support.
Model selection tips
- SenseVoice: non-streaming; fast and balanced; supports language settings
- FunASR Nano: non-streaming; slower but often higher quality
- Paraformer: streaming supported; decent quality
- TeleSpeech: non-streaming; slightly better dialect support
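The tips above can be read as a small lookup table. The sketch below transcribes them into a dictionary and filters by streaming mode; the data layout and function name are illustrative and not part of BiBi Keyboard.

```python
# Traits copied from the selection tips; the structure itself is hypothetical.
LOCAL_MODELS = {
    "SenseVoice": {"streaming": False, "note": "fast and balanced; supports language settings"},
    "FunASR Nano": {"streaming": False, "note": "slower but often higher quality"},
    "Paraformer": {"streaming": True, "note": "decent quality"},
    "TeleSpeech": {"streaming": False, "note": "slightly better dialect support"},
}


def models_by_mode(streaming: bool) -> list:
    """Return the local models listed for the given mode, in table order."""
    return [
        name
        for name, traits in LOCAL_MODELS.items()
        if traits["streaming"] == streaming
    ]


streaming_models = models_by_mode(True)
offline_batch_models = models_by_mode(False)
```

If you need live, as-you-speak results with a local model, Paraformer is the only option in this list; otherwise the choice comes down to the speed, quality, and dialect notes.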
Download in-app (recommended)
- Select a local provider (e.g. SenseVoice / Paraformer)
- In the model manager, choose a variant and download
- If notification permission is granted, you can track download/unzip progress in notifications

Direct download links (GitHub Releases)
Direct links
The links below point to BiBi-Keyboard model ZIPs. If any link returns 404, use the models page:
SenseVoice (non-streaming)
- small-int8 (~153MB): sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17.zip
- small-fp32 (~980MB): sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17.zip
Paraformer (streaming)
- Trilingual (ZH/Cantonese/EN, ~974MB): sherpa-onnx-streaming-paraformer-trilingual-zh-cantonese-en.zip
- Bilingual (ZH/EN, ~973MB): sherpa-onnx-streaming-paraformer-bilingual-zh-en.zip
TeleSpeech (non-streaming)
- int8 (~180MB): sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04.zip
- fp32 (~715MB): sherpa-onnx-telespeech-ctc-zh-2024-06-04.zip
FunASR Nano (non-streaming)
- int8 (~690MB): sherpa-onnx-funasr-nano-int8-2025-12-30.zip
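Since these archives are large, a download can be interrupted partway through. As a rough sketch, assuming you check the file on a desktop machine before importing it, Python's standard `zipfile` module can confirm that an archive is complete and readable. The function name and the demo member name are hypothetical.

```python
import io
import zipfile


def check_model_zip(path_or_file):
    """Return the archive's member names, or raise if the ZIP is unusable."""
    if not zipfile.is_zipfile(path_or_file):
        raise ValueError("Not a ZIP archive (incomplete download?)")
    with zipfile.ZipFile(path_or_file) as archive:
        # testzip() returns the first member with a bad CRC, or None if all pass.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupt member in archive: {bad_member}")
        return archive.namelist()


# Demo with a tiny in-memory archive; in practice, pass the downloaded ZIP's path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("model/placeholder.txt", "placeholder")
names = check_model_zip(buffer)
```

A failed check usually means the file should be downloaded again before importing it into the app.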