Waveform Studio
A self-hosted web application for reviewing and editing audio transcripts. Upload audio and a CSV transcript, then review segments with a synchronized waveform, edit speaker labels, split/merge segments, and push changes to the server.
Features
- Waveform playback with per-segment regions (WaveSurfer.js)
- Edit transcript segments: text, speaker labels, split/merge
- Speaker management: colour coding, display names, voice samples
- Project management: local and server-synced projects
- Token-authenticated Flask API backend
Stack
- Backend: Python / Flask
- Frontend: Vanilla JS (IIFE modules), WaveSurfer.js
Documentation
- User & developer guide: served at /docs when the app is running
- JS API reference (JSDoc): served at /docs/developer/ (build with npm run docs)
- UI style guide: STYLE_GUIDE.md
- Code style guide: CODE_GUIDE.md
- Project structure guide: PROJECT_STRUCTURE
Installation
Prerequisites
- Python 3.10+
- Node.js 18+ (for JS tooling — linting, docs, tests, and building the standalone app)
- PostgreSQL (for the cloud/server mode)
- A Firebase project (for authentication in cloud/server mode)
1. Clone the repository
git clone <repo-url>
cd transcriber
2. Install PostgreSQL
Windows (winget):
winget install PostgreSQL.PostgreSQL
Linux (apt):
sudo apt-get install postgresql postgresql-contrib
3. Create and activate a virtual environment
python -m venv .venv
Windows:
.venv\Scripts\activate
Linux / macOS:
source .venv/bin/activate
4. Install Python dependencies
Choose the requirements file that matches your hardware:
# CPU only (no GPU transcription)
pip install -r requirements-cpu.txt
# NVIDIA GPU with CUDA 12.4
pip install -r requirements-gpu.txt
The GPU requirements install CUDA-specific builds of PyTorch, torchaudio, pyannote.audio, and faster-whisper. The CPU builds use standard PyPI wheels for these packages.
5. Install JS dependencies
npm install
This installs ESLint, Vitest, JSDoc tooling, and Husky pre-commit hooks.
6. Create a Modal API key
modal setup
This authenticates the CLI with your Modal account and stores a token locally. If you don't have a Modal account, create one at modal.com first.
7. Set up Firebase
- Go to console.firebase.google.com and create a new project.
- In the Firebase console, go to Authentication → Sign-in method and enable at least one provider (e.g. Email/Password).
- Go to Project Settings → General to find your FIREBASE_PROJECT_ID, FIREBASE_API_KEY, and FIREBASE_AUTH_DOMAIN.
- Go to Project Settings → Service Accounts → Generate new private key to download the service account JSON file. Note its path; you will need it for FIREBASE_SERVICE_ACCOUNT.
8. Accept HuggingFace model licence agreements
If you plan to use transcription, you need a HuggingFace account and a read token to download the Pyannote diarization models:
- Create an account at huggingface.co and generate a read token under Settings → Access Tokens.
- Accept the licence agreements for both models (you must be logged in):
9. Configure environment variables
Copy .env.example to .env and fill in the required values:
cp .env.example .env
| Variable | Required | Description |
|---|---|---|
| SECRET_KEY | Yes | Long random string for signing session cookies. Generate with python -c "import secrets; print(secrets.token_hex(32))" |
| DATABASE_URL | Yes | PostgreSQL connection string, e.g. postgresql://user:password@localhost:5432/transcriber |
| FIREBASE_PROJECT_ID | Yes | Firebase project ID (Firebase console → Project Settings → General) |
| FIREBASE_API_KEY | Yes | Firebase web API key |
| FIREBASE_AUTH_DOMAIN | Yes | Firebase auth domain, usually <project-id>.firebaseapp.com |
| FIREBASE_SERVICE_ACCOUNT | Yes | Path to the Firebase service account JSON key file |
| AUTH_PROVIDER | No | Authentication provider. Supported values: firebase, none. Defaults to firebase |
| HF_TOKEN | For transcription | HuggingFace read token, required to download Whisper and Pyannote models. Accept licence agreements at the model pages first |
| PROJECTS_DIR | No | Directory for uploaded project data. Defaults to data/projects |
| DEFAULT_SERVER | No | Default server URL used by the frontend. Defaults to the same host as the frontend |
| ALLOWED_ORIGINS | No | Comma-separated CORS origins, or *. Set this when accessing the app from a different host or port |
| ALLOW_OPEN_REGISTRATION | No | true to let any Firebase-authenticated user self-register. false to require manual DB activation |
| MAX_AUDIO_SIZE_MB | No | Maximum audio upload size in MB. Defaults to 32768 |
| MAX_SAMPLE_SIZE_MB | No | Maximum speaker voice sample upload size in MB. Defaults to 10 |
| MAX_SAMPLE_DURATION | No | Maximum allowed speaker sample length in seconds. Defaults to 20 |
| RATELIMIT_STORAGE_URI | No | Storage backend for rate limiting. Defaults to in-memory (not suitable for multi-process deployments). Use a Redis URL for production, e.g. redis://localhost:6379/0 |
| ADMIN_EMAIL | No | Email address to receive admin notifications (e.g. new user registrations). Leave blank to disable |
| SMTP_HOST | No | Hostname of your SMTP mail server, e.g. smtp.gmail.com |
| SMTP_PORT | No | SMTP port. 587 for STARTTLS (recommended), 465 for SSL. Defaults to 587 |
| SMTP_USER | No | SMTP login username, usually your email address or an API key username |
| SMTP_PASS | No | SMTP login password or API key |
| SMTP_FROM | No | The "From" address on outgoing emails. For Gmail, this must match SMTP_USER |
| FLASK_DEBUG | No | Set to 1 for development mode with auto-reload |
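As a rough illustration of how the optional variables and their documented defaults might be consumed, here is a minimal sketch. The helper names (env_int, env_list, load_limits) and the dictionary keys are hypothetical and are not taken from the application's config module; only the variable names and default values come from the table.

```python
import os


def env_int(env, name, default):
    """Return env[name] as an int, or default when the variable is unset or blank."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def env_list(env, name):
    """Split a comma-separated variable into a list of trimmed, non-empty items."""
    return [item.strip() for item in env.get(name, "").split(",") if item.strip()]


def load_limits(env=os.environ):
    # Defaults mirror the table; the dictionary keys are illustrative only.
    return {
        "max_audio_mb": env_int(env, "MAX_AUDIO_SIZE_MB", 32768),
        "max_sample_mb": env_int(env, "MAX_SAMPLE_SIZE_MB", 10),
        "max_sample_seconds": env_int(env, "MAX_SAMPLE_DURATION", 20),
        "smtp_port": env_int(env, "SMTP_PORT", 587),
        "projects_dir": env.get("PROJECTS_DIR") or "data/projects",
        "allowed_origins": env_list(env, "ALLOWED_ORIGINS"),
    }
```

Passing a plain dictionary instead of os.environ makes the defaults easy to check without touching the real environment.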
10. Set up the database
Create a PostgreSQL database named transcriber:
createdb transcriber
# or via psql:
psql -U postgres -c "CREATE DATABASE transcriber;"
Then run the schema build scripts:
# Linux / macOS
./database/build.sh
# Windows
database\build.bat
# Custom connection (host, port, user, password)
./database/build.sh -h localhost -p 5432 -U postgres -P yourpassword
This drops and rebuilds all tables in dependency order using the SQL files in database/postgres/.
Running
# Linux / macOS
./run.sh
# Windows
run.bat
Then open http://localhost:5000.
LOCAL_MODE: Standalone Desktop Build
LOCAL_MODE is a configuration flag that produces a fully self-contained, offline version of the app — distributed as a single Windows executable (WaveformStudio.exe) with no cloud dependencies, no login, and no server infrastructure required.
What changes in LOCAL_MODE
| Aspect | LOCAL_MODE | Cloud mode |
|---|---|---|
| Authentication | Implicit single user — no login | Firebase JWT tokens |
| Database | SQLite (local file) | PostgreSQL (remote) |
| Users | One implicit user (LOCAL_USER_ID) | Multiple accounts |
| Sharing | Disabled — all resources owned by implicit user | Public/shared projects supported |
| Preferences | preferences.json file | Database JSONB column |
| Deployment | Single .exe | Flask server |
| Transcription | Not yet wired | Modal cloud service |
How LOCAL_MODE is set
LOCAL_MODE is read from the environment in application/config.py:
LOCAL_MODE = os.getenv('LOCAL_MODE', '0') in ('1', 'true', 'True')
LOCAL_USER_ID = '00000000-0000-0000-0000-000000000001'
The standalone launcher (local_launcher.py) forces LOCAL_MODE=1 at startup, so end users never need to set it manually. The app also reads a .env file from the same directory as the executable, allowing optional user configuration.
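The membership test above is strict about spelling, which is worth keeping in mind when setting the flag by hand in a .env file. The sketch below wraps the same comparison in a helper (is_local_mode is a hypothetical name used only for illustration) to show which values switch the mode on.

```python
def is_local_mode(value):
    """Apply the same membership test used for LOCAL_MODE in the config snippet."""
    return value in ('1', 'true', 'True')


# Only the three exact spellings enable local mode; anything else leaves it off.
for candidate in ('1', 'true', 'True', 'TRUE', 'yes', '0', ''):
    print(repr(candidate), is_local_mode(candidate))
```

Values such as TRUE or yes are treated as off, so the launcher's forced LOCAL_MODE=1 is the safest spelling to copy.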
Startup flow (local_launcher.py)
- Path resolution: detects whether it is running from a PyInstaller bundle (sys._MEIPASS) or a dev environment, then resolves paths to templates, static files, and application code accordingly.
- Environment setup: loads .env from the executable directory, then forces LOCAL_MODE=1 and PROJECTS_DIR=~/Documents/WaveformStudio/projects.
- Database initialisation: creates (or migrates) a SQLite database at ~/Documents/WaveformStudio/waveform_studio.db using database/build_sqlite.py, then seeds the implicit local user.
- Flask setup: configures the Flask app's template and static folders from the bundle paths.
- Launch: starts a Waitress WSGI server on a random free port on 127.0.0.1, then opens a native pywebview window (1400×900, min 800×600) pointed at /app.
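Two of the startup steps above, path resolution and picking a free port, can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of local_launcher.py: the function names are hypothetical, and the dev-environment fallback (the directory of the launching script) is a guess at reasonable behaviour.

```python
import socket
import sys
from pathlib import Path


def resolve_base_dir():
    """Return the directory that holds bundled resources.

    PyInstaller one-file builds unpack to a temporary directory exposed as
    sys._MEIPASS. Outside a bundle the attribute is absent, so this sketch
    falls back to the directory of the launching script (an assumption).
    """
    bundle_dir = getattr(sys, "_MEIPASS", None)
    if bundle_dir:
        return Path(bundle_dir)
    return Path(sys.argv[0]).resolve().parent


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


base_dir = resolve_base_dir()
port = find_free_port()
print(f"resources under {base_dir}, serving on http://127.0.0.1:{port}/app")
```

The port is released as soon as the socket closes, so there is a small window in which another process could claim it before the server binds; for a single-user desktop app that risk is usually acceptable.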
How the code splits on LOCAL_MODE
Authentication (application/auth.py) — the login_required decorator short-circuits Firebase token verification and injects the implicit local user directly:
if LOCAL_MODE:
    user = users_mod.get_user(LOCAL_USER_ID)
    g.current_user = user
    return f(*args, **kwargs)
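To show where that short-circuit sits inside a decorator, here is a framework-free sketch. It is not the code in application/auth.py: g is a SimpleNamespace standing in for flask.g, get_user and verify_token_and_load_user are hypothetical placeholders, and the Firebase path is deliberately left unimplemented.

```python
from functools import wraps
from types import SimpleNamespace

LOCAL_MODE = True
LOCAL_USER_ID = '00000000-0000-0000-0000-000000000001'

g = SimpleNamespace()  # stand-in for flask.g in this sketch


def get_user(user_id):
    """Hypothetical lookup; the real app would read the user from the database."""
    return {"id": user_id, "name": "Local user"}


def verify_token_and_load_user():
    """Placeholder for the cloud-mode token verification path (not shown here)."""
    raise PermissionError("token verification is not part of this sketch")


def login_required(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        if LOCAL_MODE:
            # Skip token checks entirely and inject the implicit local user.
            g.current_user = get_user(LOCAL_USER_ID)
            return f(*args, **kwargs)
        g.current_user = verify_token_and_load_user()
        return f(*args, **kwargs)
    return wrapper


@login_required
def list_projects():
    return f"projects for {g.current_user['id']}"


print(list_projects())
```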
Permissions (application/permissions.py) — the implicit user is always treated as 'owner' of every resource:
if LOCAL_MODE:
    return 'owner'
Database routing (application/db_access/db.py) — selects the correct backend at import time:
if LOCAL_MODE:
    from application.db_access.db_sqlite import get_conn, init_sqlite_db
else:
    from application.db_access.db_postgres import get_conn
db_sqlite.py is a compatibility layer that translates PostgreSQL syntax (e.g. %s placeholders, RETURNING, array types) to SQLite equivalents.
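The simplest of those translations, placeholder rewriting, might look like the sketch below. It is deliberately naive and is not the implementation in db_sqlite.py: translate_placeholders is a hypothetical name, the table is made up for the demo, and RETURNING and array handling are not covered.

```python
import sqlite3


def translate_placeholders(sql):
    """Rewrite PostgreSQL-driver style %s placeholders as SQLite's ? placeholders.

    Naive on purpose: it would also rewrite a literal %s inside a quoted string.
    """
    return sql.replace("%s", "?")


# Demo against an in-memory database with an illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    translate_placeholders("INSERT INTO projects (name) VALUES (%s)"),
    ("demo",),
)
row = conn.execute(
    translate_placeholders("SELECT name FROM projects WHERE id = %s"),
    (1,),
).fetchone()
print(row)
conn.close()
```

Keeping the translation in one layer lets the rest of the application write a single SQL dialect regardless of which backend is active.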
Sharing (application/projects.py) — shared-project queries return an empty list:
if LOCAL_MODE:
    return []
User preferences (application/users.py) — preferences are read from and written to ~/Documents/WaveformStudio/preferences.json rather than the database.
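A file-backed preference store of this kind can be sketched in a few lines. The function names and the write-then-rename strategy are assumptions made for illustration, not a description of application/users.py; the demo writes to a temporary directory rather than ~/Documents/WaveformStudio/.

```python
import json
import tempfile
from pathlib import Path


def load_preferences(path):
    """Return the stored preferences, or an empty dict if the file is missing."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_preferences(path, prefs):
    """Write preferences to a sibling temp file, then rename it into place."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        json.dump(prefs, fh, indent=2)
    tmp.replace(path)  # avoids leaving a half-written preferences.json


with tempfile.TemporaryDirectory() as demo_dir:
    prefs_path = Path(demo_dir) / "preferences.json"
    save_preferences(prefs_path, {"theme": "dark"})
    restored = load_preferences(prefs_path)

print(restored)
```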
Frontend (static/js/utilities/constants.js) — reads a <meta name="local-mode"> tag injected by the server and exports a LOCAL_MODE boolean. UI components (account page, settings) use this to hide cloud-only features such as logout, subscription management, and account settings.
Data storage
All user data is stored in ~/Documents/WaveformStudio/:
| Item | Path |
|---|---|
| Database | waveform_studio.db |
| Project files | projects/<project-id>/ |
| Preferences | preferences.json |
Building the executable
The build uses PyInstaller with local.spec:
pyinstaller local.spec
local.spec bundles templates, static assets, application code, and the SQLite schema into a single-file executable. Cloud-only packages (modal, firebase_admin, torch, etc.) are explicitly excluded to keep the binary small.
The output is written to build/local/WaveformStudio.exe.