Waveform Studio

A self-hosted web application for reviewing and editing audio transcripts. Upload audio and a CSV transcript, then review segments with a synchronized waveform, edit speaker labels, split/merge segments, and push changes to the server.

Features

  • Waveform playback with per-segment regions (WaveSurfer.js)
  • Edit transcript segments: text, speaker labels, split/merge
  • Speaker management: colour coding, display names, voice samples
  • Project management: local and server-synced projects
  • Token-authenticated Flask API backend

Stack

  • Backend: Python / Flask
  • Frontend: Vanilla JS (IIFE modules), WaveSurfer.js

Documentation

  • User & developer guide: served at /docs when the app is running
  • JS API reference (JSDoc): served at /docs/developer/ — build with npm run docs
  • UI style guide: STYLE_GUIDE.md
  • Code style guide: CODE_GUIDE.md
  • Project structure guide: PROJECT_STRUCTURE

Installation

Prerequisites

  • Python 3.10+
  • Node.js 18+ (for JS tooling — linting, docs, tests, and building the standalone app)
  • PostgreSQL (for the cloud/server mode)
  • A Firebase project (for authentication in cloud/server mode)

1. Clone the repository

git clone <repo-url>
cd transcriber

2. Install PostgreSQL

Windows (winget):

winget install PostgreSQL.PostgreSQL

Linux (apt):

sudo apt-get install postgresql postgresql-contrib

3. Create and activate a virtual environment

python -m venv .venv

Windows:

.venv\Scripts\activate

Linux / macOS:

source .venv/bin/activate

4. Install Python dependencies

Choose the requirements file that matches your hardware:

# CPU only (no GPU transcription)
pip install -r requirements-cpu.txt

# NVIDIA GPU with CUDA 12.4
pip install -r requirements-gpu.txt

The GPU requirements install CUDA-specific builds of PyTorch, torchaudio, pyannote.audio, and faster-whisper. The CPU builds use standard PyPI wheels for these packages.

5. Install JS dependencies

npm install

This installs ESLint, Vitest, JSDoc tooling, and Husky pre-commit hooks.

6. Create a Modal API key

modal setup

This authenticates the CLI with your Modal account and stores a token locally. If you don't have a Modal account, create one at modal.com first.

7. Set up Firebase

  1. Go to console.firebase.google.com and create a new project.
  2. In the Firebase console, go to Authentication → Sign-in method and enable at least one provider (e.g. Email/Password).
  3. Go to Project Settings → General to find your FIREBASE_PROJECT_ID, FIREBASE_API_KEY, and FIREBASE_AUTH_DOMAIN.
  4. Go to Project Settings → Service Accounts → Generate new private key to download the service account JSON file. Note its path — you will need it for FIREBASE_SERVICE_ACCOUNT.
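Optionally, you can sanity-check the downloaded key file before moving on. The sketch below is a hypothetical helper, not part of the app. It reads the service account JSON and checks the standard fields Google includes in these keys (type, project_id, client_email, private_key). It also confirms that project_id matches the value you will use for FIREBASE_PROJECT_ID.

```python
import json
from pathlib import Path


def check_service_account(path: str, expected_project_id: str) -> list[str]:
    """Return a list of problems found in a service account key file.

    Hypothetical helper for illustration; the app may validate differently.
    """
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} does not exist"]

    data = json.loads(key_file.read_text(encoding="utf-8"))
    problems: list[str] = []

    # Keys generated under Service Accounts have type "service_account"
    if data.get("type") != "service_account":
        problems.append("type is not 'service_account'")

    # Fields needed to authenticate as the service account
    for field in ("project_id", "client_email", "private_key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")

    # The key must belong to the same project as FIREBASE_PROJECT_ID
    if data.get("project_id") and data["project_id"] != expected_project_id:
        problems.append(
            f"project_id {data['project_id']!r} does not match {expected_project_id!r}"
        )
    return problems
```

An empty list means the file looks usable, e.g. check_service_account("service-account.json", "my-project") returns [] for a matching key.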

8. Accept HuggingFace model licence agreements

If you plan to use transcription, you need a HuggingFace account and a read token to download the Pyannote diarization models:

  1. Create an account at huggingface.co and generate a read token under Settings → Access Tokens.
  2. Accept the licence agreements for both models (you must be logged in):

9. Configure environment variables

Copy .env.example to .env and fill in the required values:

cp .env.example .env

| Variable | Required | Description |
|---|---|---|
| SECRET_KEY | Yes | Long random string for signing session cookies. Generate one with python -c "import secrets; print(secrets.token_hex(32))" |
| DATABASE_URL | Yes | PostgreSQL connection string, e.g. postgresql://user:password@localhost:5432/transcriber |
| FIREBASE_PROJECT_ID | Yes | Firebase project ID (Firebase console → Project Settings → General) |
| FIREBASE_API_KEY | Yes | Firebase web API key |
| FIREBASE_AUTH_DOMAIN | Yes | Firebase auth domain, usually <project-id>.firebaseapp.com |
| FIREBASE_SERVICE_ACCOUNT | Yes | Path to the Firebase service account JSON key file |
| AUTH_PROVIDER | No | Authentication provider. Supported values: firebase, none. Defaults to firebase |
| HF_TOKEN | For transcription | HuggingFace read token, required to download the Whisper and Pyannote models. Accept the licence agreements on the model pages first |
| PROJECTS_DIR | No | Directory for uploaded project data. Defaults to data/projects |
| DEFAULT_SERVER | No | Default server URL used by the frontend. Defaults to the same host as the frontend |
| ALLOWED_ORIGINS | No | Comma-separated list of CORS origins, or * to allow any origin. Set this when accessing the app from a different host or port |
| ALLOW_OPEN_REGISTRATION | No | true lets any Firebase-authenticated user self-register; false requires each account to be activated manually in the database |
| MAX_AUDIO_SIZE_MB | No | Maximum audio upload size in MB. Defaults to 32768 |
| MAX_SAMPLE_SIZE_MB | No | Maximum speaker voice sample upload size in MB. Defaults to 10 |
| MAX_SAMPLE_DURATION | No | Maximum speaker sample length in seconds. Defaults to 20 |
| RATELIMIT_STORAGE_URI | No | Storage backend for rate limiting. Defaults to in-memory storage, which is not suitable for multi-process deployments. Use a Redis URL in production, e.g. redis://localhost:6379/0 |
| ADMIN_EMAIL | No | Address that receives admin notifications (e.g. new user registrations). Leave blank to disable |
| SMTP_HOST | No | Hostname of your SMTP mail server, e.g. smtp.gmail.com |
| SMTP_PORT | No | SMTP port: 587 for STARTTLS (recommended), 465 for SSL. Defaults to 587 |
| SMTP_USER | No | SMTP login username, usually your email address or an API key username |
| SMTP_PASS | No | SMTP login password or API key |
| SMTP_FROM | No | The "From" address on outgoing emails. For Gmail, this must match SMTP_USER |
| FLASK_DEBUG | No | Set to 1 for development mode with auto-reload |
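The variables above can be checked before first launch. The sketch below is a minimal, hypothetical loader, not the app's own config code; in practice a library such as python-dotenv would handle parsing. It parses simple KEY=VALUE lines, reports missing required variables, and applies a few of the documented defaults. It deliberately ignores export prefixes, inline comments, multi-line values and interpolation.

```python
from dataclasses import dataclass

# Variables marked "Yes" in the table above
REQUIRED = (
    "SECRET_KEY",
    "DATABASE_URL",
    "FIREBASE_PROJECT_ID",
    "FIREBASE_API_KEY",
    "FIREBASE_AUTH_DOMAIN",
    "FIREBASE_SERVICE_ACCOUNT",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


@dataclass
class Settings:
    max_audio_size_mb: int
    max_sample_size_mb: int
    max_sample_duration: int
    smtp_port: int
    allowed_origins: list[str]


def load_settings(env: dict[str, str]) -> Settings:
    # Empty values count as missing
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    # Comma-separated origins; a bare "*" yields ["*"]
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]

    # Defaults taken from the table above
    return Settings(
        max_audio_size_mb=int(env.get("MAX_AUDIO_SIZE_MB") or 32768),
        max_sample_size_mb=int(env.get("MAX_SAMPLE_SIZE_MB") or 10),
        max_sample_duration=int(env.get("MAX_SAMPLE_DURATION") or 20),
        smtp_port=int(env.get("SMTP_PORT") or 587),
        allowed_origins=origins,
    )
```

Usage: load_settings(parse_env(Path(".env").read_text())) raises ValueError listing every missing required variable at once, rather than failing on the first one.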

10. Set up the database

Create a PostgreSQL database named transcriber:

createdb transcriber
# or via psql:
psql -U postgres -c "CREATE DATABASE transcriber;"

Then run the schema build scripts:

# Linux / macOS
./database/build.sh

# Windows
database\build.bat

# Custom connection (host, port, user, password)
./database/build.sh -h localhost -p 5432 -U postgres -P yourpassword

This drops and recreates all tables in dependency order using the SQL files in database/postgres/, so any existing data in the database is lost.
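Building "in dependency order" means each table is created only after the tables its foreign keys reference, and dropped in the reverse order. The sketch below illustrates that idea with Python's standard graphlib module. The table names and dependencies are hypothetical, and the real build scripts may simply list the SQL files in a fixed order.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables it references via foreign keys
DEPENDENCIES = {
    "users": set(),
    "projects": {"users"},
    "speakers": {"projects"},
    "segments": {"projects", "speakers"},
}


def create_order(deps: dict[str, set[str]]) -> list[str]:
    """Referenced tables come before the tables that reference them."""
    return list(TopologicalSorter(deps).static_order())


def drop_order(deps: dict[str, set[str]]) -> list[str]:
    """Reverse of create order, so no table is dropped while another still references it."""
    return list(reversed(create_order(deps)))
```

For this graph the only valid creation order is users, projects, speakers, segments, and the drop order starts with segments.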

Running

# Linux / macOS
./run.sh

# Windows
run.bat

Then open http://localhost:5000.

LOCAL_MODE: Standalone Desktop Build

LOCAL_MODE is a configuration flag that produces a fully self-contained, offline version of the app — distributed as a single Windows executable (WaveformStudio.exe) with no cloud dependencies, no login, and no server infrastructure required.

What changes in LOCAL_MODE

| Aspect | LOCAL_MODE | Cloud mode |
|---|---|---|
| Authentication | Implicit single user, no login | Firebase JWT tokens |
| Database | SQLite (local file) | PostgreSQL (remote) |
| Users | One implicit user (LOCAL_USER_ID) | Multiple accounts |
| Sharing | Disabled; all resources owned by the implicit user | Public/shared projects supported |
| Preferences | preferences.json file | Database JSONB column |
| Deployment | Single .exe | Flask server |
| Transcription | Not yet wired | Modal cloud service |

How LOCAL_MODE is set

LOCAL_MODE is read from the environment in application/config.py:

import os

LOCAL_MODE = os.getenv('LOCAL_MODE', '0') in ('1', 'true', 'True')
LOCAL_USER_ID = '00000000-0000-0000-0000-000000000001'

The standalone launcher (local_launcher.py) forces LOCAL_MODE=1 at startup, so end users never need to set it manually. The app also reads a .env file from the same directory as the executable, allowing optional user configuration.

Startup flow (local_launcher.py)

  1. Path resolution — detects whether it is running from a PyInstaller bundle (sys._MEIPASS) or a dev environment, then resolves paths to templates, static files, and application code accordingly.
  2. Environment setup — loads .env from the executable directory, then forces LOCAL_MODE=1 and PROJECTS_DIR=~/Documents/WaveformStudio/projects.
  3. Database initialisation — creates (or migrates) a SQLite database at ~/Documents/WaveformStudio/waveform_studio.db using database/build_sqlite.py, then seeds the implicit local user.
  4. Flask setup — configures the Flask app's template and static folders from the bundle paths.
  5. Launch — starts a Waitress WSGI server on a random free port on 127.0.0.1, then opens a native pywebview window (1400×900, min 800×600) pointed at /app.
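Steps 1, 2 and 5 can be approximated with the standard library alone. The functions below are hypothetical stand-ins for parts of local_launcher.py, and the Waitress and pywebview calls are left out. resolve_base_dir relies on the sys._MEIPASS attribute that PyInstaller sets in a bundled app. find_free_port asks the OS for an unused port by binding to port 0, which is one common approach; the real launcher may pick its port differently.

```python
import socket
import sys
from pathlib import Path


def resolve_base_dir(dev_dir: Path) -> Path:
    """Return the directory holding templates, static files and application code."""
    # PyInstaller's bootloader sets sys._MEIPASS to the bundle's unpack directory
    bundle_dir = getattr(sys, "_MEIPASS", None)
    return Path(bundle_dir) if bundle_dir else dev_dir


def local_paths() -> dict[str, Path]:
    """Per-user data locations used in LOCAL_MODE (see Data storage below)."""
    base = Path.home() / "Documents" / "WaveformStudio"
    return {
        "projects": base / "projects",
        "database": base / "waveform_studio.db",
        "preferences": base / "preferences.json",
    }


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]
```

A launcher would call resolve_base_dir(Path(__file__).resolve().parent) in a dev checkout. With this port-probing approach there is a brief window between find_free_port returning and the server binding the port in which another process could claim it; for a loopback-only desktop app that risk is usually acceptable.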

How the code splits on LOCAL_MODE

Authentication (application/auth.py) — the login_required decorator short-circuits Firebase token verification and injects the implicit local user directly:

if LOCAL_MODE:
    # Skip Firebase token verification; every request runs as the implicit local user
    user = users_mod.get_user(LOCAL_USER_ID)
    g.current_user = user
    return f(*args, **kwargs)
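To show where that early return sits, the sketch below wraps it in a complete decorator. It is framework-free so it runs on its own. A SimpleNamespace stands in for flask.g, and get_user and verify_firebase_token are hypothetical placeholders for the app's real user lookup and Firebase verification.

```python
import functools
from types import SimpleNamespace

LOCAL_MODE = True
LOCAL_USER_ID = "00000000-0000-0000-0000-000000000001"

g = SimpleNamespace(current_user=None)  # stand-in for flask.g


def get_user(user_id: str) -> dict:
    """Hypothetical user lookup."""
    return {"id": user_id, "name": "Local user"}


def verify_firebase_token() -> dict:
    """Hypothetical placeholder: would read and verify the request's bearer token."""
    raise PermissionError("cloud authentication is not available in this sketch")


def login_required(f):
    @functools.wraps(f)  # keep the view's name for routing and debugging
    def wrapper(*args, **kwargs):
        if LOCAL_MODE:
            # No token needed: act as the implicit local user
            g.current_user = get_user(LOCAL_USER_ID)
            return f(*args, **kwargs)
        # Cloud mode: identify the caller before running the view
        g.current_user = verify_firebase_token()
        return f(*args, **kwargs)
    return wrapper


@login_required
def list_projects():
    return f"projects for {g.current_user['id']}"
```

Because the check happens inside the wrapper on every call, views decorated with login_required need no LOCAL_MODE branches of their own.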

Permissions (application/permissions.py) — the implicit user is always treated as 'owner' of every resource:

if LOCAL_MODE:
    return 'owner'

Database routing (application/db_access/db.py) — selects the correct backend at import time:

if LOCAL_MODE:
    from application.db_access.db_sqlite import get_conn, init_sqlite_db
else:
    from application.db_access.db_postgres import get_conn

db_sqlite.py is a compatibility layer that translates PostgreSQL syntax (e.g. %s placeholders, RETURNING, array types) to SQLite equivalents.
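To illustrate the kind of translation involved, the sketch below converts %s placeholders to SQLite's ? and stores Python lists, standing in for PostgreSQL arrays, as JSON text. It is not the app's db_sqlite.py. A real layer must also cope with %s inside string literals, %% escapes and other PostgreSQL-only syntax. RETURNING needs no rewrite on SQLite 3.35 or later, which supports it natively. The speakers table here is hypothetical.

```python
import json
import sqlite3


def to_sqlite(sql: str, params: tuple = ()) -> tuple[str, tuple]:
    """Translate a PostgreSQL-style query and its parameters for sqlite3."""
    # psycopg-style %s placeholders become sqlite3's qmark style (naive replace)
    translated = sql.replace("%s", "?")
    # SQLite has no array type, so lists are serialised to JSON text
    adapted = tuple(json.dumps(p) if isinstance(p, list) else p for p in params)
    return translated, adapted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE speakers (id INTEGER PRIMARY KEY, name TEXT, aliases TEXT)")
conn.execute(*to_sqlite(
    "INSERT INTO speakers (name, aliases) VALUES (%s, %s)", ("Alice", ["A", "Al"])
))
name, aliases = conn.execute(
    *to_sqlite("SELECT name, aliases FROM speakers WHERE name = %s", ("Alice",))
).fetchone()
print(name, json.loads(aliases))  # Alice ['A', 'Al']
```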

Sharing (application/projects.py) — shared-project queries return an empty list:

if LOCAL_MODE:
    return []

User preferences (application/users.py) — preferences are read from and written to ~/Documents/WaveformStudio/preferences.json rather than the database.
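A minimal sketch of file-backed preferences, assuming a flat JSON object. load_preferences and save_preferences are hypothetical names, not the functions in application/users.py. The new content is written to a temporary file in the same directory and then swapped in with os.replace, so a crash mid-write cannot leave a truncated preferences.json.

```python
import json
import os
import tempfile
from pathlib import Path

PREFS_PATH = Path.home() / "Documents" / "WaveformStudio" / "preferences.json"


def load_preferences(path: Path = PREFS_PATH) -> dict:
    """Return saved preferences, or an empty dict if the file is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_preferences(prefs: dict, path: Path = PREFS_PATH) -> None:
    """Write to a temp file in the same directory, then replace the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(prefs, fh, indent=2)
        os.replace(tmp_name, path)  # single rename; atomic on POSIX
    except BaseException:
        os.unlink(tmp_name)  # don't leave stray temp files behind
        raise
```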

Frontend (static/js/utilities/constants.js) — reads a <meta name="local-mode"> tag injected by the server and exports a LOCAL_MODE boolean. UI components (account page, settings) use this to hide cloud-only features such as logout, subscription management, and account settings.

Data storage

All user data is stored in ~/Documents/WaveformStudio/:

| Item | Path |
|---|---|
| Database | waveform_studio.db |
| Project files | projects/<project-id>/ |
| Preferences | preferences.json |

Building the executable

The build uses PyInstaller with local.spec:

pyinstaller local.spec

local.spec bundles templates, static assets, application code, and the SQLite schema into a single-file executable. Cloud-only packages (modal, firebase_admin, torch, etc.) are explicitly excluded to keep the binary small.

The output is written to build/local/WaveformStudio.exe.