Documentation — v0.3.8

Waveform Studio

Waveform Studio is a transcription viewer and editor. Load an audio file, sync it with a diarized transcript, and make precise speaker-labeled edits with frame-accurate waveform navigation.

[waveform preview]
Overview

Cloud app vs desktop app

Waveform Studio is available in two forms: a cloud web app and a downloadable desktop app. Both offer the same core editing experience, but they differ in how they are accessed, how projects are stored, and what features are available.

Feature | Cloud app | Desktop app
Access | Browser, no installation required | Downloadable Windows executable
Projects | Stored in your cloud account | Stored on your machine
Transcription | Automatic, via server-side Whisper and Pyannote | Not yet available; transcripts must be imported as CSV
Account required | Yes | No
Cost | Paid subscription | Free

Why a desktop app?

Waveform Studio is open source. The desktop app exists so that anyone can use the full editing environment for free, without a cloud subscription or an internet connection. Rather than running in a browser, the desktop app bundles an internal server that runs locally on your machine.

The desktop app is currently available on Windows only (Windows 10 and Windows 11, 64-bit).

Transcription in the desktop app

Automatic transcription is available in the cloud web app, where it is handled server-side using Whisper and Pyannote. It is not yet available in the desktop app. To work with a transcript in the desktop app you will need to generate one using an external tool and import it as a CSV file with start, end, speaker, and text columns.

Waveform Studio was designed around Whisper (speech-to-text) and Pyannote (speaker diarization), both of which are free and open source and produce output compatible with the import format. Other transcription tools can also be used as long as their output is converted to the expected CSV structure.
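
As a rough sketch of that conversion, the Python below writes a list of diarized segments to a CSV with the four expected columns. The `write_transcript_csv` helper, the sample segments, and the use of seconds for start and end are assumptions made for illustration; this guide does not state the timestamp unit or the required column order.

```python
import csv
import io

# Column names taken from the import format described above.
COLUMNS = ["start", "end", "speaker", "text"]

def write_transcript_csv(segments, out):
    """Write segments (dicts with the four expected keys) as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for seg in segments:
        writer.writerow({key: seg[key] for key in COLUMNS})

# Hypothetical segments, e.g. merged from Whisper text and Pyannote speaker turns.
# Times are assumed to be in seconds.
segments = [
    {"start": 0.0, "end": 2.4, "speaker": "SPEAKER_00", "text": "Hello, and welcome."},
    {"start": 2.9, "end": 5.1, "speaker": "SPEAKER_01", "text": "Thanks for having me."},
]

buffer = io.StringIO()
write_transcript_csv(segments, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the file object in place of `buffer`.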

01 — Getting Started

Your first project

Waveform Studio is built around the concept of a project — a folder that contains your audio file, transcript data, and any project settings. Follow these steps to create and load your first project.

1. Open the application
Launch Waveform Studio. You will be greeted by the start screen with New Project and Open Docs buttons. The project sidebar is on the left.
2. Create a new project
Click + New Project (or + New in the sidebar). A project named Untitled Project is created and added to the sidebar. Click the project name to rename it inline.
3. Drop in your audio file
Drag and drop an audio file (MP3, WAV, FLAC, OGG, M4A, AAC, WebM, or OPUS) onto the waveform panel, or click the drop zone to browse. The file is copied into the project folder.
4. Load a transcript (optional)
If you already have a transcript CSV (with start, end, speaker, and text columns), click + Add CSV in the transcript panel to load it. Otherwise, in the cloud app, use the Transcribe feature to generate one automatically.
5. Start editing
The workspace loads with the waveform, speakers, and transcript panels ready. Click any segment in the transcript to seek the audio to that point and begin reviewing.
Screenshot — Start screen & new project dialog
Overview

Account

An account is required to use the Waveform Studio cloud web app. The desktop app does not require one.

Creating an account

Waveform Studio is currently in closed beta. Account creation is not open to the public — new accounts require administrator approval before they can be used.

1. Submit a request
Click Sign In and then Create Account. Enter your display name, email address, and a password of at least 8 characters, then submit the form.
2. Wait for approval
Your request is sent to the server administrators. You will receive an email if your account is approved. Until then, you cannot sign in.
3. Sign in
Once approved, sign in with your email and password to access the app.

Why make an account?

With an account you can:

Create and manage projects
Projects are stored in the cloud under your account and are accessible from any browser.
Use the transcription service
Automatically transcribe and diarize audio using Whisper and Pyannote. Transcription usage and available models depend on your subscription tier.
Share projects
Share projects or folders with other users, and access projects that others have shared with you.
Publish presentation links
Generate a shareable presentation link for any project, optionally making it publicly accessible without requiring sign-in.

Subscription tiers

All accounts start on the Free tier. Paid tiers unlock higher usage limits, more storage, and access to larger Whisper models. Billing can be monthly or yearly; yearly billing includes a 20% discount.

Tier | Price | Transcription / month | Storage | Max audio length
Free | $0 | 5 hours | 2 GB | 60 min
Starter | $4.99 / mo or $47.90 / yr | 20 hours | 20 GB | 240 min
Pro | $19.99 / mo or $191.90 / yr | 50 hours | 50 GB | 480 min
Business | $39.99 / mo or $383.90 / yr | 100 hours | 100 GB | Unlimited
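
The yearly prices follow from the monthly prices and the 20% discount. The short Python check below reproduces them; the `yearly_price` helper and the half-up rounding to cents are assumptions for illustration, not a description of the billing system.

```python
from decimal import Decimal, ROUND_HALF_UP

def yearly_price(monthly: str, discount: str = "0.20") -> Decimal:
    """Twelve months at the monthly price, less the yearly discount, rounded to cents."""
    full_year = Decimal(monthly) * 12
    discounted = full_year * (Decimal("1") - Decimal(discount))
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Monthly prices from the tier table above.
for tier, monthly in [("Starter", "4.99"), ("Pro", "19.99"), ("Business", "39.99")]:
    print(tier, yearly_price(monthly))
```

For example, Starter is 12 × $4.99 = $59.88, and 80% of that is $47.90 after rounding.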

Whisper model availability also varies by tier — larger, more accurate models are only available on higher tiers. See Transcription for model details. Your current tier and usage can be viewed and changed in the Subscription tab of Account Settings.

Account settings

Open Account Settings by clicking the avatar icon in the top bar of the sidebar. Settings are divided into tabs.

Profile

Setting | Description
Display name | The name shown in the app and to other users. Can be changed at any time.
Email | The email address associated with your account. Read-only.
Sign-in provider | Shows how you authenticate: Email / Password, Google, or GitHub. Read-only.
Member since | The date your account was created. Read-only.

Look & Feel

Setting | Description
Theme | Choose between Auto (follows your OS or browser preference), Light, and Dark. Custom themes can also be created, edited, imported, and exported as JSON files. Each theme stores its own set of colour overrides.
Colours | Fine-grained colour overrides for the current theme, grouped into four categories:

Accent — Primary and secondary accent colours used for highlights, buttons, and interactive elements.

Base — Background, surface layers, borders, body text, and muted text.

Waveform — The unplayed and played portions of the waveform.

Status — Danger (errors, destructive actions), Success (confirmations), and Recording (active recording indicator).

Each colour has a reset button to revert it to the theme default.

App Behavior

Setting | Description
On startup | Controls what happens when the app loads.

Open last project (default) — Automatically reopens the project you had open when you last closed the app.

Show home screen — Always opens the home screen, regardless of what was open previously.

Subscription

Item | Description
Current plan | Displays your active tier and billing period (monthly or yearly).
Next payment | The date your next billing cycle begins. Not shown for the Free tier.
Transcription usage | Hours of transcription used this month versus your monthly allowance.
Storage usage | Total storage used by your projects and audio files versus your plan limit.
Upgrade plan | Opens the plan selector to upgrade or change your subscription tier.
Cancel subscription | Cancels your paid subscription. Only visible on paid tiers.

Account Management

Action | Description
Delete account | Permanently deletes your account and all associated data. This cannot be undone. A confirmation prompt is shown before the action is carried out.
Sign out | Signs you out of the app and returns you to the home screen.
02 — Reference

Explanation of terms

The application and this guide use a number of specific terms. They are defined below.

Term | Definition
App | The entire application, including sidebars and editing tools.
Workspace | The viewing and editing suite available for a project once it is open.
Project | A collection containing a single transcription with a waveform and speaker definitions. Must be loaded into a workspace to edit.
Waveform | A digital representation of an audio signal, typically in the form of a .wav or .mp3 file. It can be visualized by graphing the signal's amplitude over time.
Region | A portion of the audio displayed below the waveform. Each region corresponds to a segment. The start and end of each paragraph are marked with rounded corners, and speakers are represented by color.
Transport | The current playback position in the waveform, represented by a vertical yellow line.
Transcription | An approximation of the words spoken within an audio waveform, along with which speaker said them, and when they were said. In practice, a transcription is represented by a series of segments.
Speaker | A single voice that has been identified by the transcriber or defined by the user.
Diarization | Splitting a transcription into multiple "channels" or speakers.
Segment | A short string of text identified during transcription. Has a start time, end time, and speaker.
Paragraph | A contiguous string of segments with the same speaker, and no long pauses in between. A paragraph split occurs when the speaker changes, or the current speaker pauses for more than one second.
Speaker Block | A contiguous string of paragraphs with the same speaker. Each new speaker block is prefaced with a speaker label.
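
The relationship between segments, paragraphs, and speaker blocks can be sketched in a few lines of Python. This is a minimal illustration of the rules in the definitions above, not the application's actual code: the `Segment` class and both function names are hypothetical, and times are assumed to be in seconds.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD = 1.0  # seconds; a longer pause starts a new paragraph

@dataclass
class Segment:
    start: float
    end: float
    speaker: str
    text: str

def group_paragraphs(segments):
    """Split segments into paragraphs on a speaker change or a pause over the threshold."""
    paragraphs = []
    for seg in segments:
        if paragraphs:
            prev = paragraphs[-1][-1]
            same_speaker = prev.speaker == seg.speaker
            short_pause = seg.start - prev.end <= PAUSE_THRESHOLD
            if same_speaker and short_pause:
                paragraphs[-1].append(seg)
                continue
        paragraphs.append([seg])
    return paragraphs

def group_speaker_blocks(paragraphs):
    """Join consecutive paragraphs that share a speaker into speaker blocks."""
    blocks = []
    for para in paragraphs:
        if blocks and blocks[-1][-1][0].speaker == para[0].speaker:
            blocks[-1].append(para)
        else:
            blocks.append([para])
    return blocks

segments = [
    Segment(0.0, 1.5, "SPEAKER_00", "Hi."),
    Segment(1.75, 3.0, "SPEAKER_00", "How are you?"),
    Segment(4.5, 6.0, "SPEAKER_00", "Anyway."),  # 1.5 s pause: new paragraph, same block
    Segment(6.25, 7.0, "SPEAKER_01", "Fine."),   # speaker change: new paragraph and block
]
paragraphs = group_paragraphs(segments)
blocks = group_speaker_blocks(paragraphs)
```

Here the four segments form three paragraphs and two speaker blocks, with the first block holding two paragraphs.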
03 — Interface

The workspace

The workspace is divided into three key panels, with the app sidebar alongside them. Each panel is resizable by dragging the dividers between them.

Waveform Panel
Zoomable audio waveform with a scrubber and playback controls. Colored regions mark each speaker's segments.
Speakers Panel
Lists all detected speakers. Click a speaker to jump to their next segment. Double-click to rename.
Transcript Panel
Full scrollable transcript with speaker labels. Click any segment to seek audio. Edit text inline.
Sidebar
Project list and server connection panel. Drag to reorder, right-click for project options.
04 — Interface

Waveform panel

The waveform panel is the primary audio interface. It displays a zoomable waveform of the loaded audio file, lets you control playback, and shows speaker regions aligned to the transcript. Every component is described below.

[waveform panel demo]

Layout

The panel is divided into two areas stacked vertically:

Waveform area
The main scrollable canvas. From top to bottom it contains the waveform display, the region lane (coloured speaker blocks), and the time ruler (timestamps). The yellow vertical line is the transport — it marks the current playhead position and moves during playback.
Controls bar
The strip below the waveform. Contains playback controls, the timecode display, zoom buttons, the volume slider, and the speed selector.

Below the controls bar is the minimap — a full-width thumbnail of the entire waveform. The shaded rectangle shows which portion of the audio is currently visible. Drag the rectangle to pan, or drag its handles to resize the view.

Seeking

Click anywhere on the waveform display or on a region in the region lane to move the playhead to that position. The transcript panel will scroll to and highlight the corresponding segment.

Use the ↩ 5s and 5s ↪ buttons in the controls bar to skip backward or forward by five seconds. Both actions also have keyboard equivalents.

Playback controls

Play / Pause
Click the ▶ / ⏸ button or press Space to toggle playback.
Volume
Drag the VOL slider to adjust playback volume between 0 and 100%.
Playback speed
Use the Speed dropdown to set the playback rate. Available options are 0.5×, 0.75×, 1.0×, 1.25×, 1.5×, and 2.0×. Slowing the audio down is useful when reviewing difficult passages.
Follow mode
Click ⊙ FOLLOW to keep the playhead centred in the waveform view during playback. When follow mode is off, the waveform stays still and the playhead line moves across it.

Zooming

Zooming in reveals more detail in the waveform and makes precise seeking easier.

Scroll to zoom
Hold Ctrl and scroll the mouse wheel over the waveform. By default this centres the zoom on the playhead.
Zoom buttons
Use the zoom in and zoom out buttons in the controls bar to step the zoom level in or out. Press ⊡ RESET (or 0) to return to 1×.
Zoom anchor
This button toggles between two zoom anchor modes. When set to CURSOR, zooming centres on the mouse cursor position. When set to TRANSPORT, zooming centres on the playhead. Click the button to switch between them.
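
One common way to implement anchored zooming is to keep the time under the anchor pixel fixed while the scale changes. The Python below is a sketch of that idea only; the function names, the pixels-per-second model, and the scroll representation are assumptions and are not taken from the application.

```python
def zoom_scroll_start(scroll_start, px_per_sec, new_px_per_sec, anchor_px):
    """Return the new left-edge time so the time under anchor_px stays in place."""
    anchor_time = scroll_start + anchor_px / px_per_sec
    return anchor_time - anchor_px / new_px_per_sec

def transport_anchor_px(scroll_start, px_per_sec, playhead_time):
    """Pixel offset of the playhead from the left edge of the view (TRANSPORT mode)."""
    return (playhead_time - scroll_start) * px_per_sec

# CURSOR mode: the anchor is the mouse position, here 400 px from the left edge.
cursor_start = zoom_scroll_start(10.0, 100.0, 200.0, anchor_px=400.0)

# TRANSPORT mode: the anchor is wherever the playhead is currently drawn.
anchor = transport_anchor_px(10.0, 100.0, playhead_time=11.0)
transport_start = zoom_scroll_start(10.0, 100.0, 200.0, anchor_px=anchor)
```

In both modes the only difference is which pixel is passed as the anchor; the zoom calculation itself is the same.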

Speaker regions

The region lane is the coloured strip directly below the waveform display. It provides a timeline view of every transcript segment laid out to scale against the audio.

Each block in the lane is a region — it maps exactly to one transcript segment. The block's horizontal position and width represent that segment's start and end times, so longer blocks mean longer speech. Regions are colour-coded by speaker: the colour of each block matches the hue assigned to that speaker in the Speakers panel, making it easy to see at a glance who is speaking and when.

Regions are grouped by paragraph. All segments within the same paragraph are joined into a single continuous bar with rounded outer corners and the speaker's name drawn inside it. A small gap separates each paragraph from the next. A speaker block in the transcript may contain multiple paragraphs, so it is possible to see several bars of the same colour in a row — one per paragraph — before the colour changes for a different speaker.

When you zoom in on the waveform, the region lane zooms with it, keeping regions aligned to the audio beneath them. If no transcript has been loaded, the region lane is hidden.

Interacting with regions

Click to seek and select
Clicking a region moves the playhead to that segment's start time and selects the corresponding segment in the transcript panel, scrolling to it if needed.
Double-click to zoom
Double-clicking a region seeks to that segment and zooms the waveform to fit it, giving you a closer view for detailed review or editing.
Right-click for options
Right-clicking a region opens the segment context menu — the same menu available in the transcript panel. From here you can Split the segment, Merge it with an adjacent segment, Change speaker, or Delete the segment entirely.
Region colours update instantly when you change a speaker's hue in the Speakers panel.
05 — Interface

Speakers panel

The speakers panel lists every speaker in the project. Each speaker has a name, a colour, an internal ID, and an optional voice sample. Speakers are listed in the order they first appear in the transcript.

Column | Description
Colour swatch | A filled circle showing the speaker's assigned hue. Click it to open the hue picker and change the colour.
Name | The speaker's display name, rendered in their colour. Click to rename inline. Hovering highlights the speaker's regions in the waveform panel.
ID | The speaker's raw internal identifier as assigned by the diarization process (e.g. SPEAKER_00). This is read-only and cannot be changed.
Voice sample | Shows a mini waveform and playback controls if a sample has been recorded or uploaded, or buttons to add one if not. See Voice samples below.
Delete (✕) | Removes the speaker. If they have segments assigned, a dialog prompts you to reassign those segments to another speaker first. Disabled when only one speaker exists.

Adding speakers

Click + Add Speaker at the top of the panel to create a new speaker. A new row appears with an auto-generated ID (e.g. SPEAKER_04) and the name field is immediately opened for editing so you can type a name right away.

Renaming a speaker

Click a speaker's name to enter inline edit mode. Type the new name, then press Enter or click away to confirm. Press Esc to cancel and restore the previous name. The new name is reflected everywhere in the transcript immediately.

If you clear the name field and confirm, the name resets to the speaker's raw ID.

Changing a speaker's colour

Click the colour swatch to the left of a speaker's name to open the hue picker. Drag the handle around the colour wheel to choose a new hue. The change is applied live — waveform regions, region lane bars, and transcript speaker labels all update instantly.

Highlighting a speaker

Hovering over a speaker's name highlights all of that speaker's regions in the waveform panel, making it easy to see at a glance how much of the audio they occupy and where their turns fall.

Voice samples

Each speaker row has a voice sample cell. A voice sample is a short audio clip that represents a speaker's voice, useful for confirming identity when labelling speakers.

The intended purpose of voice samples is to improve speaker diarization accuracy by providing the transcription engine with a reference clip for each speaker. This integration is not yet implemented: samples are currently stored but not passed to the transcription process.

When no sample has been set, three buttons are shown for adding one:
Upload
Opens a file picker to load any audio file from disk as the speaker's voice sample.
⏺ Rec
Records directly from your microphone. Click once to start recording — the button changes to ⏹ Stop. Click again to stop, or recording stops automatically after 20 seconds.
✂ Segment
Opens a picker listing all transcript segments. Click a row to use that segment's audio as the voice sample. You can preview any segment in the picker before selecting it. If you select a segment belonging to a different speaker, a confirmation prompt is shown first.

Once a sample is set, the cell shows a mini waveform visualisation drawn in the speaker's colour. Click the play control to play the sample, and click it again to stop early. Click the remove button to delete the sample.

Deleting a speaker

Click the ✕ button at the right of a speaker row to delete that speaker. A confirmation dialog appears. If the speaker has segments assigned to them, the dialog will ask you to choose another speaker to reassign those segments to before deletion proceeds. If the speaker has no assigned segments, you can delete them directly.

You cannot delete a speaker if they are the only one in the project. The delete button is disabled when only one speaker exists.
06 — Interface

Transcript panel

The transcript panel displays the full diarized transcript as readable text, grouped by speaker. It is the primary interface for reviewing, navigating, and editing the content of a transcript.

[transcript panel demo]

Structure

The transcript is organised into three levels:

1. Speaker blocks
The top-level grouping. A speaker block contains all consecutive speech from one speaker. A speaker label (shown in the speaker's colour) appears at the top of each block. Clicking the label enters inline rename mode.
2. Paragraphs
Each speaker block is divided into one or more paragraphs. A coloured handle bar runs down the left edge of each paragraph. See Paragraph handles below for interactions.
3. Segments
The individual time-aligned text units within a paragraph. Each segment is a short span of speech. See Segment interactions below.

Paragraph handles

The coloured bar on the left edge of each paragraph is its handle. Its colour matches the speaker. Interacting with the handle affects the entire paragraph:

Click
Seeks the audio to the start of the paragraph's first segment.
Double-click
Zooms the waveform to fit the entire paragraph in view.
Hover
Highlights the paragraph's corresponding regions in the waveform panel.
Right-click
Opens a speaker picker to reassign all segments in the paragraph to a different speaker at once.

Segment interactions

Click
Selects the segment and seeks the audio playhead to its start time. The corresponding region in the waveform is also highlighted.
Double-click
Selects the segment, zooms the waveform to it, and opens it for inline text editing. See Editing Transcripts for details.
Hover
Highlights the segment's region in the waveform panel.
Right-click (no selection)
Opens the segment context menu with options to Split, Merge with previous, Merge with next, Change speaker, and Delete. Choosing Split segment opens a popup showing the full segment text with a draggable split point — drag the marker to the word where you want the split, then confirm.
Right-click (with text selected)
When text is selected in the transcript, right-clicking shows a selection context menu instead, with options to add a hyperlink to the selection or populate the search bar with it.

Active segment during playback

While audio is playing, the segment corresponding to the current playhead position is highlighted and the transcript scrolls automatically to keep it in view.

Search and replace

A search bar appears at the top of the transcript panel once a transcript is loaded. Type to search: matching segments are highlighted and a match counter shows how many results were found. Use the previous and next arrows to step through matches, or press Enter to advance to the next one. Press Esc or click the clear button to dismiss the search.

Use the speaker filter dropdown to restrict results to a specific speaker.

Click the replace toggle button to expand a replace bar beneath the search field. Enter a replacement term and use Replace to substitute the current match, or Replace All to substitute every match at once. Press Tab to move between the search and replace fields.

Press Shift+F to instantly populate the search bar with the currently selected text in the transcript.

Text selection

You can select text freely within the transcript using click-and-drag. Selections can span multiple segments. Hold Ctrl while dragging to snap the selection boundaries to whole words.

Once text is selected, two actions are available:

Search selected text
Right-click the selection and choose Search text, or press Shift+F, to populate the search bar with the selected text and run a search immediately.
Add a hyperlink
Right-click the selection and choose Add link, or press Shift+K, to open the hyperlink dialog and attach a URL to the selected text. See Editing Transcripts for more on hyperlinks.

Hyperlinks

Any span of text in the transcript can have a hyperlink attached to it. Linked text is displayed with a distinct style so it stands out from the surrounding content.

Adding and editing links

Select the text you want to link, then press Shift+K or right-click and choose Add link. If no text is selected, the link covers the entire segment. To edit an existing link, select or click on the linked text and use the same shortcut — the dialog opens pre-filled with the existing values.

The link dialog contains four fields:

Field | Description
URL (required) | The destination address. As you type or paste a URL the dialog checks it via the server; a spinner appears while checking, then a success mark if the URL is reachable or a failure mark if it is not. If the page has a title it will be offered as a suggested display name.
Display name (optional) | A short human-readable label for the link, shown in the tooltip when hovering over it.
Description (optional) | A longer description of the linked resource, also shown in the tooltip.
Editor notes (optional) | Private notes visible only in edit mode. Not shown when the transcript is viewed in read-only or presentation mode.

Behaviour with existing links

When a new link is added over a range that already contains one or more links, conflicts are resolved automatically:

Situation | What happens
New link fully contains an existing link | The existing link is removed and the new link covers the entire range.
New link is fully inside an existing link | The existing link is split into two: one covering the text before the new link, one covering the text after. The new link occupies the middle. Both split portions retain the original URL and display name.
New link partially overlaps an existing link | The existing link is trimmed so its range ends where the new link begins (or begins where the new link ends). Neither link is deleted; they are made adjacent.
Selection spans multiple segments | A confirmation prompt is shown: "This selection spans multiple segments. They will be automatically merged before the link is added." Confirming merges the segments into one, then adds the link. This cannot be undone separately; the merge is permanent.
Editing text in a segment that has a link | The link adjusts based on where the edit is made. An edit before the linked text shifts the link's position to follow it. An edit after the linked text leaves the link unchanged. A whole-segment link (one covering the entire segment) expands to cover the new full text after the edit. If the edit directly overlaps the linked text itself, the link is removed.
Leading or trailing whitespace in selection | Whitespace at the edges of a selection is automatically trimmed before the link is stored, so the visible highlight never starts or ends on a space.
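
The first three rules in the table amount to interval arithmetic on character ranges. The Python below is a minimal sketch of them, assuming links are stored as half-open character offsets within a single segment; the `Link` class and `add_link` function are hypothetical and only the URL is carried, so this is not the application's actual data model.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Link:
    start: int  # inclusive character offset within the segment
    end: int    # exclusive character offset
    url: str

def add_link(existing, new):
    """Insert `new`, resolving overlaps with existing links."""
    result = []
    for old in existing:
        if old.end <= new.start or old.start >= new.end:
            result.append(old)                          # no overlap: untouched
        elif new.start <= old.start and old.end <= new.end:
            continue                                    # new contains old: old is removed
        elif old.start < new.start and new.end < old.end:
            result.append(replace(old, end=new.start))  # piece before the new link
            result.append(replace(old, start=new.end))  # piece after the new link
        elif old.start < new.start:
            result.append(replace(old, end=new.start))  # trim the old link's tail
        else:
            result.append(replace(old, start=new.end))  # trim the old link's head
    result.append(new)
    return sorted(result, key=lambda link: link.start)

# A new link placed inside an existing one splits it in two.
links = add_link(
    [Link(0, 20, "https://example.com/a")],
    Link(5, 10, "https://example.com/b"),
)
```

After the call, `links` holds three adjacent ranges: 0 to 5 and 10 to 20 with the original URL, and 5 to 10 with the new one.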

Following links

Hovering over linked text shows a tooltip with the link's display name, description, URL, and any editor notes (in edit mode). The tooltip appears after a short delay, or immediately if Ctrl is already held.

To follow a link, hold Ctrl and click the linked text. The URL opens in a new tab.

Toolbar buttons

Button | Description
↓ Export | Opens the export panel. Choose a format (PDF, DOCX, TXT, Markdown, or CSV), configure options, and download the file.
+ Add CSV | Loads a transcript from a CSV file. The file must have start, end, speaker, and text columns. If a transcript is already loaded, a confirmation prompt appears first.
Delete transcript | Permanently removes the transcript from the project. A confirmation prompt is shown before deletion.
Transcribe | Available when audio is loaded. Opens the transcription options dialog where you can choose a Whisper model and speaker count, then submits the job. A progress bar tracks the job in real time; the transcript loads automatically on completion.
Features

Transcription

Waveform Studio can automatically transcribe an audio file and generate a speaker-diarized transcript using Whisper (speech-to-text) and Pyannote (speaker diarization). Transcription runs on cloud GPU infrastructure and is available in the web app only — see Cloud vs Desktop for details.

Starting a transcription job

1. Open a project with audio loaded
The Transcribe button in the transcript panel becomes active once audio has been uploaded to the project and successfully converted to MP3.
2. Click Transcribe
The transcription options dialog opens, showing the audio duration and controls for configuring the job.
3. Configure and confirm
Choose your settings (see below), review the cost and time estimates, then click ◎ Start Transcription.
4. Wait for results
A progress bar and status text stream updates in real time while the job runs. When it completes, the transcript loads into the workspace automatically.

Transcription options

Option | Description
Whisper model | Controls the speech-to-text accuracy and processing speed. Larger models produce more accurate transcripts but take longer and cost more. Available models, from fastest to slowest: Tiny, Base, Small, Turbo, Medium (recommended), Large.
Diarization model | The Pyannote model used to identify and separate speakers. Two variants are available: Speaker Diarization 3.1 and Speaker Diarization Community 1.
Est. speakers | An optional hint for how many speakers are in the audio. Set to 0 to let the diarization model detect the number automatically. Providing the correct count can improve accuracy when the number of speakers is known in advance.
Voice samples | If any speakers in the project have voice samples attached, this checkbox becomes available. When enabled, the samples are passed to the diarization model to help it identify speakers. See Speakers Panel for how to record or upload voice samples.

Cost and time estimates

The dialog displays an estimated cost and processing time based on your audio duration and chosen models. These are calculated assuming an A10G GPU and will update live as you change the model selections. Actual cost and time may vary.

If a transcript already exists when you start a new job, it will be replaced when the new job completes.

Running transcription yourself

The transcription pipeline code is included in the project source. If you are self-hosting Waveform Studio or want to run transcription jobs independently, you can adapt or invoke this code directly rather than using the hosted cloud service.

See the developer documentation for a full walkthrough: Transcription Pipeline Tutorial.

Features

Presentation mode

Presentation mode is a read-only, shareable view of a project designed for playback and transcript review. It strips away all editing controls and presents the audio player and transcript in a clean, scrollable layout. Depending on your sharing settings, viewers may not need an account.

Presentation mode is available for projects in the web app only.


Opening presentation mode

Click the Present button in the workspace header. This opens the presentation view in a new browser tab at a unique URL for your project. You can copy and share this URL with others using the Copy link button inside the presentation view.

How it differs from the editor

Editor | Presentation mode
Full editing tools: segments, speakers, waveform | Read-only: no editing of any kind
Requires an account | Can be accessed publicly or with authentication (see Access control below)
Waveform, Speakers, and Transcript panels | Audio player with region lane, and scrollable transcript
Right-click context menus on segments | No context menus; segments are click-to-seek only
Hyperlinks require Ctrl+click to follow | Hyperlinks open on a single click

Layout

The presentation view has three main areas:

Header
Displays the Waveform Studio wordmark, the project title, and a Copy link button that copies the presentation URL to your clipboard.
Audio player
Contains the waveform, region lane, playback controls, timecode, volume, and speed selector. The player sticks to the top of the viewport as you scroll through the transcript, so it is always accessible.
Transcript
The full diarized transcript rendered below the player. Each speaker block shows the speaker's name and colour. Click any segment to seek the audio to that point. The active segment (currently playing) is highlighted.

Playback controls

Control | Description
Play / Pause | Starts or pauses audio playback.
« / » | Jump to the previous or next paragraph boundary.
‹ / › | Jump to the previous or next segment boundary. If playback is more than 0.5 seconds into the current segment, pressing ‹ restarts the current segment rather than moving to the previous one.
Volume | Drag the volume slider to adjust level. Click the speaker icon to mute or unmute.
Speed | Choose a playback rate: 0.5×, 0.75×, 1×, 1.25×, 1.5×, or 2×.
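
The restart rule for the previous-segment button can be expressed as a small lookup over segment start times. The Python below is a sketch under assumptions: `previous_target` is a hypothetical name, start times are in seconds and sorted, and the "current segment" is taken to be the latest one that has started at the playhead position.

```python
import bisect

RESTART_THRESHOLD = 0.5  # seconds into a segment before "previous" restarts it

def previous_target(starts, position):
    """Return the time the previous-segment button should seek to."""
    if not starts:
        return 0.0
    # Index of the latest segment whose start is at or before the playhead.
    i = bisect.bisect_right(starts, position) - 1
    if i <= 0:
        return starts[0]
    if position - starts[i] > RESTART_THRESHOLD:
        return starts[i]       # far enough in: restart the current segment
    return starts[i - 1]       # near the start: go back one segment

starts = [0.0, 4.0, 9.0]
restart = previous_target(starts, 5.0)    # 1.0 s into the second segment
go_back = previous_target(starts, 4.25)   # 0.25 s into the second segment
```

With these values, `restart` seeks to the start of the second segment and `go_back` seeks to the start of the first.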

Transcript interaction

The transcript is interactive even in read-only mode:

Click a segment
Seeks the audio to the start of that segment and selects it. The view also scrolls to keep the active segment visible when playback advances.
Click a region in the waveform lane
Seeks to that segment and scrolls the transcript to the corresponding text.
Click a hyperlink
Opens the linked URL in a new tab. Unlike the editor, no Ctrl modifier is needed. Hovering over a hyperlink shows a tooltip with the link name, URL, and description after a short delay.

Access control

Whether viewers need to sign in depends on the project's sharing setting:

Anyone with the link
The presentation is publicly accessible — no account or sign-in required. Project data is embedded directly in the page when it loads.
Signed-in users only
Viewers must sign in with a Google account before the presentation loads. If access is denied (the signed-in account does not have permission), an access denied message is shown instead of the project content.
Features

Live Quotes

A Live Quote is an interactive audio-and-text widget that you can embed on any webpage. It shows a highlighted excerpt from your transcript alongside a playable audio clip — viewers can read the words, click any segment to seek, and play or pause the clip directly in the page, with no account or sign-in required.

Live Quotes are available in the web app only. The source project must be saved to the server.

Opening the dialog

Select any text in the transcript panel, then open the Live Quote dialog using one of two methods:

Right-click the selection
A context menu appears with three options: Add link, Search text, and «» Generate Live Quote. Click the last item to open the dialog.
Keyboard shortcut
With text selected, press Shift+Q to open the dialog immediately.

The dialog has two panels: a configuration panel on the left and a live preview of the final widget on the right.

Choosing which segments to include

The left panel lists every transcript segment that falls within your selection. Each segment has a checkbox. Unchecking a segment removes its text from the displayed quote and replaces it with an ellipsis (…) — but the underlying audio still plays through uninterrupted. This lets you condense a quote without creating jarring audio gaps.

Trimming the boundaries

When the first or last segment in your selection is only partially selected, a trim checkbox appears at that boundary:

Option | Effect
Trim start | Prefixes the displayed text with … to signal that the quote begins mid-sentence. The audio clip still starts at the beginning of the first segment.
Trim end | Appends … to the displayed text to signal that the quote ends mid-sentence. The audio clip still plays to the end of the last segment.
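
The way unchecked segments and the two trim options shape the displayed text can be sketched as follows. This is a minimal illustration, not the app's implementation. The function name `build_quote_text`, the `(text, included)` pairs, the single-space joining, and the collapsing of consecutive unchecked segments into one ellipsis are all assumptions made for the example.

```python
from typing import List, Tuple

ELLIPSIS = "\u2026"  # the … character


def build_quote_text(
    segments: List[Tuple[str, bool]],
    trim_start: bool = False,
    trim_end: bool = False,
) -> str:
    """Build the displayed quote from (text, included) pairs in transcript order.

    Only the displayed text changes here; the audio clip is unaffected.
    """
    parts: List[str] = []
    for text, included in segments:
        if included:
            parts.append(text.strip())
        elif not parts or parts[-1] != ELLIPSIS:
            # Assumption: a run of unchecked segments shows a single ellipsis.
            parts.append(ELLIPSIS)
    body = " ".join(parts)
    if trim_start:
        body = f"{ELLIPSIS} {body}"
    if trim_end:
        body = f"{body} {ELLIPSIS}"
    return body
```

For instance, unchecking the two middle segments of a four-segment selection and enabling Trim start would give text of the form "… first segment … last segment".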

Embed options

Option | Description
Speaker attribution | Enable to show a speaker name below the quote text, prefixed with an em dash (— Name). Pre-filled with the first speaker's name; editable before generating.
Quote marks | Wraps the transcript text in typographic opening and closing quotation marks. On by default.
Waveform | Shows a waveform of the clip's audio amplitude above the playback controls. Click the waveform to seek. On by default.
Download button | Adds a download icon to the widget footer so viewers can save the audio clip. Off by default.
Font | The typeface used for the quote text. Choices: Inter, IBM Plex Sans, IBM Plex Mono, Georgia.
Audio format | The format of the generated audio clip: mp3 (broader compatibility) or webm (smaller file size).

Duration limit

Live Quote clips are capped at 60 seconds. A duration bar below the segment list shows the current clip length. As you approach the limit the bar turns amber; if you exceed it the bar turns red and the Create Embed Code button is disabled until you reduce the selection.
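
The limit check can be sketched like this. The 60-second cap comes from the description above; the point at which the bar turns amber is not documented, so `WARN_SECONDS` below is a placeholder assumption, as are the function names. The clip length is taken from the start of the first segment to the end of the last, since unchecked and trimmed segments still play in full.

```python
MAX_CLIP_SECONDS = 60.0  # documented cap
WARN_SECONDS = 50.0      # assumption: the real amber threshold is not documented


def clip_length(first_segment_start: float, last_segment_end: float) -> float:
    """Audio runs from the start of the first segment to the end of the last."""
    return last_segment_end - first_segment_start


def duration_bar_state(clip_seconds: float) -> str:
    """Return the colour state of the duration bar for a given clip length."""
    if clip_seconds > MAX_CLIP_SECONDS:
        return "red"
    if clip_seconds >= WARN_SECONDS:
        return "amber"
    return "normal"


def can_create_embed(clip_seconds: float) -> bool:
    """The Create Embed Code button is enabled only within the cap."""
    return clip_seconds <= MAX_CLIP_SECONDS
```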

Live preview

The right panel shows an exact preview of the widget as it will appear on the page. It updates in real time as you check or uncheck segments and toggle options. You can play the audio clip directly in the preview — clicking any segment seeks to that point, and clicking the waveform or progress bar seeks to that position in the clip.

Generating the embed code

Click Create Embed Code to submit the configuration to the server. The server processes the audio clip and stores the widget data, then returns an <iframe> snippet. The right panel switches to show the snippet in a read-only text area with a Copy to clipboard button.

Paste the snippet into any webpage — a blog post, CMS, or plain HTML file — and the widget will render at 660 × 280 px, scaling down to fit narrower containers automatically.

What viewers see

The embedded widget is fully self-contained and interactive. It includes:

Quote text
The transcript excerpt, with optional quote marks and speaker attribution. As the clip plays, each segment is highlighted in sequence so readers can follow along.
Waveform
A visual amplitude display for the clip. The played portion is highlighted in gold. Click anywhere on the waveform to seek to that position.
Playback controls
A play/pause button, a clickable progress bar, and a timecode display showing elapsed time. Clicking a segment in the transcript also seeks the audio to that point.
Download button (optional)
If enabled, an icon in the widget footer lets viewers download the audio clip directly.
09 — Features

Server integration

When signed in to a Waveform Studio server, your projects are stored in your account and synced automatically. You also gain access to server-side transcription powered by Whisper and Pyannote.

1. Sign in
Click the user avatar in the top-left corner of the sidebar and sign in with your account. Once authenticated, your projects appear in the sidebar.
2. Submit a transcription job
With a project open and an audio file loaded, click Transcribe. Choose your Whisper model and speaker count, then confirm. The audio is sent to the server for processing.
3. Wait for results
Waveform Studio streams progress from the server and displays it in real time. When the job completes, the transcript loads automatically into the workspace.
Screenshot — Signed-in sidebar with project list and transcription progress
10 — Reference

Keyboard shortcuts

Waveform Studio is designed for keyboard-driven workflows. The full shortcut reference is also accessible via the legend at the bottom of the sidebar.

Shortcut | Action
Playback
Space | Play / Pause
/ | Seek backward / forward 5 seconds
Waveform
Ctrl+scroll | Zoom waveform in / out
+/ | Zoom in / out
0 | Reset zoom to 1×
Segment editing (requires a selected segment)
E | Edit segment text inline
S | Split segment at chosen word
C | Change speaker (opens context menu)
Shift+A | Merge with previous segment (same speaker)
Shift+D | Merge with next segment (same speaker)
Tab | Select next segment (and enter edit mode)
Shift+Tab | Select previous segment (and enter edit mode)
Shift+K | Add or edit a hyperlink on the selected text within a segment
Shift+F | Populate the search bar with the currently selected text
Shift+Q | Generate a Live Quote embed from the selected text
General
Ctrl+Z | Undo
Ctrl+Y | Redo
Esc | Cancel edit / close popup / clear selection
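
The two merge shortcuts only apply when the neighbouring segment has the same speaker. A minimal sketch of that rule is shown below. The `Segment` type and function names are hypothetical, and joining the two texts with a single space is an assumption; the app may handle whitespace differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def merge_with_previous(segments: List[Segment], index: int) -> List[Segment]:
    """Merge segments[index] into the one before it, if both share a speaker.

    Returns a new list; the input is returned as an unchanged copy when the
    merge is not allowed.
    """
    if index <= 0 or index >= len(segments):
        return list(segments)
    prev, cur = segments[index - 1], segments[index]
    if prev.speaker != cur.speaker:
        return list(segments)
    merged = Segment(prev.start, cur.end, prev.speaker, f"{prev.text} {cur.text}")
    return segments[: index - 1] + [merged] + segments[index + 1 :]


def merge_with_next(segments: List[Segment], index: int) -> List[Segment]:
    """Merging with the next segment is the same operation seen from index + 1."""
    return merge_with_previous(segments, index + 1)
```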
11 — Reference

UI element reference

A complete reference for every interactive element in Waveform Studio, organized by area of the interface.

Sidebar

Element | Description
◀ Toggle button | Collapses or expands the sidebar. When collapsed, more horizontal space is given to the workspace.
User avatar | Displays the current account icon. Clicking it opens account options (sign in / sign out when a server is connected).
⚙ Settings button | Opens the application settings panel.
⊙ Open | Opens a packaged project from disk. Accepts a .wfs project archive file.
+ New | Creates a new, empty project on the server and adds it to the project list.
New Folder | Creates a new folder in the current location. Folders can be used to organize projects. Click a folder to navigate into it; use the breadcrumb trail at the top of the list to navigate back up.
Project list | Lists your projects, grouped by folder when folders are in use. Click a project to load it into the workspace. Right-click a project for options: Rename, Duplicate, Delete, and Move to folder.
Ⓘ Open Docs | Opens this documentation in a new browser tab.
Theme buttons (Auto / ☀ / ☾) | Switch between automatic (follows browser/OS preference), light, and dark themes. The selection is persisted across sessions.
Shortcut legend | A compact keyboard shortcut reference at the bottom of the sidebar. See Keyboard Shortcuts for the full list.

Title bar

Element | Description
Project title | Displays the current project's name. Click to enter rename mode; press Enter to confirm or Esc to cancel.
Server status indicator | Shows whether the open project is synced with the server. Displays a coloured pill: green for synced, amber for unsaved remote changes, grey for local-only.
↓ Save (download icon) | Downloads the current project as a .wfs file.
↑ Upload (upload icon) | Pushes the current project to the connected server. Only available when a server connection is active.

Waveform panel controls

Element | Description
+ Add Audio | Opens a file-picker to load an audio file into the current project. Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WebM, OPUS.
Drop zone | Drag an audio file from your desktop directly onto this area to load it. Disappears after an audio file is loaded.
Track name / metadata row | Displays the filename and sample rate of the loaded audio. The coloured dot and status text reflect the current playback state (PLAYING / PAUSED / STOPPED).
Waveform display | A zoomable waveform of the loaded audio. The yellow vertical line is the playhead. Click anywhere to seek. Hold Ctrl and scroll to zoom.
Region lane | The coloured bar directly below the waveform. Each coloured block is a region corresponding to a speaker segment. Regions are colour-coded by speaker.
Time ruler | Shows timestamps along the bottom of the waveform area. Tick density adjusts automatically with the zoom level.
Minimap | A full-width thumbnail of the entire waveform. The shaded thumb shows the current viewport. Drag the thumb or its handles to pan and resize the view.
▶ Play / Pause | Toggles audio playback. Keyboard shortcut: Space.
Timecode display | Shows the current playhead position and total duration in m:ss.d format.
↩ 5s / 5s ↪ | Skip backward or forward by five seconds. Keyboard shortcuts: / .
⊙ FOLLOW | When active, the waveform view automatically scrolls to keep the playhead visible during playback.
⊡ RESET | Resets the waveform zoom to 1× and re-centers the view.
⌖ CURSOR toggle | Toggles the zoom anchor between the cursor position and the playhead. When active (highlighted), zooming recenters on the mouse cursor; when inactive, it recenters on the playhead.
Zoom level indicator | Displays the current zoom multiplier (for example, 1× at the default zoom).
VOL slider | Controls playback volume from 0 to 100%.
Speed selector | Sets the playback rate. Available options: 0.5×, 0.75×, 1.0×, 1.25×, 1.5×, 2.0×.
Status bar | A two-part strip below the controls. The left side shows the current operation state (e.g., READY, LOADING). The right side shows context-sensitive information such as the hovered timestamp or active segment.
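
The CURSOR toggle's effect on zooming can be illustrated with a little arithmetic. The sketch below reads "recenters" as centering the visible window on the anchor time and clamping it to the bounds of the audio. That reading, along with the function names, is an assumption made for the example and may not match the app's exact behaviour.

```python
from typing import Tuple


def zoom_anchor(cursor_active: bool, cursor_time: float, playhead_time: float) -> float:
    """Pick the anchor time: the mouse cursor when CURSOR is active, else the playhead."""
    return cursor_time if cursor_active else playhead_time


def zoomed_viewport(duration: float, zoom: float, anchor_time: float) -> Tuple[float, float]:
    """Return the (start, end) of the visible window, in seconds, after zooming.

    At zoom 1x the whole file is visible; at zoom N the window spans duration / N.
    """
    zoom = max(zoom, 1.0)
    span = duration / zoom
    start = anchor_time - span / 2               # center the window on the anchor
    start = min(max(start, 0.0), duration - span)  # keep it inside the audio
    return start, start + span
```

With a 120-second file at 4× zoom, anchoring on a playhead at 60 seconds gives a window from 45 to 75 seconds, while anchoring on a cursor at 5 seconds clamps the window to 0 to 30 seconds.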

Speakers panel

Element | Description
+ Add Speaker | Creates a new speaker entry with a default name and an auto-assigned colour. The speaker can be renamed immediately after creation.
Color swatch | Shows the speaker's assigned hue. Click the swatch to open the hue picker and change the colour. The new colour is reflected in waveform regions and transcript labels instantly.
Name field | Displays the speaker's name. Double-click to rename inline. Press Enter to confirm or Esc to cancel.
ID column | The internal identifier assigned to the speaker by the diarization process (e.g., SPEAKER_00). Read-only.
Voice Sample | Plays a short audio clip representative of this speaker's voice. Useful for confirming speaker identity during labelling.
Delete button | Removes the speaker. Segments previously assigned to the deleted speaker are reassigned to an Unknown placeholder.
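
The effect of deleting a speaker on existing segments can be sketched as follows. The dictionary shapes and the function name `delete_speaker` are hypothetical; only the reassignment to an Unknown placeholder comes from the description above.

```python
from typing import Dict, List, Tuple

UNKNOWN = "Unknown"  # placeholder label for segments whose speaker was deleted


def delete_speaker(
    speakers: Dict[str, str],
    segments: List[dict],
    speaker_id: str,
) -> Tuple[Dict[str, str], List[dict]]:
    """Remove a speaker and reassign their segments to the Unknown placeholder.

    `speakers` maps speaker IDs (e.g. "SPEAKER_00") to display names.
    Returns new objects; the inputs are left untouched.
    """
    remaining = {sid: name for sid, name in speakers.items() if sid != speaker_id}
    updated = [
        {**seg, "speaker": UNKNOWN} if seg["speaker"] == speaker_id else dict(seg)
        for seg in segments
    ]
    return remaining, updated
```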

Transcript panel

Element | Description
↓ Export Transcript | Opens the export dialog. Choose a file format (PDF, DOCX, TXT, Markdown, or CSV), export style, and optional metadata before exporting.
+ Add CSV | Opens a file-picker to load a transcript CSV into the current project. The file must contain start, end, speaker, and text columns.
Speaker label | Appears at the top of each speaker block. Displays the speaker's name in their assigned colour. Right-click to reassign all segments in the block to a different speaker.
Segment text | Each line of transcript text is a single segment. Click once to seek the audio to that segment's start time. Double-click to edit the text inline.
Segment context menu | Right-click any segment to access additional actions: reassign speaker, split the segment at the cursor, merge with the adjacent segment, or delete.
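
Before importing a transcript with + Add CSV, it can help to check that the file has the four required columns. The sketch below does this with Python's standard csv module. It assumes that start and end are decimal seconds, which this documentation does not state; the function name and the returned dictionary shape are also hypothetical.

```python
import csv
import io
from typing import List

REQUIRED_COLUMNS = ("start", "end", "speaker", "text")


def parse_transcript_csv(csv_text: str) -> List[dict]:
    """Validate and parse a transcript CSV into a list of segment dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    segments = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        start, end = float(row["start"]), float(row["end"])  # assumed seconds
        if end < start:
            raise ValueError(f"line {line_no}: end is before start")
        segments.append(
            {"start": start, "end": end, "speaker": row["speaker"], "text": row["text"].strip()}
        )
    segments.sort(key=lambda seg: seg["start"])  # keep segments in time order
    return segments
```

A two-row file with the header `start,end,speaker,text` would parse into two dictionaries ordered by start time; a file without a speaker column raises a ValueError naming the missing column.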

Server access

Server features are unlocked by signing in. See Server Integration for the full workflow.

Element | Description
User avatar | Click to sign in or sign out of your server account. When signed in, your display name or email is shown as a tooltip.
Project list | Projects stored in your server account. Appears in the sidebar once signed in. Click a project to open it in the workspace.

Info widget (Ⓘ)

The Ⓘ icon appears next to certain settings and labels throughout the interface. Hovering over it shows a short tooltip explaining the adjacent control. Clicking the icon opens the relevant page of this documentation in a new tab.