Waveform Studio
Waveform Studio is a transcription viewer and editor. Load an audio file, sync it with a diarized transcript, and make precise speaker-labeled edits with frame-accurate waveform navigation.
Cloud app vs desktop app
Waveform Studio is available in two forms: a cloud web app and a downloadable desktop app. Both offer the same core editing experience, but they differ in how they are accessed, how projects are stored, and which features are available.
| | Cloud app | Desktop app |
|---|---|---|
| Access | Browser — no installation required | Downloadable Windows executable |
| Projects | Stored in your cloud account | Stored on your machine |
| Transcription | Automatic, via server-side Whisper and Pyannote | Not yet available — transcripts must be imported as CSV |
| Account required | Yes | No |
| Cost | Paid subscription | Free |
Why a desktop app?
Waveform Studio is open source. The desktop app exists so that anyone can use the full editing environment for free, without a cloud subscription or an internet connection. Rather than running in a browser, the desktop app bundles an internal server that runs locally on your machine.
The desktop app is currently available on Windows only (Windows 10 and Windows 11, 64-bit).
Transcription in the desktop app
Automatic transcription is available in the cloud web app, where it is handled server-side using Whisper and Pyannote. It is not yet available in the desktop app. To work with a transcript in the desktop app, you will need to generate one using an external tool and import it as a CSV file with start, end, speaker, and text columns.
Waveform Studio was designed around Whisper (speech-to-text) and Pyannote (speaker diarization), both of which are free and open source and produce output compatible with the import format. Other transcription tools can also be used as long as their output is converted to the expected CSV structure.
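As a rough illustration of the import format, the sketch below writes a small transcript CSV with the four required columns using Python's standard csv module. The column names come from this guide; the use of seconds for start and end, the SPEAKER_00-style labels, and the write_transcript_csv helper are assumptions made for the example, so check them against what your transcription tool actually produces.

```python
import csv
import io

# Column names as listed in this guide.
REQUIRED_COLUMNS = ["start", "end", "speaker", "text"]

def write_transcript_csv(segments, out):
    """Write segments (dicts holding the four required keys) as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=REQUIRED_COLUMNS)
    writer.writeheader()
    for seg in segments:
        # Keep only the required columns, in the expected order.
        writer.writerow({col: seg[col] for col in REQUIRED_COLUMNS})

# Hypothetical sample data; times are assumed to be in seconds.
segments = [
    {"start": 0.0, "end": 2.4, "speaker": "SPEAKER_00", "text": "Hello, and welcome."},
    {"start": 2.9, "end": 5.1, "speaker": "SPEAKER_01", "text": "Thanks for having me."},
]

buffer = io.StringIO()
write_transcript_csv(segments, buffer)
csv_text = buffer.getvalue()
```

To produce a file on disk instead, pass an object opened with open("transcript.csv", "w", newline="") in place of the in-memory buffer.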
Your first project
Waveform Studio is built around the concept of a project — a folder that contains your audio file, transcript data, and any project settings. Follow these steps to create and load your first project.
If you already have a transcript CSV (with start, end, speaker, and text columns), click + Add CSV in the transcript panel to load it. Otherwise, use the server's Transcribe feature to generate one automatically.
Account
An account is required to use the Waveform Studio cloud web app. The desktop app does not require one.
Creating an account
Waveform Studio is currently in closed beta. Account creation is not open to the public — new accounts require administrator approval before they can be used.
Why make an account?
With an account you can:
Subscription tiers
All accounts start on the Free tier. Paid tiers unlock higher usage limits, more storage, and access to larger Whisper models. Billing can be monthly or yearly; yearly billing includes a 20% discount.
| Tier | Price | Transcription / month | Storage | Max audio length |
|---|---|---|---|---|
| Free | $0 | 5 hours | 2 GB | 60 min |
| Starter | $4.99 / mo or $47.90 / yr | 20 hours | 20 GB | 240 min |
| Pro | $19.99 / mo or $191.90 / yr | 50 hours | 50 GB | 480 min |
| Business | $39.99 / mo or $383.90 / yr | 100 hours | 100 GB | Unlimited |
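The yearly prices in the table are consistent with twelve monthly payments less the 20% discount. The check below is a small sketch of that arithmetic; rounding to the nearest cent is an assumption, since the guide does not state the rounding rule, and yearly_price is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

def yearly_price(monthly: str, discount: str = "0.20") -> Decimal:
    """Twelve monthly payments minus the yearly discount, rounded to cents (assumed)."""
    full_year = Decimal(monthly) * 12
    discounted = full_year * (Decimal(1) - Decimal(discount))
    return discounted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Monthly prices taken from the tier table.
monthly_prices = {"Starter": "4.99", "Pro": "19.99", "Business": "39.99"}
yearly_prices = {tier: yearly_price(price) for tier, price in monthly_prices.items()}
```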
Whisper model availability also varies by tier — larger, more accurate models are only available on higher tiers. See Transcription for model details. Your current tier and usage can be viewed and changed in the Subscription tab of Account Settings.
Account settings
Open Account Settings by clicking the avatar icon in the top bar of the sidebar. Settings are divided into tabs.
Profile
| Setting | Description |
|---|---|
| Display name | The name shown in the app and to other users. Can be changed at any time. |
| Email | The email address associated with your account. Read-only. |
| Sign-in provider | Shows how you authenticate: Email / Password, Google, or GitHub. Read-only. |
| Member since | The date your account was created. Read-only. |
Look & Feel
| Setting | Description |
|---|---|
| Theme | Choose between Auto (follows your OS or browser preference), Light, and Dark. Custom themes can also be created, edited, imported, and exported as JSON files. Each theme stores its own set of colour overrides. |
| Colours | Fine-grained colour overrides for the current theme, grouped into four categories. Accent: primary and secondary accent colours used for highlights, buttons, and interactive elements. Base: background, surface layers, borders, body text, and muted text. Waveform: the unplayed and played portions of the waveform. Status: Danger (errors, destructive actions), Success (confirmations), and Recording (active recording indicator). Each colour has a reset button to revert it to the theme default. |
App Behavior
| Setting | Description |
|---|---|
| On startup | Controls what happens when the app loads. Open last project (default): automatically reopens the project you had open when you last closed the app. Show home screen: always opens the home screen, regardless of what was open previously. |
Subscription
| Item | Description |
|---|---|
| Current plan | Displays your active tier and billing period (monthly or yearly). |
| Next payment | The date your next billing cycle begins. Not shown for the Free tier. |
| Transcription usage | Hours of transcription used this month versus your monthly allowance. |
| Storage usage | Total storage used by your projects and audio files versus your plan limit. |
| Upgrade plan | Opens the plan selector to upgrade or change your subscription tier. |
| Cancel subscription | Cancels your paid subscription. Only visible on paid tiers. |
Account Management
| Action | Description |
|---|---|
| Delete account | Permanently deletes your account and all associated data. This cannot be undone. A confirmation prompt is shown before the action is carried out. |
| Sign out | Signs you out of the app and returns you to the home screen. |
Explanation of terms
Several terms are used throughout the application and this guide. They are defined below.
| Term | Definition |
|---|---|
| App | The entire application, including sidebars and editing tools. |
| Workspace | The viewing and editing suite available for a project once it is open. |
| Project | A collection containing a single transcription with a waveform and speaker definitions. Must be loaded into a workspace to edit. |
| Waveform | A digital representation of an audio signal, typically stored as a .wav or .mp3 file. It can be visualized by graphing the signal's amplitude over time. |
| Region | A coloured block displayed below the waveform that represents a portion of the audio. Each region corresponds to a segment. The start and end of each paragraph are marked with rounded corners, and speakers are distinguished by colour. |
| Transport | The current playback position in the waveform, shown as a vertical yellow line. |
| Transcription | An approximation of the words spoken within an audio waveform, along with which speaker said them, and when they were said. In practice, a transcription is represented by a series of segments. |
| Speaker | A single voice that has been identified by the transcriber or defined by the user. |
| Diarization | Splitting a transcription into multiple "channels" or speakers. |
| Segment | A small string of text identified during the transcribing process. Has a start time, end time, and speaker. |
| Paragraph | A contiguous string of segments with the same speaker, and no long pauses in between. A paragraph split occurs when the speaker changes, or the current speaker pauses for more than one second. |
| Speaker Block | A contiguous string of paragraphs with the same speaker. Each new speaker block is prefaced with a speaker label. |
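The Segment, Paragraph, and Speaker Block definitions above can be sketched as a simple grouping pass. This is an illustration rather than the application's actual code: the Segment class and function names are hypothetical, times are assumed to be in seconds, and a "pause" is assumed to mean the gap between one segment's end and the next segment's start.

```python
from dataclasses import dataclass
from typing import List

PAUSE_THRESHOLD = 1.0  # seconds; a longer pause starts a new paragraph

@dataclass
class Segment:
    start: float
    end: float
    speaker: str
    text: str

def group_paragraphs(segments: List[Segment]) -> List[List[Segment]]:
    """Group segments into paragraphs: same speaker, no pause over the threshold."""
    paragraphs: List[List[Segment]] = []
    for seg in segments:
        if paragraphs:
            prev = paragraphs[-1][-1]
            same_speaker = prev.speaker == seg.speaker
            short_pause = (seg.start - prev.end) <= PAUSE_THRESHOLD
            if same_speaker and short_pause:
                paragraphs[-1].append(seg)
                continue
        paragraphs.append([seg])
    return paragraphs

def group_speaker_blocks(paragraphs: List[List[Segment]]) -> List[List[List[Segment]]]:
    """Group consecutive paragraphs by the same speaker into speaker blocks."""
    blocks: List[List[List[Segment]]] = []
    for para in paragraphs:
        if blocks and blocks[-1][-1][0].speaker == para[0].speaker:
            blocks[-1].append(para)
        else:
            blocks.append([para])
    return blocks

# Hypothetical sample: the 1.5 s pause before the third segment splits the paragraph.
sample = [
    Segment(0.0, 1.0, "SPEAKER_00", "So,"),
    Segment(1.25, 2.0, "SPEAKER_00", "here is the plan."),
    Segment(3.5, 4.0, "SPEAKER_00", "Any questions?"),
    Segment(4.25, 5.0, "SPEAKER_01", "Just one."),
]
paragraphs = group_paragraphs(sample)
blocks = group_speaker_blocks(paragraphs)
```

In this sample the first speaker ends up with two paragraphs inside one speaker block, which corresponds to two bars of the same colour appearing in a row in the region lane.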
The workspace
The workspace is divided into three key areas. Each panel is resizable by dragging the dividers between them.
Sidebar & projects
The sidebar on the left edge of the app lists your projects. When connected to a server, projects are stored in your account on that server. The sidebar can be collapsed with the toggle button (◀) on its right edge.
Header buttons
| Button | Description |
|---|---|
| New Folder | Creates a new folder inside the current location in your server files. Only available when signed in. |
| Open | Opens a file picker to load a local .wfs project archive from disk. |
| + New | Creates a new empty project on the server. |
Top bar buttons
A row of utility buttons runs along the very top of the sidebar.
| Button | Description |
|---|---|
| ⌂ Home | Returns to the start screen, closing the current project. |
| ? Shortcuts | Toggles the keyboard shortcut legend at the bottom of the sidebar. |
| ◑ Theme | Opens the theme picker. Choose between Auto (follows the browser or OS preference), Light, and Dark. Additional themes and custom colour options are available in Settings. |
| ℹ Docs | Opens this documentation in a new tab. |
| ⚙ Settings | Opens the application settings panel. |
My Files and Shared
When signed in, the project list is divided into two sections:
Working with projects
Saving
The ↓ Save button in the title bar downloads the current project as a .wfs file. This can be used at any time to take a local copy of a project for backup or for use in the desktop app.
Server projects are saved automatically — a few seconds after each edit, changes are pushed to the server. This is the default and intended behaviour. The ↑ Upload button in the title bar is available as a manual trigger, useful as a confirmation that changes have been saved or when auto-save has been turned off. Auto-save can be disabled in Settings.
Folders
Server projects can be organized into folders. Folders are nested — you can create folders inside other folders — and the current location is shown as a breadcrumb trail at the top of the project list.
Waveform panel
The waveform panel is the primary audio interface. It displays a zoomable waveform of the loaded audio file, lets you control playback, and shows speaker regions aligned to the transcript. Every component is described below.
Layout
The panel is divided into two areas stacked vertically:
Below the controls bar is the minimap — a full-width thumbnail of the entire waveform. The shaded rectangle shows which portion of the audio is currently visible. Drag the rectangle to pan, or drag its handles to resize the view.
Seeking
Click anywhere on the waveform display or on a region in the region lane to move the playhead to that position. The transcript panel will scroll to and highlight the corresponding segment.
Use the ↩ 5s and 5s ↪ buttons in the controls bar to skip backward or forward by five seconds. The keyboard equivalents are ← and →.
Playback controls
Zooming
Zooming in reveals more detail in the waveform and makes precise seeking easier.
Speaker regions
The region lane is the coloured strip directly below the waveform display. It provides a timeline view of every transcript segment laid out to scale against the audio.
Each block in the lane is a region — it maps exactly to one transcript segment. The block's horizontal position and width represent that segment's start and end times, so longer blocks mean longer speech. Regions are colour-coded by speaker: the colour of each block matches the hue assigned to that speaker in the Speakers panel, making it easy to see at a glance who is speaking and when.
Regions are grouped by paragraph. All segments within the same paragraph are joined into a single continuous bar with rounded outer corners and the speaker's name drawn inside it. A small gap separates each paragraph from the next. A speaker block in the transcript may contain multiple paragraphs, so it is possible to see several bars of the same colour in a row — one per paragraph — before the colour changes for a different speaker.
When you zoom in on the waveform, the region lane zooms with it, keeping regions aligned to the audio beneath them. If no transcript has been loaded, the region lane is hidden.
Interacting with regions
Speakers panel
The speakers panel lists every speaker in the project. Each speaker has a name, a colour, an internal ID, and an optional voice sample. Speakers are listed in the order they first appear in the transcript.
| Column | Description |
|---|---|
| Colour swatch | A filled circle showing the speaker's assigned hue. Click it to open the hue picker and change the colour. |
| Name | The speaker's display name, rendered in their colour. Click to rename inline. Hovering highlights the speaker's regions in the waveform panel. |
| ID | The speaker's raw internal identifier as assigned by the diarization process (e.g. SPEAKER_00). This is read-only and cannot be changed. |
| Voice sample | Shows a mini waveform and playback controls if a sample has been recorded or uploaded, or buttons to add one if not. See Voice samples below. |
| Delete (✕) | Removes the speaker. If they have segments assigned, a dialog prompts you to reassign those segments to another speaker first. Disabled when only one speaker exists. |
Adding speakers
Click + Add Speaker at the top of the panel to create a new speaker. A new row appears with an auto-generated ID (e.g. SPEAKER_04), and the name field opens immediately for editing so you can type a name right away.
Renaming a speaker
Click a speaker's name to enter inline edit mode. Type the new name, then press Enter or click away to confirm. Press Esc to cancel and restore the previous name. The new name is reflected everywhere in the transcript immediately.
If you clear the name field and confirm, the name resets to the speaker's raw ID.
Changing a speaker's colour
Click the colour swatch to the left of a speaker's name to open the hue picker. Drag the handle around the colour wheel to choose a new hue. The change is applied live — waveform regions, region lane bars, and transcript speaker labels all update instantly.
Highlighting a speaker
Hovering over a speaker's name highlights all of that speaker's regions in the waveform panel, making it easy to see at a glance how much of the audio they occupy and where their turns fall.
Voice samples
Each speaker row has a voice sample cell. A voice sample is a short audio clip that represents a speaker's voice — useful for confirming identity when labelling speakers. When no sample has been set, three buttons are shown for adding one:
Once a sample is set, the cell shows a mini waveform visualisation drawn in the speaker's colour. Click ▶ to play the sample — click it again to stop early. Click the ✕ button to remove the sample.
Deleting a speaker
Click the ✕ button at the right of a speaker row to delete that speaker. A confirmation dialog appears. If the speaker has segments assigned to them, the dialog will ask you to choose another speaker to reassign those segments to before deletion proceeds. If the speaker has no assigned segments, you can delete them directly.
Transcript panel
The transcript panel displays the full diarized transcript as readable text, grouped by speaker. It is the primary interface for reviewing, navigating, and editing the content of a transcript.
Structure
The transcript is organised into three levels:
Paragraph handles
The coloured bar on the left edge of each paragraph is its handle. Its colour matches the speaker. Interacting with the handle affects the entire paragraph:
Segment interactions
Active segment during playback
While audio is playing, the segment corresponding to the current playhead position is highlighted and the transcript scrolls automatically to keep it in view.
Search and replace
A search bar appears at the top of the transcript panel once a transcript is loaded. Type to search — matching segments are highlighted and a match counter shows how many results were found. Use the ◀ and ▶ arrows to step through matches, or press Enter to advance to the next one. Press Esc or click the clear button to dismiss the search.
Use the speaker filter dropdown to restrict results to a specific speaker.
Click the replace toggle button to expand a replace bar beneath the search field. Enter a replacement term and use Replace to substitute the current match, or Replace All to substitute every match at once. Press Tab to move between the search and replace fields.
Text selection
You can select text freely within the transcript using click-and-drag. Selections can span multiple segments. Hold Ctrl while dragging to snap the selection boundaries to whole words.
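Word snapping of this kind can be pictured as widening a character range until it reaches whitespace on both sides. The sketch below is only an approximation of the idea: snap_to_words is a hypothetical helper, and treating a "word" as a run of non-whitespace characters is an assumption about how the app defines word boundaries.

```python
from typing import Tuple

def snap_to_words(text: str, start: int, end: int) -> Tuple[int, int]:
    """Snap the half-open range [start, end) to whole words."""
    # Drop whitespace at the edges so a stray space does not pull in a neighbouring word.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    if start == end:
        return start, end  # nothing but whitespace was selected
    # Expand outward until whitespace or the edge of the text is reached.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end

sentence = "the quick brown fox"
snapped = snap_to_words(sentence, 5, 13)  # raw selection is "uick bro"
snapped_text = sentence[snapped[0]:snapped[1]]
```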
Once text is selected, two actions are available:
Hyperlinks
Any span of text in the transcript can have a hyperlink attached to it. Linked text is displayed with a distinct style so it stands out from the surrounding content.
Adding and editing links
Select the text you want to link, then press Shift+K or right-click and choose Add link. If no text is selected, the link covers the entire segment. To edit an existing link, select or click on the linked text and use the same shortcut — the dialog opens pre-filled with the existing values.
The link dialog contains four fields:
| Field | Description |
|---|---|
| URL (required) | The destination address. As you type or paste a URL the dialog checks it via the server — a spinner appears while checking, then a ✓ if the URL is reachable or a ✗ if it is not. If the page has a title it will be offered as a suggested display name. |
| Display name (optional) | A short human-readable label for the link, shown in the tooltip when hovering over it. |
| Description (optional) | A longer description of the linked resource, also shown in the tooltip. |
| Editor notes (optional) | Private notes visible only in edit mode. Not shown when the transcript is viewed in read-only or presentation mode. |
Behaviour with existing links
When a new link is added over a range that already contains one or more links, conflicts are resolved automatically:
| Situation | What happens |
|---|---|
| New link fully contains an existing link | The existing link is removed and the new link covers the entire range. |
| New link is fully inside an existing link | The existing link is split into two — one covering the text before the new link, one covering the text after. The new link occupies the middle. Both split portions retain the original URL and display name. |
| New link partially overlaps an existing link | The existing link is trimmed so its range ends where the new link begins (or begins where the new link ends). Neither link is deleted — they are made adjacent. |
| Selection spans multiple segments | A confirmation prompt is shown: "This selection spans multiple segments. They will be automatically merged before the link is added." Confirming merges the segments into one, then adds the link. This cannot be undone separately — the merge is permanent. |
| Editing text in a segment that has a link | The link adjusts based on where the edit is made. An edit before the linked text shifts the link's position to follow it. An edit after the linked text leaves the link unchanged. A whole-segment link (one covering the entire segment) expands to cover the new full text after the edit. If the edit directly overlaps the linked text itself, the link is removed. |
| Leading or trailing whitespace in selection | Whitespace at the edges of a selection is automatically trimmed before the link is stored, so the visible highlight never starts or ends on a space. |
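The first three rows of the table, plus the whitespace rule, can be modelled as subtracting the new link's range from every existing link. The following is a minimal sketch under stated assumptions, not the application's implementation: links are modelled as half-open character ranges within a single segment's text, and the Link class and add_link function are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Link:
    start: int      # inclusive character offset within the segment text
    end: int        # exclusive character offset
    url: str
    name: str = ""

def trim_whitespace(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink the range so it neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end

def add_link(text: str, links: List[Link], new: Link) -> List[Link]:
    """Add a link, resolving overlaps with existing links in the same segment."""
    start, end = trim_whitespace(text, new.start, new.end)
    if start == end:
        raise ValueError("selection contains no visible text")
    new = replace(new, start=start, end=end)
    result: List[Link] = []
    for old in links:
        # Keep whatever part of the old link lies before the new one...
        if old.start < new.start:
            result.append(replace(old, end=min(old.end, new.start)))
        # ...and whatever part lies after it. A fully covered link keeps neither part.
        if old.end > new.end:
            result.append(replace(old, start=max(old.start, new.end)))
    result.append(new)
    return sorted(result, key=lambda link: link.start)

text = "see the full report online now"
old = Link(4, 18, "https://example.com/a", "A")   # covers "the full report"
# The raw selection " full " includes a space on each side and is trimmed to "full".
split = add_link(text, [old], Link(7, 13, "https://example.com/b"))
```

A link that lies inside an existing one produces two leftover pieces (the split case), a partial overlap produces one (the trim case), and a link that covers an existing one produces none (the removal case). The multi-segment merge and the text-editing rows are not covered by this sketch.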
Following links
Hovering over linked text shows a tooltip with the link's display name, description, URL, and any editor notes (in edit mode). The tooltip appears after a short delay, or immediately if Ctrl is already held.
To follow a link, hold Ctrl and click the linked text. The URL opens in a new tab.
Toolbar buttons
| Button | Description |
|---|---|
| ↓ Export | Opens the export panel. Choose a format (PDF, DOCX, TXT, Markdown, or CSV), configure options, and download the file. |
| + Add CSV | Loads a transcript from a CSV file. The file must have start, end, speaker, and text columns. If a transcript is already loaded, a confirmation prompt appears first. |
| Delete transcript | Permanently removes the transcript from the project. A confirmation prompt is shown before deletion. |
| Transcribe | Available when audio is loaded. Opens the transcription options dialog where you can choose a Whisper model and speaker count, then submits the job. A progress bar tracks the job in real time; the transcript loads automatically on completion. |
Transcription
Waveform Studio can automatically transcribe an audio file and generate a speaker-diarized transcript using Whisper (speech-to-text) and Pyannote (speaker diarization). Transcription runs on cloud GPU infrastructure and is available in the web app only — see Cloud vs Desktop for details.
Starting a transcription job
Transcription options
| Option | Description |
|---|---|
| Whisper model | Controls the speech-to-text accuracy and processing speed. Larger models produce more accurate transcripts but take longer and cost more. Available models, from fastest to slowest: Tiny, Base, Small, Turbo, Medium (recommended), Large. |
| Diarization model | The Pyannote model used to identify and separate speakers. Two variants are available: Speaker Diarization 3.1 and Speaker Diarization Community 1. |
| Est. speakers | An optional hint for how many speakers are in the audio. Set to 0 to let the diarization model detect the number automatically. Providing the correct count can improve accuracy when the number of speakers is known in advance. |
| Voice samples | If any speakers in the project have voice samples attached, this checkbox becomes available. When enabled, the samples are passed to the diarization model to help it identify speakers. See Speakers Panel for how to record or upload voice samples. |
Cost and time estimates
The dialog displays an estimated cost and processing time based on your audio duration and chosen models. These are calculated assuming an A10G GPU and will update live as you change the model selections. Actual cost and time may vary.
Running transcription yourself
The transcription pipeline code is included in the project source. If you are self-hosting Waveform Studio or want to run transcription jobs independently, you can adapt or invoke this code directly rather than using the hosted cloud service.
See the developer documentation for a full walkthrough: Transcription Pipeline Tutorial.
Presentation mode
Presentation mode is a read-only, shareable view of a project designed for playback and transcript review. It strips away all editing controls and presents the audio player and transcript in a clean, scrollable layout that anyone can use — no account required, depending on your sharing settings.
Presentation mode is available for projects in the web app only.
Opening presentation mode
Click the Present button in the workspace header. This opens the presentation view in a new browser tab at a unique URL for your project. You can copy and share this URL with others using the Copy link button inside the presentation view.
How it differs from the editor
| Editor | Presentation mode |
|---|---|
| Full editing tools — segments, speakers, waveform | Read-only — no editing of any kind |
| Requires an account | Can be accessed publicly or with authentication (see Access control below) |
| Waveform, Speakers, and Transcript panels | Audio player with region lane, and scrollable transcript |
| Right-click context menus on segments | No context menus — segments are click-to-seek only |
| Hyperlinks require Ctrl+click to follow | Hyperlinks open on a single click |
Layout
The presentation view has three main areas:
Playback controls
| Control | Description |
|---|---|
| Play / Pause | Starts or pauses audio playback. |
| « / » | Jump to the previous or next paragraph boundary. |
| ‹ / › | Jump to the previous or next segment boundary. If playback is more than 0.5 seconds into the current segment, pressing ‹ restarts the current segment rather than moving to the previous one. |
| Volume | Drag the volume slider to adjust level. Click the speaker icon to mute or unmute. |
| Speed | Choose a playback rate: 0.5×, 0.75×, 1×, 1.25×, 1.5×, or 2×. |
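The ‹ behaviour in the table (restart the current segment unless playback is within its first 0.5 seconds) can be sketched as follows. This is an illustrative model only: previous_target is a hypothetical name, the "current" segment is assumed to be the most recent one whose start time has passed, and the handling of positions before the first segment is a guess.

```python
from bisect import bisect_right
from typing import List

RESTART_THRESHOLD = 0.5  # seconds, as described in the table above

def previous_target(starts: List[float], position: float) -> float:
    """Return the time to seek to when ‹ is pressed.

    `starts` is a sorted list of segment start times in seconds.
    """
    i = bisect_right(starts, position) - 1  # index of the current segment
    if i < 0:
        return 0.0  # before the first segment: assumed to go to the start of the audio
    if i == 0 or position - starts[i] > RESTART_THRESHOLD:
        return starts[i]      # restart the current segment
    return starts[i - 1]      # close to the segment start: go to the previous one

segment_starts = [0.0, 4.0, 9.0]
```

With these start times, pressing ‹ at 6.0 seconds restarts the segment that began at 4.0, while pressing it at 4.25 seconds moves back to the segment that began at 0.0.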
Transcript interaction
The transcript is interactive even in read-only mode:
Access control
Whether viewers need to sign in depends on the project's sharing setting:
Live Quotes
A Live Quote is an interactive audio-and-text widget that you can embed on any webpage. It shows a highlighted excerpt from your transcript alongside a playable audio clip — viewers can read the words, click any segment to seek, and play or pause the clip directly in the page, with no account or sign-in required.
Live Quotes are available in the web app only. The source project must be saved to the server.
Opening the dialog
Select any text in the transcript panel, then open the Live Quote dialog using one of two methods:
The dialog has two panels: a configuration panel on the left and a live preview of the final widget on the right.
Choosing which segments to include
The left panel lists every transcript segment that falls within your selection. Each segment has a checkbox. Unchecking a segment removes its text from the displayed quote and replaces it with an ellipsis (…) — but the underlying audio still plays through uninterrupted. This lets you condense a quote without creating jarring audio gaps.
Trimming the boundaries
When the first or last segment in your selection is only partially selected, a trim checkbox appears at that boundary:
| Option | Effect |
|---|---|
| Trim start | Prefixes the displayed text with … to signal that the quote begins mid-sentence. The audio clip still starts at the beginning of the first segment. |
| Trim end | Appends … to the displayed text to signal that the quote ends mid-sentence. The audio clip still plays to the end of the last segment. |
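The way unchecked segments and trim options shape the displayed quote can be sketched as simple string assembly. The function below is a hypothetical illustration, not the widget's code. It assumes the input texts are already cut to the selected portion, that consecutive unchecked segments collapse into a single ellipsis, and that quotation marks are applied last; none of these details are confirmed by this guide.

```python
from typing import List

ELLIPSIS = "\u2026"  # the … character

def build_quote_text(
    texts: List[str],
    checked: List[bool],
    trim_start: bool = False,
    trim_end: bool = False,
    quote_marks: bool = True,
) -> str:
    """Assemble the displayed quote text from per-segment texts and checkboxes."""
    parts: List[str] = []
    for text, keep in zip(texts, checked):
        if keep:
            parts.append(text.strip())
        elif not parts or parts[-1] != ELLIPSIS:
            # Unchecked text is replaced by an ellipsis; the audio is unaffected.
            parts.append(ELLIPSIS)
    body = " ".join(parts)
    if trim_start:
        body = ELLIPSIS + body
    if trim_end:
        body = body + ELLIPSIS
    if quote_marks:
        body = "\u201c" + body + "\u201d"  # typographic opening and closing marks
    return body

quote = build_quote_text(
    ["we shipped it on Friday.", "Um, let me think.", "Nobody expected that"],
    [True, False, True],
    trim_start=True,
)
```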
Embed options
| Option | Description |
|---|---|
| Speaker attribution | Enable to show a speaker name below the quote text, prefixed with an em dash (— Name). Pre-filled with the first speaker's name; editable before generating. |
| Quote marks | Wraps the transcript text in typographic opening and closing quotation marks. On by default. |
| Waveform | Shows a waveform visualization above the playback controls. Displays the audio amplitude for the clip; also clickable for seeking. On by default. |
| Download button | Adds a download icon to the widget footer so viewers can save the audio clip. Off by default. |
| Font | The typeface used for the quote text. Choices: Inter, IBM Plex Sans, IBM Plex Mono, Georgia. |
| Audio format | The format of the generated audio clip: mp3 (broader compatibility) or webm (smaller file size). |
Duration limit
Live Quote clips are capped at 60 seconds. A duration bar below the segment list shows the current clip length. As you approach the limit the bar turns amber; if you exceed it the bar turns red and the Create Embed Code button is disabled until you reduce the selection.
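The duration check can be pictured as below. Because unchecked segments still play and the clip runs from the start of the first segment to the end of the last, the clip length depends only on the outer boundaries of the selection. The 60-second cap comes from this guide; the point at which the bar turns amber is not documented, so WARN_FRACTION is a placeholder assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

MAX_CLIP_SECONDS = 60.0
WARN_FRACTION = 0.8  # assumed threshold for the amber state; not documented

def clip_duration(segments: List[Tuple[float, float]]) -> float:
    """Clip length in seconds for ordered (start, end) segments, checked or not."""
    return segments[-1][1] - segments[0][0]

def duration_status(clip_seconds: float) -> str:
    """Return the bar state: 'ok', 'amber' (approaching the cap), or 'red' (over it)."""
    if clip_seconds > MAX_CLIP_SECONDS:
        return "red"
    if clip_seconds >= MAX_CLIP_SECONDS * WARN_FRACTION:
        return "amber"
    return "ok"

def can_create_embed(clip_seconds: float) -> bool:
    """The Create Embed Code button is disabled while the clip exceeds the cap."""
    return clip_seconds <= MAX_CLIP_SECONDS

selection = [(10.0, 20.0), (20.5, 40.0), (41.0, 65.0)]
length = clip_duration(selection)
```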
Live preview
The right panel shows an exact preview of the widget as it will appear on the page. It updates in real time as you check or uncheck segments and toggle options. You can play the audio clip directly in the preview — clicking any segment seeks to that point, and clicking the waveform or progress bar seeks to that position in the clip.
Generating the embed code
Click Create Embed Code to submit the configuration to the server. The server processes the audio clip and stores the widget data, then returns an <iframe> snippet. The right panel switches to show the snippet in a read-only text area with a Copy to clipboard button.
Paste the snippet into any webpage — a blog post, CMS, or plain HTML file — and the widget will render at 660 × 280 px, scaling down to fit narrower containers automatically.
What viewers see
The embedded widget is fully self-contained and interactive. It includes:
Server integration
When signed in to a Waveform Studio server, your projects are stored in your account and synced automatically. You also gain access to server-side transcription powered by Whisper and Pyannote.
Keyboard shortcuts
Waveform Studio is designed for keyboard-driven workflows. The full shortcut reference is also accessible via the legend at the bottom of the sidebar.
| Shortcut | Action |
|---|---|
| Playback | |
| Space | Play / Pause |
| ← / → | Seek backward / forward 5 seconds |
| Waveform | |
| Ctrl+scroll | Zoom waveform in / out |
| + / − | Zoom in / out |
| 0 | Reset zoom to 1× |
| Segment editing (requires a selected segment) | |
| E | Edit segment text inline |
| S | Split segment at chosen word |
| C | Change speaker (opens context menu) |
| Shift+A | Merge with previous segment (same speaker) |
| Shift+D | Merge with next segment (same speaker) |
| Tab | Select next segment (and enter edit mode) |
| Shift+Tab | Select previous segment (and enter edit mode) |
| Shift+K | Add or edit a hyperlink on the selected text within a segment |
| Shift+F | Populate the search bar with the currently selected text |
| Shift+Q | Generate a Live Quote embed from the selected text |
| General | |
| Ctrl+Z | Undo |
| Ctrl+Y | Redo |
| Esc | Cancel edit / close popup / clear selection |
UI element reference
A complete reference for every interactive element in Waveform Studio, organized by area of the interface.
Sidebar
| Element | Description |
|---|---|
| ◀ Toggle button | Collapses or expands the sidebar. When collapsed, more horizontal space is given to the workspace. |
| User avatar | Displays the current account icon. Clicking it opens account options (sign in / sign out when a server is connected). |
| ⚙ Settings button | Opens the application settings panel. |
| ⊙ Open | Opens a packaged project from disk. Accepts a .wfs project archive file. |
| + New | Creates a new, empty project on the server and adds it to the project list. |
| New Folder | Creates a new folder in the current location. Folders can be used to organize projects. Click a folder to navigate into it; use the breadcrumb trail at the top of the list to navigate back up. |
| Project list | Lists your projects, grouped by folder when folders are in use. Click a project to load it into the workspace. Right-click a project for options: Rename, Duplicate, Delete, and Move to folder. |
| Ⓘ Open Docs | Opens this documentation in a new browser tab. |
| Theme buttons (Auto / ☀ / ☾) | Switch between automatic (follows browser/OS preference), light, and dark themes. The selection is persisted across sessions. |
| Shortcut legend | A compact keyboard shortcut reference at the bottom of the sidebar. See Keyboard Shortcuts for the full list. |
Title bar
| Element | Description |
|---|---|
| Project title | Displays the current project's name. Click to enter rename mode; press Enter to confirm or Esc to cancel. |
| Server status indicator | Shows whether the open project is synced with the server. Displays a coloured pill: green for synced, amber for unsaved remote changes, grey for local-only. |
| ↓ Save (download icon) | Downloads the current project as a .wfs file. |
| ↑ Upload (upload icon) | Pushes the current project to the connected server. Only available when a server connection is active. |
Waveform panel controls
| Element | Description |
|---|---|
| + Add Audio | Opens a file-picker to load an audio file into the current project. Supported formats: MP3, WAV, FLAC, OGG, M4A, AAC, WebM, OPUS. |
| Drop zone | Drag an audio file from your desktop directly onto this area to load it. Disappears after an audio file is loaded. |
| Track name / metadata row | Displays the filename and sample rate of the loaded audio. The coloured dot and status text reflect the current playback state (PLAYING / PAUSED / STOPPED). |
| Waveform display | A zoomable waveform of the loaded audio. The yellow vertical line is the transport. Click anywhere to seek. Scroll with Ctrl + wheel to zoom. |
| Region lane | The coloured bar directly below the waveform. Each coloured block is a region corresponding to a speaker segment. Regions are colour-coded by speaker. |
| Time ruler | Shows timestamps along the bottom of the waveform area. Tick density adjusts automatically with the zoom level. |
| Minimap | A full-width thumbnail of the entire waveform. The shaded thumb shows the current viewport. Drag the thumb or its handles to pan and resize the view. |
| ▶ Play / Pause | Toggles audio playback. Keyboard shortcut: Space. |
| Timecode display | Shows the current playhead position and total duration in m:ss.d format. |
| ↩ 5s / 5s ↪ | Skip backward or forward by five seconds. Keyboard shortcuts: ← / →. |
| ⊙ FOLLOW | When active, the waveform view automatically scrolls to keep the playhead visible during playback. |
| ⊡ RESET | Resets the waveform zoom to 1× and re-centers the view. |
| ⌖ CURSOR toggle | Toggles the zoom anchor between the cursor position and the playhead. When active (highlighted), zooming recenters on the mouse cursor; when inactive, it recenters on the playhead. |
| Zoom level indicator | Displays the current zoom multiplier (e.g., 2×). |
| VOL slider | Controls playback volume from 0 to 100%. |
| Speed selector | Sets the playback rate. Available options: 0.5×, 0.75×, 1.0×, 1.25×, 1.5×, 2.0×. |
| Status bar | A two-part strip below the controls. The left side shows the current operation state (e.g., READY, LOADING). The right side shows context-sensitive information such as the hovered timestamp or active segment. |
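The m:ss.d format used by the Timecode display (minutes, two-digit seconds, and tenths of a second) can be produced with a few lines of arithmetic. This is a sketch only: format_timecode is a hypothetical name, truncating rather than rounding to tenths is an assumption, and minutes are assumed to keep counting past 59 for long recordings.

```python
def format_timecode(seconds: float) -> str:
    """Format a position in seconds as m:ss.d (minutes, seconds, tenths)."""
    tenths_total = int(seconds * 10)            # truncate to whole tenths (assumed)
    minutes, remainder = divmod(tenths_total, 600)
    secs, tenths = divmod(remainder, 10)
    return f"{minutes}:{secs:02d}.{tenths}"

example = format_timecode(75.5)
```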
Speakers panel
| Element | Description |
|---|---|
| + Add Speaker | Creates a new speaker entry with a default name and an auto-assigned colour. The speaker can be renamed immediately after creation. |
| Colour swatch | Shows the speaker's assigned hue. Click the swatch to open the hue picker and change the colour. The new colour is reflected in waveform regions and transcript labels instantly. |
| Name field | Displays the speaker's name. Click to rename inline. Press Enter to confirm or Esc to cancel. |
| ID column | The internal identifier assigned to the speaker by the diarization process (e.g., SPEAKER_00). Read-only. |
| Voice Sample | Plays a short audio clip representative of this speaker's voice. Useful for confirming speaker identity during labelling. |
| Delete button | Removes the speaker. If the speaker has segments assigned, a dialog prompts you to reassign those segments to another speaker first. Disabled when only one speaker exists. |
Transcript panel
| Element | Description |
|---|---|
| ↓ Export Transcript | Opens the export dialog. Choose a file format (PDF, DOCX, TXT, Markdown, or CSV), export style, and optional metadata before exporting. |
| + Add CSV | Opens a file-picker to load a transcript CSV into the current project. The file must contain start, end, speaker, and text columns. |
| Speaker label | Appears at the top of each speaker block. Displays the speaker's name in their assigned colour. Right-click to reassign all segments in the block to a different speaker. |
| Segment text | Each line of transcript text is a single segment. Click once to seek the audio to that segment's start time. Double-click to edit the text inline. |
| Segment context menu | Right-click any segment to access additional actions: reassign speaker, split the segment at the cursor, merge with the adjacent segment, or delete. |
Server access
Server features are unlocked by signing in. See Server Integration for the full workflow.
| Element | Description |
|---|---|
| User avatar | Click to sign in or sign out of your server account. When signed in, your display name or email is shown as a tooltip. |
| Project list | Projects stored in your server account. Appears in the sidebar once signed in. Click a project to open it in the workspace. |
Info widget (Ⓘ)
The Ⓘ icon appears next to certain settings and labels throughout the interface. Hovering over it shows a short tooltip explaining the adjacent control. Clicking the icon opens a relevant page in this documentation in a new tab.