Unit Testing Recommendations

Frontend

Functions across `static/js` that should have unit tests, organized by file.

`utilities/tools.js`

Function	Why test
`generateId()`	No DOM dependency; uniqueness/collision avoidance is critical
`formatTime(s)`	Pure time formatter; edge cases (0, large values, non-integers)
`formatTimeMs(s)`	Same edge cases as `formatTime`, plus the fractional (millisecond) part
`hexToRgb(hex)`	Pure color parser; input validation and parsing correctness
`hslToHex(h, s, l)`	Color space conversion; correctness depends on math
`hexToHue(hex)`	Partial inverse of `hslToHex` (recovers hue only); round-trip consistency of the hue
`colorToHSV(color)`	Multi-format color parsing; multiple code paths to test
`parseCSV(text)`	Critical parsing logic; RFC 4180 quoting, escaped quotes, embedded newlines
`farthestColor(colors)`	Delegates to farthestAngle; integration-level test
`farthestAngle(angles)`	Circular geometry; edge cases (empty, one, wrap-around at 360)
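
The frontend tests themselves would be written in JavaScript, but expected values for the `parseCSV` cases can be generated independently. The sketch below uses Python's standard `csv` module purely as a reference oracle for the three RFC 4180 cases named in the table. The names `reference_parse`, `CASES`, and `EXPECTED` are hypothetical, and it assumes `parseCSV` returns rows as arrays of strings, which the source does not confirm.

```python
import csv
import io


def reference_parse(text: str) -> list:
    """Parse CSV text with the stdlib csv module, used here only as an oracle."""
    # newline="" leaves line endings untranslated so that the csv module
    # itself decides how to treat line breaks inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


# One input per case from the table: quoting, escaped quotes, embedded newlines.
CASES = {
    "quoted comma": 'a,"b,c"\r\n',
    "escaped quote": '"say ""hi""",x\r\n',
    "embedded newline": '"line1\nline2",y\r\n',
}

# Expected rows that a JavaScript test for parseCSV could be compared against.
EXPECTED = {name: reference_parse(text) for name, text in CASES.items()}
```

The resulting `EXPECTED` mapping could be exported as a JSON fixture and loaded by the frontend test suite.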

`utilities/audio.js`

Function	Why test
`encodeMonoWav(samples, sampleRate)`	Binary format encoding; WAV header correctness is precise
`loadAudioFile(file)`	Returns structured data; testable with mock AudioContext
`decodeSampleArrayBuffer(arrayBuffer)`	Structured data transform; testable with mock AudioContext
`extractPeaks(channelBuffers, peakCount)`	Downsampling logic; results depend on math
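
To make "WAV header correctness" concrete, here is a reference layout that a test for `encodeMonoWav` could compare against byte for byte. It assumes the encoder emits the canonical 44-byte header with mono, 16-bit PCM data, which the source does not state. `reference_mono_wav` and `describe` are hypothetical helper names.

```python
import io
import struct
import wave


def reference_mono_wav(samples: list, sample_rate: int) -> bytes:
    """Build a WAV file with the canonical 44-byte header (mono, 16-bit PCM)."""
    data = struct.pack(f"<{len(samples)}h", *samples)   # little-endian int16
    byte_rate = sample_rate * 2                          # 1 channel * 2 bytes
    header = b"RIFF" + struct.pack("<I", 36 + len(data)) + b"WAVE"
    # fmt chunk: size 16, PCM (1), 1 channel, rate, byte rate, block align 2, 16 bits
    header += b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, sample_rate, byte_rate, 2, 16)
    header += b"data" + struct.pack("<I", len(data))
    return header + data


def describe(blob: bytes) -> tuple:
    """Read the file back with the stdlib wave module as an independent check."""
    with wave.open(io.BytesIO(blob), "rb") as w:
        return (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
```

A frontend test would typically assert the same field offsets (for example the RIFF size at byte 4 and the sample rate at byte 24) using a `DataView` over the encoder's output.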

`project.js` — `Transcript` class

Function	Why test
`#buildParagraphs(segments)`	Core grouping logic; gap threshold behavior has multiple code paths
`#buildSpeakerBlocks(paragraphs)`	Groups consecutive same-speaker paragraphs
`compileCSV()`	Serialization; must match `parseCSV()` (round-trip test)
`splitSegment(idx, a, b)`	Data mutation; result must preserve all data correctly
`mergeSegments(idxA, idxB)`	Data mutation; text concatenation and index bookkeeping
`changeSpeaker(segIdx, newId)`	Reassigns data field, triggers rebuild
`segmentAtTime(time)`	Binary search; edge cases (before start, after end, in gap)
`clone(deep)`	Deep vs. shallow copy correctness
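
The `segmentAtTime` edge cases (before start, after end, in a gap) are easier to enumerate against a small reference model. The Python sketch below is such a model, not the actual implementation. It assumes segments are sorted, non-overlapping, half-open `[start, end)` intervals, and that "no segment" is reported as `-1`. All of these are assumptions.

```python
from bisect import bisect_right


def segment_at_time(segments: list, time: float) -> int:
    """Return the index of the segment containing `time`, or -1 if none does.

    `segments` is a list of (start, end) pairs sorted by start, non-overlapping.
    """
    starts = [start for start, _ in segments]
    # Index of the last segment that starts at or before `time`.
    i = bisect_right(starts, time) - 1
    if i >= 0 and time < segments[i][1]:
        return i
    return -1  # before the first segment, after the last, or inside a gap
```

With `segments = [(0.0, 1.0), (2.0, 3.0)]`, the interesting probes are `-0.5` (before start), `1.5` (gap), `3.0` (exactly at the final end), and an empty segment list.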

`project.js` — `Project` class

Function	Why test
`loadTranscriptCSV(text, local)`	Parses CSV and auto-creates speakers; complex orchestration
`addSpeaker(...)`	Auto-generates hue using `farthestAngle`; color uniqueness
`reassignSegments(fromId, toId)`	Bulk mutation; all segments must be reassigned
`metadata()`	Serialization for server upload; field correctness
`isDirty()`	Returns true only when expected dirty flags are set
`markAllDirty(dirty)` / `markTranscriptDirty()` etc.	Flag state transitions
`packageProject(filepath)`	ZIP content validation (what's included, filenames)
`unpackageProject(zipFile)`	Inverse of packageProject; round-trip integrity

`workspace_panels/waveform_panel.js`

Function	Why test
`clientXToTime(clientX)`	Pixel-to-time coordinate conversion; math-heavy
`basePxPerSec()`	Scaling factor calculation; affects all coordinate math
`regionIndexAtX(clientX)`	Viewport coordinate to segment index
`regionIndexAtTime(time)`	Time to segment index
`skipN(n)`	Seek clamping to [0, duration]
`zoomIn(amount)` / `zoomOut(amount)` / `applyZoom(level)`	Zoom bounds clamping to [MIN_ZOOM, MAX_ZOOM]

`workspace_panels/speakers_panel.js`

Function	Why test
`addSpeaker(id)`	Auto-generates `SPEAKER_N` IDs with collision avoidance
`speakerDisplayName(id)`	Name fallback to ID; simple but branching
`sliceSegmentAsSample(id, seg)`	Audio extraction and sample rate handling

`server.js`

Function	Why test
`audioUrl(id)`	URL construction; must match server expectations
`transcriptUrl(id)`	URL construction; must match server expectations
`sampleUrl(id, spkId)`	URL construction; must match server expectations

Priority

Highest priority — pure logic, no DOM, high business impact:

parseCSV / compileCSV round-trip
Transcript grouping (#buildParagraphs, #buildSpeakerBlocks)
farthestAngle (used for speaker color assignment globally)
segmentAtTime (used by waveform sync)
encodeMonoWav (WAV binary format correctness)
formatTime / formatTimeMs

Medium priority — data integrity, some setup required:

addSpeaker with hue auto-generation
mergeSegments / splitSegment
packageProject / unpackageProject round-trip
clientXToTime and coordinate conversion functions
colorToHSV, hslToHex, hexToHue color pipeline

Lower priority — require mocking AudioContext or Web Workers:

extractPeaks, loadAudioFile, decodeSampleArrayBuffer
sliceSegmentAsSample

Backend

Functions across `app.py` and `application/` that should have unit tests, organized by file.

`application/auth.py`

Function	Why test
`check_password(password)`	Security-critical; timing-safe comparison must reject wrong passwords without leaking timing info
`handle_login()`	Multiple paths: correct password generates token, wrong password rejects; token is added to valid set
`login_required(f)`	Decorator gate; token from header vs query param, valid vs invalid token — all paths must be covered
`handle_logout()`	Token must be removed from valid set; missing token must not error
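
A minimal, framework-free sketch of the behavior these tests would pin down. It is not the code in `application/auth.py`: the two-argument `check_password`, the `login`/`logout` helpers, and the `VALID_TOKENS` set are hypothetical stand-ins. The only library facts relied on are that `hmac.compare_digest` performs a timing-safe comparison and that `set.discard` does not raise for a missing element.

```python
import hmac
import secrets
from typing import Optional

VALID_TOKENS = set()  # hypothetical in-memory store of issued tokens


def check_password(candidate: str, expected: str) -> bool:
    """Timing-safe comparison; avoids an early exit on the first mismatch."""
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))


def login(candidate: str, expected: str) -> Optional[str]:
    """Issue a fresh token on success, return None on a wrong password."""
    if not check_password(candidate, expected):
        return None
    token = secrets.token_hex(16)
    VALID_TOKENS.add(token)
    return token


def logout(token: Optional[str]) -> None:
    """Remove the token if present; a missing or unknown token is a no-op."""
    VALID_TOKENS.discard(token)
```

Unit tests against the real module would assert the same properties: unique tokens per login, rejection of wrong passwords, and idempotent logout.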

`application/files.py`

Function	Why test
`project_dir(project_id)`	Pure path construction; verify correct path for various project ID formats
`read_project_json(project_id)`	Raises `FileNotFoundError` if missing; verify error handling path vs success path
`write_project_json(project_id, data)`	JSON serialization; verify indent, file creation, content correctness
`read_speakers_json(project_id)`	Returns `{}` if missing (not a raise); verify graceful fallback distinct from `read_project_json`
`read_waveform_json(project_id)`	Same fallback pattern as `read_speakers_json`; verify independently
`write_speakers_json(project_id, data)`	Data persistence; verify serialization and logging
`write_waveform_json(project_id, data)`	Uses no indent unlike other writers; verify format difference
`save_sample(project_id, speaker_id, file_storage)`	Sanitizes `speaker_id` (strips non-alphanumeric except `_`), raises `ValueError` on empty result
`list_project_dirs()`	Filters by `is_dir()` AND `project.json` exists; verify both conditions and empty-directory case
`stream_file(path)`	High priority. Complex Range header parsing, partial vs full content (206 vs 200), MIME type mapping; has a known bug (BUG_REPORT #20)
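
The sanitization rule described for `save_sample` can be sketched as follows. `sanitize_speaker_id` is a hypothetical helper name, and treating "alphanumeric" as ASCII-only is an assumption. The actual function may differ on both points.

```python
import re


def sanitize_speaker_id(speaker_id: str) -> str:
    """Strip everything except ASCII letters, digits, and underscore."""
    safe = re.sub(r"[^A-Za-z0-9_]", "", speaker_id)
    if not safe:
        # Mirrors the documented behavior: an empty result is an error.
        raise ValueError("speaker_id is empty after sanitization")
    return safe
```

Useful inputs for a parametrized test are an ID that is already clean (`speaker_1`), one with mixed punctuation (`sp k#1`), and one that consists only of stripped characters (`../`), which should raise `ValueError`.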

`application/projects.py`

Function	Why test
`_now()`	No inputs; verify ISO 8601 UTC format (patch or freeze the clock for a deterministic check)
`_project_exists(project_id)`	Simple path existence check; verify both the existing and missing case
`list_projects()`	Exception handling (bad project skipped, not fatal), sorting by modified date, data transformation
`get_project(project_id)`	Enriches data with `has_transcript` flag; verify existence check and flag logic
`create_project(metadata, audio_file, waveform, transcript_file, sample_files)`	High priority. UUID generation, directory setup, metadata normalization, conditional file saving for all optional params
`update_project(project_id, metadata, speakers, transcript_file, sample_files)`	High priority. Three distinct conditional branches: metadata update, speakers update (with/without full dict), transcript/samples
`_normalise_speakers(speakers, project_id, sample_files, existing)`	High priority. `has_sample` precedence: new uploads > client value > existing stored > false; verify all combinations
`duplicate_project(project_id)`	New UUID, copied files, updated name and timestamps; verify metadata is fully correct
`get_speakers(project_id)` / `get_waveform(project_id)`	Existence check + file read; verify both
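
The `has_sample` precedence for `_normalise_speakers` (new uploads > client value > existing stored > false) is small enough to state as a truth table. The sketch below isolates that rule in a hypothetical helper, `resolve_has_sample`. It assumes "no value" is represented as `None` for the client and stored inputs, which may not match the real data shapes.

```python
from typing import Optional


def resolve_has_sample(
    uploaded: bool,
    client_value: Optional[bool],
    stored_value: Optional[bool],
) -> bool:
    """Apply the precedence: new upload > client value > stored value > False."""
    if uploaded:
        return True              # a new upload always wins
    if client_value is not None:
        return client_value      # explicit client value, even if False
    if stored_value is not None:
        return stored_value      # fall back to what is already on disk
    return False                 # nothing known about this speaker
```

Covering "all combinations" then means iterating over 2 x 3 x 3 = 18 input triples, which a parametrized test can do exhaustively.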

`application/transcription/transcribe.py`

Function	Why test
`load_audio_dict(filepath)`	Mono vs stereo branch (unsqueeze vs transpose), conditional resampling to 16kHz; shape handling is critical
`assign_speaker(segment_start, segment_end, annotation)`	Overlap calculation; edge cases: no overlap, multiple speakers, boundary-exact matches
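
A simplified model of the overlap calculation, useful for deciding what the edge-case tests should expect. The real `assign_speaker` takes an `annotation` object. Here that is replaced by a plain list of `(start, end, speaker)` tuples, and the function returns `None` when nothing overlaps. Both choices are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Optional


def assign_speaker(seg_start: float, seg_end: float, turns: list) -> Optional[str]:
    """Return the speaker with the largest total overlap, or None if no overlap.

    `turns` is a list of (start, end, speaker) tuples.
    """
    totals = defaultdict(float)
    for turn_start, turn_end, speaker in turns:
        overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
        if overlap > 0:          # intervals that only touch contribute nothing
            totals[speaker] += overlap
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Given turns `A` on `[0, 2]` and `B` on `[2, 5]`, a segment `[1, 4]` overlaps `A` for 1 second and `B` for 2 seconds, while a segment `[5, 6]` only touches `B` at its boundary.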

Priority

Highest priority — security-critical or complex branching logic:

stream_file — known bug, complex Range header parsing
create_project — core creation, many optional parameters
update_project — three distinct conditional update paths
_normalise_speakers — precedence rules for has_sample
check_password / handle_login — authentication gate

Medium priority — data integrity and error handling:

login_required decorator
list_projects — exception handling and sorting
save_sample — speaker ID sanitization
duplicate_project — UUID and metadata correctness
read_project_json / read_speakers_json — different error handling patterns
load_audio_dict / assign_speaker — audio processing logic

Lower priority — simple but worth covering:

_now, _project_exists, project_dir, list_project_dirs
get_project, get_speakers, get_waveform
handle_logout
write_* persistence functions

Integration

Flask route integration tests. Each test exercises auth → business logic → file I/O as a full stack, using a test client against a temp `PROJECTS_DIR`.

Auth routes

`POST /auth/login`

Scenario	Expected
Correct password	200 + `{token}` in response; token accepted by authenticated routes
Wrong password	401 `{error}`
Missing `password` field	401
Non-JSON body	401
Two sequential logins	Each returns a unique token; both tokens valid
Login → logout → login	New token works; old token rejected

`POST /auth/logout`

Scenario	Expected
Logout with valid token	200; token rejected on next authenticated request
Logout without token	200 (no error)
Logout twice	Both return 200; second is a no-op

Project CRUD

`GET /api/projects`

Scenario	Expected
No token	401
No projects on disk	200 `[]`
One project	200 array with that project
Multiple projects	Sorted by `modified` descending
Project created with transcript	`has_transcript: true`
Project created without transcript	`has_transcript: false`
Corrupted `project.json`	That project skipped; others returned normally
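
The listing rules above (skip corrupted projects, sort by `modified` descending, derive `has_transcript`) can be exercised against a temporary directory without starting the app. The sketch below is a stand-in, not the real `list_projects`: the function signature, the `demo` helper, and the assumption that `modified` is an ISO 8601 string that sorts lexicographically are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def list_projects(projects_dir: Path) -> list:
    """Collect project metadata, skipping anything unreadable."""
    projects = []
    for child in projects_dir.iterdir():
        meta_path = child / "project.json"
        if not (child.is_dir() and meta_path.exists()):
            continue
        try:
            data = json.loads(meta_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # a corrupted project is skipped, not fatal
        data["has_transcript"] = (child / "transcript.csv").exists()
        projects.append(data)
    return sorted(projects, key=lambda p: p["modified"], reverse=True)


def demo() -> list:
    """Build two valid projects, one corrupted, one empty dir; return names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, modified in [("old", "2024-01-01T00:00:00Z"),
                               ("new", "2024-06-01T00:00:00Z")]:
            (root / name).mkdir()
            (root / name / "project.json").write_text(
                json.dumps({"name": name, "modified": modified}))
        (root / "broken").mkdir()
        (root / "broken" / "project.json").write_text("{not json")
        (root / "empty").mkdir()
        return [p["name"] for p in list_projects(root)]
```

The integration test would set up the same fixture layout under the temp `PROJECTS_DIR` and assert on the JSON returned by `GET /api/projects`.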

`POST /api/projects`

Scenario	Expected
No token	401
Audio only	201; directory created with `audio.wav`, `project.json`, `speakers.json`
With metadata `{"name":"X"}`	Project name is `"X"`
With waveform JSON	`waveform.json` written with `sampleRate`, `duration`, `peaks`
With transcript file	`transcript.csv` saved; `has_transcript: true`
With speaker samples	`samples/<id>.wav` saved; `has_sample: true` in `speakers.json`
Missing `audio` field	400
Invalid JSON in `metadata`	500
Speaker ID with special chars in samples	Sanitized to alphanumeric+`_`; `ValueError` on empty result → 500
File exceeds `MAX_AUDIO_SIZE_MB`	413 with custom error message

`GET /api/projects/<id>`

Scenario	Expected
No token	401
Existing project	200 with all fields
Non-existent ID	404
After adding transcript via PUT	`has_transcript: true`

`PUT /api/projects/<id>`

Scenario	Expected
No token	401
Non-existent project	404
Update name only	200; `GET /<id>` reflects new name; `modified` timestamp updated
Update speakers	`speakers.json` updated
Add transcript	`transcript.csv` saved; `has_transcript: true`
Add sample for existing speaker	`has_sample` flag updated in `speakers.json`
`metadata: null`	No name change; only `modified` updated
Partial update (speakers only)	Only `speakers.json` touched; `audio.wav` unchanged

`DELETE /api/projects/<id>`

Scenario	Expected
No token	401
Existing project	200 `{ok: true}`; `GET /<id>` returns 404; directory removed from disk
Non-existent ID	404
Delete twice	First 200; second 404
List after delete	`GET /api/projects` no longer includes deleted project

`POST /api/projects/<id>/duplicate`

Scenario	Expected
No token	401
Non-existent source	404
Valid project	201; new `id` differs from source; name has `" (copy)"` suffix
Timestamps	`created` and `modified` reset to duplication time
File completeness	Duplicate directory contains `audio.wav`, `transcript.csv`, `samples/` matching source
Independence	Updating original name does not affect duplicate
Both in list	`GET /api/projects` returns both

File serving

`GET /api/projects/<id>/audio`

Scenario	Expected
No token	401
No audio file	404
Full download	200; `Content-Type: audio/wav`; `Content-Length` set; body matches uploaded file
Range request (interior)	206; `Content-Range` header; body is correct byte slice
Range request (open end)	206; returns from offset to EOF
Range beyond file size	206; end clamped to `file_size - 1`
Malformed Range header	416
`Accept-Ranges` header	Present on all responses
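
The expected status codes in this table follow from a small amount of Range arithmetic, sketched below. This is not the `stream_file` implementation and says nothing about the known bug. `parse_range` is a hypothetical helper that handles only the `bytes=start-end` and `bytes=start-` forms, treats everything else (including suffix ranges such as `bytes=-500`) as malformed, and assumes a start offset past the end of the file should also yield 416.

```python
import re
from typing import Optional

_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_range(header: Optional[str], file_size: int) -> tuple:
    """Return (status, start, end) where `end` is an inclusive byte offset."""
    if header is None:
        return 200, 0, file_size - 1          # full download
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return 416, 0, 0                      # malformed header
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else file_size - 1
    end = min(end, file_size - 1)             # clamp to the last byte
    if start > end:
        return 416, 0, 0                      # unsatisfiable range
    return 206, start, end
```

For a 100-byte file, `bytes=10-19` gives bytes 10 through 19, `bytes=90-` gives 90 through 99, and `bytes=50-500` is clamped to 50 through 99. The test for each row can then compare the response body with the corresponding slice of the uploaded file.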

`GET /api/projects/<id>/transcript`

Scenario	Expected
No token	401
No transcript	404
Full download	200; `Content-Type: text/csv`; body matches uploaded CSV
Range request	206; partial CSV bytes returned

`GET /api/projects/<id>/samples/<speaker_id>`

Scenario	Expected
No token	401
Valid sample	200; `Content-Type: audio/wav`; body matches uploaded WAV
Non-existent sample	404
Speaker ID with special chars	Sanitized; resolves to sanitized filename
Path traversal attempt (`../../../etc/passwd`)	Dots and slashes stripped (leaving `etcpasswd`), so nothing outside `samples/` is reachable → 404
Underscore preserved	`speaker_1` → `samples/speaker_1.wav`
Range request	206; partial WAV bytes returned

`GET /api/projects/<id>/speakers`

Scenario	Expected
No token	401
Non-existent project	404
No speakers created	200 `{}`
With speakers, no samples	`has_sample: false` for each
With speakers and samples	`has_sample: true` for sampled speakers
After PUT update	Reflects updated speaker data

`GET /api/projects/<id>/waveform`

Scenario	Expected
No token	401
Non-existent project	404
No waveform uploaded	200 `{}`
With waveform	200; has `sampleRate`, `duration`, `peaks`

Cross-cutting

`@login_required` decorator

Scenario	Expected
Token in `X-Auth-Token` header	Accepted
Token in `?token` query param	Accepted
No token anywhere	401 `{error: "Authentication required"}`
Invalid token	401
Token invalidated by logout	401 on next use
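
A framework-free model of the decision the decorator makes, which can help when writing the table above as parametrized cases. `extract_token` and `authorize` are hypothetical names. Checking the header before the query parameter is an assumption, as is reusing the same error body for an invalid token.

```python
from typing import Mapping, Optional


def extract_token(headers: Mapping, query: Mapping) -> Optional[str]:
    """Look in the X-Auth-Token header first, then the ?token query parameter."""
    return headers.get("X-Auth-Token") or query.get("token")


def authorize(headers: Mapping, query: Mapping, valid_tokens: set) -> tuple:
    """Return (status, body) the way a guarded route might respond."""
    token = extract_token(headers, query)
    if not token or token not in valid_tokens:
        return 401, {"error": "Authentication required"}
    return 200, {}
```

Each row of the table then maps to one call: a valid token in the header, a valid token in the query string, no token, an unknown token, and a token removed from `valid_tokens` to simulate logout.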

Request size limit

Scenario	Expected
Upload within `MAX_AUDIO_SIZE_MB`	201
Upload exceeding limit	413 `{error: "Request too large…"}`

Error response format

Status	Body shape
400	`{error: <message>}`
401	`{error: <message>}`
404	`{error: <message>}`
413	`{error: "Request too large — audio file exceeds the configured limit"}`
500	`{error: "Internal server error"}` (real error in logs only)
