Unit Testing Recommendations
Frontend
Functions across static/js that should have unit tests, organized by file.
utilities/tools.js
| Function | Why test |
|---|---|
| generateId() | Pure function; uniqueness/collision avoidance is critical |
| formatTime(s) | Pure time formatter; edge cases (0, large values, non-integers) |
| formatTimeMs(s) | Same edge cases as formatTime, plus the fractional-seconds output |
| hexToRgb(hex) | Pure color parser; input validation and parsing correctness |
| hslToHex(h, s, l) | Color space conversion; correctness depends on math |
| hexToHue(hex) | Inverse of hslToHex; round-trip consistency |
| colorToHSV(color) | Multi-format color parsing; multiple code paths to test |
| parseCSV(text) | Critical parsing logic; RFC 4180 quoting, escaped quotes, embedded newlines |
| farthestColor(colors) | Delegates to farthestAngle; integration-level test |
| farthestAngle(angles) | Circular geometry; edge cases (empty, one, wrap-around at 360) |
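
The parseCSV cases above are easiest to pin down as fixtures that must survive a serialize/parse round trip. The sketch below is a minimal Python version of that property. The standard-library csv module stands in as a reference oracle for parseCSV/compileCSV; it is not the project's parser. The fixture rows are invented, and the same strings could be ported to the JavaScript tests.

```python
import csv
import io

# Invented fixture rows covering the RFC 4180 cases listed above.
FIXTURES = [
    ["plain", "text"],
    ["has,comma", "ok"],
    ['has "quotes"', "ok"],
    ["line one\nline two", "ok"],
    ["", "empty first field"],
]

def to_csv(rows):
    """Serialize rows; stands in for compileCSV in this sketch."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL quotes fields containing the delimiter, the quote
    # character, or line-break characters; embedded quotes are escaped by doubling.
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    return buf.getvalue()

def from_csv(text):
    """Parse text; stands in for parseCSV in this sketch."""
    # newline="" keeps line breaks inside quoted fields intact.
    return list(csv.reader(io.StringIO(text, newline="")))

def test_round_trip():
    assert from_csv(to_csv(FIXTURES)) == FIXTURES

def test_escaped_quotes():
    assert to_csv([['has "quotes"', "ok"]]) == '"has ""quotes""",ok\r\n'
```
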
utilities/audio.js
| Function | Why test |
|---|---|
| encodeMonoWav(samples, sampleRate) | Binary format encoding; WAV header correctness is precise |
| loadAudioFile(file) | Returns structured data; testable with mock AudioContext |
| decodeSampleArrayBuffer(arrayBuffer) | Structured data transform; testable with mock AudioContext |
| extractPeaks(channelBuffers, peakCount) | Downsampling logic; results depend on math |
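
For encodeMonoWav, the header fields can be checked byte-for-byte and then cross-checked with an independent parser. The sketch below is a hypothetical Python mirror that assumes 16-bit PCM output. If the real encoder writes 32-bit float, the format tag, bits per sample, and byte rate all change. Python's wave module acts as the independent reader here.

```python
import io
import struct
import wave

def encode_mono_wav(samples, sample_rate):
    """Hypothetical Python mirror of encodeMonoWav, assuming 16-bit PCM output."""
    # Clip floats to [-1, 1], then scale to signed 16-bit little-endian integers.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = b"".join(struct.pack("<h", int(c * 32767)) for c in clipped)
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",   # RIFF size excludes the first 8 bytes
        b"fmt ", 16, 1, 1,                 # fmt chunk size, PCM format tag, 1 channel
        sample_rate, sample_rate * 2,      # sample rate, byte rate (rate * 1 ch * 2 B)
        2, 16,                             # block align, bits per sample
        b"data", len(pcm),
    )
    return header + pcm

def test_header_fields():
    wav = encode_mono_wav([0.0, 0.5, -0.5, 1.0], 16000)
    assert len(wav) == 44 + 4 * 2
    assert struct.unpack_from("<I", wav, 4)[0] == len(wav) - 8
    # An independent parser must agree with every header field.
    with wave.open(io.BytesIO(wav), "rb") as w:
        assert (w.getnchannels(), w.getsampwidth(),
                w.getframerate(), w.getnframes()) == (1, 2, 16000, 4)
    # Out-of-range input is clipped rather than wrapped.
    assert encode_mono_wav([2.0], 8000)[44:] == struct.pack("<h", 32767)
```
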
project.js — Transcript class
| Function | Why test |
|---|---|
| #buildParagraphs(segments) | Core grouping logic; gap threshold behavior has multiple code paths; private (#), so exercise it through the public API |
| #buildSpeakerBlocks(paragraphs) | Groups consecutive same-speaker paragraphs; private (#), so tested indirectly as well |
| compileCSV() | Serialization; output must parse back through parseCSV() to the same segments (round-trip test) |
| splitSegment(idx, a, b) | Data mutation; result must preserve all data correctly |
| mergeSegments(idxA, idxB) | Data mutation; text concatenation and index bookkeeping |
| changeSpeaker(segIdx, newId) | Reassigns data field, triggers rebuild |
| segmentAtTime(time) | Binary search; edge cases (before start, after end, in gap) |
| clone(deep) | Deep vs. shallow copy correctness |
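
A binary search is easiest to trust when it is compared against a brute-force scan. This hypothetical Python mirror of segmentAtTime makes three assumptions: segments are sorted and non-overlapping, intervals are half-open [start, end), and -1 means "no segment". If the real function uses inclusive ends or returns the nearest segment, the boundary assertions change accordingly.

```python
from bisect import bisect_right

def segment_at_time(starts, ends, time):
    """Hypothetical mirror of Transcript.segmentAtTime (see assumptions above)."""
    i = bisect_right(starts, time) - 1   # last segment starting at or before `time`
    if i >= 0 and time < ends[i]:
        return i
    return -1                            # before the first, after the last, or in a gap

def linear_scan(starts, ends, time):
    """Brute-force oracle: O(n) and obviously correct."""
    for i, (s, e) in enumerate(zip(starts, ends)):
        if s <= time < e:
            return i
    return -1

def test_segment_at_time():
    starts, ends = [0.0, 2.0, 5.0], [1.5, 4.0, 6.0]
    assert segment_at_time(starts, ends, -1.0) == -1   # before start
    assert segment_at_time(starts, ends, 1.8) == -1    # in gap
    assert segment_at_time(starts, ends, 2.0) == 1     # exact start boundary
    assert segment_at_time(starts, ends, 6.0) == -1    # after end (end is exclusive)
    assert segment_at_time([], [], 0.0) == -1
    for k in range(-10, 70):                           # sweep and compare with the oracle
        t = k / 10
        assert segment_at_time(starts, ends, t) == linear_scan(starts, ends, t)
```
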
project.js — Project class
| Function | Why test |
|---|---|
| loadTranscriptCSV(text, local) | Parses CSV and auto-creates speakers; complex orchestration |
| addSpeaker(...) | Auto-generates hue using farthestAngle; color uniqueness |
| reassignSegments(fromId, toId) | Bulk mutation; all segments must be reassigned |
| metadata() | Serialization for server upload; field correctness |
| isDirty() | Returns true only when expected dirty flags are set |
| markAllDirty(dirty) / markTranscriptDirty() etc. | Flag state transitions |
| packageProject(filepath) | ZIP content validation (what's included, filenames) |
| unpackageProject(zipFile) | Inverse of packageProject; round-trip integrity |
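
The hue chosen by addSpeaker can be tested against a small reference model of farthestAngle. The model below rests on assumptions about its behaviour:

- Empty input returns 0.
- Duplicate angles are ignored.
- Otherwise it returns the midpoint of the largest circular gap.

It is written in Python only to make the expected values concrete; the real tests would call the JavaScript function.

```python
def farthest_angle(angles):
    """Hypothetical reference model of farthestAngle, in degrees.

    Assumptions: empty input returns 0.0; duplicates are ignored; otherwise the
    result is the midpoint of the largest gap between neighbouring angles,
    including the wrap-around gap from the last angle back past 360 to the first.
    """
    pts = sorted({a % 360 for a in angles})
    if not pts:
        return 0.0
    best_gap, best_mid = -1.0, 0.0
    for i, a in enumerate(pts):
        b = pts[(i + 1) % len(pts)]        # next angle clockwise (wraps to the first)
        gap = (b - a) % 360 or 360.0       # a single distinct angle owns the full circle
        if gap > best_gap:                 # strict ">" keeps the first of equal gaps
            best_gap, best_mid = gap, (a + gap / 2) % 360
    return best_mid

def test_farthest_angle():
    assert farthest_angle([]) == 0.0
    assert farthest_angle([90]) == 270.0
    assert farthest_angle([90, 90]) == 270.0       # duplicates ignored
    assert farthest_angle([0, 120, 240]) == 60.0   # equal gaps: the first one wins
    assert farthest_angle([10, 20]) == 195.0       # the wrap-around gap is the largest
    assert farthest_angle([350, 10]) == 180.0      # the small gap straddles 0/360
```
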
workspace_panels/waveform_panel.js
| Function | Why test |
|---|---|
| clientXToTime(clientX) | Pixel-to-time coordinate conversion; math-heavy |
| basePxPerSec() | Scaling factor calculation; affects all coordinate math |
| regionIndexAtX(clientX) | Viewport coordinate to segment index |
| regionIndexAtTime(time) | Time to segment index |
| skipN(n) | Seek clamping to [0, duration] |
| zoomIn(amount) / zoomOut(amount) / applyZoom(level) | Zoom bounds clamping to [MIN_ZOOM, MAX_ZOOM] |
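
The coordinate helpers are best tested as an inverse pair. Converting a time to a pixel and back should return the original time, and out-of-range inputs should clamp. Everything below is hypothetical: the formula, the MIN_ZOOM/MAX_ZOOM values, and the time_to_client_x oracle. The sketch assumes pixels-per-second = basePxPerSec() × zoom inside a horizontally scrolled container, and it only illustrates the shape of the assertions.

```python
# Hypothetical constants; the real MIN_ZOOM / MAX_ZOOM live in waveform_panel.js.
MIN_ZOOM, MAX_ZOOM = 1.0, 50.0

def clamp(value, lo, hi):
    return min(hi, max(lo, value))

def apply_zoom(level):
    return clamp(level, MIN_ZOOM, MAX_ZOOM)

def client_x_to_time(client_x, rect_left, scroll_left, px_per_sec, duration):
    """Viewport x (px) -> seconds, clamped to [0, duration]."""
    return clamp((client_x - rect_left + scroll_left) / px_per_sec, 0.0, duration)

def time_to_client_x(time, rect_left, scroll_left, px_per_sec):
    """Inverse mapping, used here only as a test oracle."""
    return time * px_per_sec - scroll_left + rect_left

def skip_n(current, n, duration):
    return clamp(current + n, 0.0, duration)

def test_coordinates():
    pps = 20.0 * apply_zoom(3.0)           # hypothetical basePxPerSec() = 20
    for t in [0.0, 0.25, 12.5, 99.0]:
        x = time_to_client_x(t, rect_left=40, scroll_left=300, px_per_sec=pps)
        assert abs(client_x_to_time(x, 40, 300, pps, duration=100.0) - t) < 1e-9
    assert client_x_to_time(-1e6, 40, 300, pps, 100.0) == 0.0
    assert client_x_to_time(1e9, 40, 300, pps, 100.0) == 100.0
    assert apply_zoom(0.01) == MIN_ZOOM and apply_zoom(1e6) == MAX_ZOOM
    assert skip_n(2.0, -5, 100.0) == 0.0 and skip_n(98.0, 5, 100.0) == 100.0
```
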
workspace_panels/speakers_panel.js
| Function | Why test |
|---|---|
| addSpeaker(id) | Auto-generates SPEAKER_N IDs with collision avoidance |
| speakerDisplayName(id) | Name fallback to ID; simple but branching |
| sliceSegmentAsSample(id, seg) | Audio extraction and sample rate handling |
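
For addSpeaker, the interesting case is an ID freed by a deletion. The sketch below assumes numbering starts at 1 and takes the lowest unused N. Both are assumptions about speakers_panel.js. A count-based scheme (len + 1) would fail the deletion case, which is exactly the collision this test guards against. speaker_display_name mirrors the fallback to the ID described in the table.

```python
def next_speaker_id(existing_ids):
    """Hypothetical: lowest unused SPEAKER_N, numbering from 1."""
    n = 1
    while f"SPEAKER_{n}" in existing_ids:
        n += 1
    return f"SPEAKER_{n}"

def speaker_display_name(speakers, speaker_id):
    """Fall back to the raw ID when the speaker is unknown or unnamed."""
    name = speakers.get(speaker_id, {}).get("name")
    return name or speaker_id

def test_speakers_panel_helpers():
    assert next_speaker_id(set()) == "SPEAKER_1"
    assert next_speaker_id({"SPEAKER_1", "SPEAKER_2"}) == "SPEAKER_3"
    # SPEAKER_2 was deleted; a count-based len + 1 scheme would return SPEAKER_3 and collide.
    assert next_speaker_id({"SPEAKER_1", "SPEAKER_3"}) == "SPEAKER_2"
    assert speaker_display_name({"SPEAKER_1": {"name": "Ana"}}, "SPEAKER_1") == "Ana"
    assert speaker_display_name({"SPEAKER_1": {"name": ""}}, "SPEAKER_1") == "SPEAKER_1"
    assert speaker_display_name({}, "SPEAKER_9") == "SPEAKER_9"
```
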
server.js
| Function | Why test |
|---|---|
| audioUrl(id) | URL construction; must match server expectations |
| transcriptUrl(id) | URL construction; must match server expectations |
| sampleUrl(id, spkId) | URL construction; must match server expectations |
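
One way to keep these builders in step with the server is to match every generated URL against the route patterns listed in the integration section. The Python builders below are hypothetical mirrors of server.js. The real ones may prepend a base URL or append a ?token= parameter, which is why the query string is stripped before matching.

```python
import re
from urllib.parse import quote, urlsplit

# Route patterns as listed in the integration section of this document.
ROUTES = [
    "/api/projects/<id>/audio",
    "/api/projects/<id>/transcript",
    "/api/projects/<id>/samples/<speaker_id>",
]

def route_regex(pattern):
    # Each <param> matches exactly one non-empty path segment, like Flask's default converter.
    return re.compile("^" + re.sub(r"<[^>]+>", "[^/]+", pattern) + "$")

ROUTE_RES = [route_regex(p) for p in ROUTES]

# Hypothetical Python mirrors of the builders in server.js.
def audio_url(pid):
    return f"/api/projects/{quote(pid, safe='')}/audio"

def transcript_url(pid):
    return f"/api/projects/{quote(pid, safe='')}/transcript"

def sample_url(pid, spk_id):
    return f"/api/projects/{quote(pid, safe='')}/samples/{quote(spk_id, safe='')}"

def matches_route(url):
    path = urlsplit(url).path            # ignore any ?token=... query string
    return any(r.match(path) for r in ROUTE_RES)

def test_urls_match_server_routes():
    pid = "0b7e3c1e-5d2a-4f7e-9a61-2f0c8d1e4b55"
    for url in (audio_url(pid), transcript_url(pid), sample_url(pid, "SPEAKER_1")):
        assert matches_route(url), url
    assert matches_route(audio_url(pid) + "?token=abc")
    assert not matches_route(f"/api/projects/{pid}/samples/")   # empty speaker segment
```
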
Priority
Highest priority — pure logic, no DOM, high business impact:
- parseCSV / compileCSV round-trip
- Transcript grouping (#buildParagraphs, #buildSpeakerBlocks)
- farthestAngle (used for speaker color assignment globally)
- segmentAtTime (used by waveform sync)
- encodeMonoWav (WAV binary format correctness)
- formatTime / formatTimeMs
Medium priority — data integrity, some setup required:
- addSpeaker with hue auto-generation
- mergeSegments / splitSegment
- packageProject / unpackageProject round-trip
- clientXToTime and coordinate conversion functions
- colorToHSV, hslToHex, hexToHue color pipeline
Lower priority — require mocking AudioContext or Web Workers:
- encodeMonoWav, extractPeaks, loadAudioFile, decodeSampleArrayBuffer
- sliceSegmentAsSample
Backend
Functions across app.py and application/ that should have unit tests, organized by file.
application/auth.py
| Function | Why test |
|---|---|
| check_password(password) | Security-critical; timing-safe comparison must reject wrong passwords without leaking timing info |
| handle_login() | Multiple paths: correct password issues a token and adds it to the valid set; wrong password is rejected |
| login_required(f) | Decorator gate; token from header vs query param, valid vs invalid token; all paths must be covered |
| handle_logout() | Token must be removed from valid set; missing token must not error |
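
Below is a framework-free sketch of the auth.py behaviour the table describes. Several pieces are hypothetical stand-ins for Flask's request handling:

- APP_PASSWORD and VALID_TOKENS.
- The payload/token parameters.
- The (status, body) return shape.
- The error message.

Timing safety cannot be asserted reliably from wall-clock measurements. A practical unit test instead checks that the comparison goes through hmac.compare_digest, which avoids content-based short-circuiting.

```python
import hmac
import secrets
from unittest import mock

# Hypothetical stand-ins for application/auth.py configuration and state.
APP_PASSWORD = "correct horse"
VALID_TOKENS = set()

def check_password(password):
    # compare_digest avoids content-based short-circuiting, unlike ==.
    return hmac.compare_digest(password.encode(), APP_PASSWORD.encode())

def handle_login(payload):
    """payload mimics request.get_json(silent=True); returns (status, body)."""
    password = (payload or {}).get("password")
    if not isinstance(password, str) or not check_password(password):
        return 401, {"error": "Invalid password"}
    token = secrets.token_hex(32)
    VALID_TOKENS.add(token)
    return 200, {"token": token}

def handle_logout(token):
    VALID_TOKENS.discard(token)   # discard, not remove: an unknown or missing token is a no-op
    return 200, {"ok": True}

def test_login_logout():
    _, first = handle_login({"password": "correct horse"})
    _, second = handle_login({"password": "correct horse"})
    assert first["token"] != second["token"]
    assert {first["token"], second["token"]} <= VALID_TOKENS
    for bad in ({"password": "wrong"}, {}, None):   # wrong, missing field, non-JSON body
        assert handle_login(bad)[0] == 401
    assert handle_logout(first["token"])[0] == 200
    assert first["token"] not in VALID_TOKENS and second["token"] in VALID_TOKENS
    assert handle_logout(first["token"])[0] == 200  # second logout is a no-op
    assert handle_logout(None)[0] == 200            # missing token does not error

def test_uses_constant_time_compare():
    with mock.patch("hmac.compare_digest", wraps=hmac.compare_digest) as spy:
        assert check_password("correct horse")
        assert not check_password("nope")
    assert spy.call_count == 2
```
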
application/files.py
| Function | Why test |
|---|
project_dir(project_id) | Pure path construction; verify correct path for various project ID formats |
read_project_json(project_id) | Raises FileNotFoundError if missing; verify error handling path vs success path |
write_project_json(project_id, data) | JSON serialization; verify indent, file creation, content correctness |
read_speakers_json(project_id) | Returns {} if missing (not a raise); verify graceful fallback distinct from read_project_json |
read_waveform_json(project_id) | Same fallback pattern as read_speakers_json; verify independently |
write_speakers_json(project_id, data) | Data persistence; verify serialization and logging |
write_waveform_json(project_id, data) | Uses no indent unlike other writers; verify format difference |
save_sample(project_id, speaker_id, file_storage) | Sanitizes speaker_id (strips non-alphanumeric except _), raises ValueError on empty result |
list_project_dirs() | Filters by is_dir() AND project.json exists; verify both conditions and empty-directory case |
stream_file(path) | High priority. Complex Range header parsing, partial vs full content (206 vs 200), MIME type mapping; has a known bug (BUG_REPORT #20) |
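
The read/write helpers differ mostly in their failure and formatting behaviour, and a temporary directory makes that cheap to test. The mirror below is hypothetical, including:

- The module-level PROJECTS_DIR.
- The _write helper.
- The indent of 2.
- list_project_dirs returning sorted names.

Under pytest, the tmp_path and monkeypatch fixtures would normally replace tempfile.mkdtemp().

```python
import json
import tempfile
from pathlib import Path

# Hypothetical: the tests point PROJECTS_DIR at a throwaway directory.
PROJECTS_DIR = Path(tempfile.mkdtemp())

def project_dir(project_id):
    return PROJECTS_DIR / project_id

def read_project_json(project_id):
    # A missing file propagates FileNotFoundError: the error path under test.
    return json.loads((project_dir(project_id) / "project.json").read_text())

def read_speakers_json(project_id):
    path = project_dir(project_id) / "speakers.json"
    return json.loads(path.read_text()) if path.exists() else {}   # fallback, no raise

def _write(project_id, name, data, indent):
    d = project_dir(project_id)
    d.mkdir(parents=True, exist_ok=True)
    (d / name).write_text(json.dumps(data, indent=indent))

def write_project_json(project_id, data):
    _write(project_id, "project.json", data, indent=2)

def write_waveform_json(project_id, data):
    _write(project_id, "waveform.json", data, indent=None)   # compact, unlike the others

def list_project_dirs():
    return sorted(p.name for p in PROJECTS_DIR.iterdir()
                  if p.is_dir() and (p / "project.json").exists())

def test_files():
    write_project_json("p1", {"name": "A"})
    assert read_project_json("p1") == {"name": "A"}
    assert read_speakers_json("p1") == {}
    try:
        read_project_json("missing")
    except FileNotFoundError:
        pass
    else:
        raise AssertionError("expected FileNotFoundError")
    write_waveform_json("p1", {"peaks": [0.1, 0.2]})
    assert "\n" in (project_dir("p1") / "project.json").read_text()
    assert "\n" not in (project_dir("p1") / "waveform.json").read_text()
    (PROJECTS_DIR / "no_metadata").mkdir(exist_ok=True)   # directory without project.json
    (PROJECTS_DIR / "stray.txt").write_text("not a project")
    assert list_project_dirs() == ["p1"]
```
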
application/projects.py
| Function | Why test |
|---|
_now() | Pure function; verify ISO 8601 UTC format |
_project_exists(project_id) | Pure path existence check |
list_projects() | Exception handling (bad project skipped, not fatal), sorting by modified date, data transformation |
get_project(project_id) | Enriches data with has_transcript flag; verify existence check and flag logic |
create_project(metadata, audio_file, waveform, transcript_file, sample_files) | High priority. UUID generation, directory setup, metadata normalization, conditional file saving for all optional params |
update_project(project_id, metadata, speakers, transcript_file, sample_files) | High priority. Three distinct conditional branches: metadata update, speakers update (with/without full dict), transcript/samples |
_normalise_speakers(speakers, project_id, sample_files, existing) | High priority. has_sample precedence: new uploads > client value > existing stored > false; verify all combinations |
duplicate_project(project_id) | New UUID, copied files, updated name and timestamps; verify metadata is fully correct |
get_speakers(project_id) / get_waveform(project_id) | Existence check + file read; verify both |
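
The has_sample precedence rule is small enough to test exhaustively. Below is a simplified, hypothetical version of _normalise_speakers: project_id is dropped, speakers are a dict of dicts, and None stands for "value absent". It is checked against every combination of upload, client value, and stored value, with the expected value written as the precedence chain itself.

```python
import itertools

def normalise_speakers(speakers, sample_files, existing):
    """Hypothetical simplification of _normalise_speakers (project_id omitted).

    has_sample precedence: new upload > client value > existing stored value > False.
    """
    result = {}
    for spk_id, spk in speakers.items():
        if spk_id in sample_files:
            has_sample = True
        elif "has_sample" in spk:
            has_sample = bool(spk["has_sample"])
        elif "has_sample" in existing.get(spk_id, {}):
            has_sample = bool(existing[spk_id]["has_sample"])
        else:
            has_sample = False
        result[spk_id] = {**spk, "has_sample": has_sample}
    return result

def test_has_sample_precedence():
    # None stands for "value absent" in the client payload or on disk.
    for upload, client, stored in itertools.product(
        (True, False), (None, True, False), (None, True, False)
    ):
        speakers = {"S": {} if client is None else {"has_sample": client}}
        existing = {} if stored is None else {"S": {"has_sample": stored}}
        files = {"S": b"wav bytes"} if upload else {}
        # First non-None value in precedence order wins.
        expected = next(v for v in (True if upload else None, client, stored, False)
                        if v is not None)
        got = normalise_speakers(speakers, files, existing)["S"]["has_sample"]
        assert got == expected, (upload, client, stored)
```
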
application/transcription/transcribe.py
| Function | Why test |
|---|---|
| load_audio_dict(filepath) | Mono vs stereo branch (unsqueeze vs transpose), conditional resampling to 16 kHz; shape handling is critical |
| assign_speaker(segment_start, segment_end, annotation) | Overlap calculation; edge cases: no overlap, multiple speakers, boundary-exact matches |
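
The overlap cases for assign_speaker can be pinned down with a list of (start, end, label) tuples standing in for the pyannote Annotation. This version sums overlap per label and ignores turns that only touch a boundary; both are assumptions. If the real function instead picks the single longest turn, the final assertion is exactly where the two designs disagree, which makes it a useful case to keep.

```python
def assign_speaker(segment_start, segment_end, turns):
    """Hypothetical mirror of assign_speaker.

    `turns` stands in for the pyannote Annotation as (start, end, label) tuples.
    Returns the label with the largest total overlap, or None if nothing overlaps.
    """
    totals = {}
    for start, end, label in turns:
        overlap = min(segment_end, end) - max(segment_start, start)
        if overlap > 0:                      # touching boundaries (overlap == 0) do not count
            totals[label] = totals.get(label, 0.0) + overlap
    if not totals:
        return None
    return max(totals, key=totals.get)       # on ties, the label seen first wins

def test_assign_speaker():
    turns = [(0.0, 2.0, "A"), (2.0, 5.0, "B"), (6.0, 7.0, "A")]
    assert assign_speaker(1.0, 4.0, turns) == "B"     # B overlaps 2.0 s, A only 1.0 s
    assert assign_speaker(0.0, 2.0, turns) == "A"     # boundary-exact: B only touches at 2.0
    assert assign_speaker(5.0, 6.0, turns) is None    # gap bounded exactly by two turns
    assert assign_speaker(8.0, 9.0, turns) is None    # no overlap at all
    # Summed overlap: A has 1.0 + 1.0 s across two turns, B a single longer 1.2 s turn.
    split = [(0.0, 1.0, "A"), (1.0, 2.2, "B"), (2.2, 3.2, "A")]
    assert assign_speaker(0.0, 3.2, split) == "A"
```
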
Priority
Highest priority — security-critical or complex branching logic:
- stream_file: known bug, complex Range header parsing
- create_project: core creation, many optional parameters
- update_project: three distinct conditional update paths
- _normalise_speakers: precedence rules for has_sample
- check_password / handle_login: authentication gate
Medium priority — data integrity and error handling:
- login_required decorator
- list_projects: exception handling and sorting
- save_sample: speaker ID sanitization
- duplicate_project: UUID and metadata correctness
- read_project_json / read_speakers_json: different error handling patterns
- load_audio_dict / assign_speaker: audio processing logic
Lower priority — simple but worth covering:
- _now, _project_exists, project_dir, list_project_dirs
- get_project, get_speakers, get_waveform
- handle_logout
- write_* persistence functions
Integration
Flask route integration tests. Each test exercises auth → business logic → file I/O as a full stack, using a test client against a temp PROJECTS_DIR.
Auth routes
POST /auth/login
| Scenario | Expected |
|---|---|
| Correct password | 200 + {token} in response; token accepted by authenticated routes |
| Wrong password | 401 {error} |
| Missing password field | 401 |
| Non-JSON body | 401 |
| Two sequential logins | Each returns a unique token; both tokens valid |
| Login → logout → login | New token works; old token rejected |
POST /auth/logout
| Scenario | Expected |
|---|---|
| Logout with valid token | 200; token rejected on next authenticated request |
| Logout without token | 200 (no error) |
| Logout twice | Both return 200; second is a no-op |
Project CRUD
GET /api/projects
| Scenario | Expected |
|---|---|
| No token | 401 |
| No projects on disk | 200 [] |
| One project | 200 array with that project |
| Multiple projects | Sorted by modified descending |
| Project created with transcript | has_transcript: true |
| Project created without transcript | has_transcript: false |
| Corrupted project.json | That project skipped; others returned normally |
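
The skip-and-sort behaviour can be exercised below the route layer, without Flask. Below is a hypothetical mirror of list_projects over a temporary directory. It assumes modified is an ISO 8601 string in a single timezone, so lexical order matches chronological order.

```python
import json
import logging
import tempfile
from pathlib import Path

def list_projects(projects_dir):
    """Hypothetical mirror: skip unreadable projects, newest `modified` first."""
    projects = []
    for d in Path(projects_dir).iterdir():
        meta_path = d / "project.json"
        if not (d.is_dir() and meta_path.exists()):
            continue
        try:
            meta = json.loads(meta_path.read_text())
        except (OSError, ValueError) as exc:   # json.JSONDecodeError subclasses ValueError
            logging.warning("skipping project %s: %s", d.name, exc)
            continue
        meta["has_transcript"] = (d / "transcript.csv").exists()
        projects.append(meta)
    # ISO 8601 strings in one timezone sort chronologically as plain strings.
    return sorted(projects, key=lambda p: p["modified"], reverse=True)

def test_list_projects():
    root = Path(tempfile.mkdtemp())
    for pid, modified in [("old", "2024-01-01T00:00:00+00:00"),
                          ("new", "2024-06-01T00:00:00+00:00")]:
        (root / pid).mkdir()
        (root / pid / "project.json").write_text(
            json.dumps({"id": pid, "modified": modified}))
    (root / "new" / "transcript.csv").write_text("placeholder\n")
    (root / "broken").mkdir()
    (root / "broken" / "project.json").write_text("{not json")
    result = list_projects(root)
    assert [p["id"] for p in result] == ["new", "old"]
    assert [p["has_transcript"] for p in result] == [True, False]
```
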
POST /api/projects
| Scenario | Expected |
|---|---|
| No token | 401 |
| Audio only | 201; directory created with audio.wav, project.json, speakers.json |
| With metadata {"name":"X"} | Project name is "X" |
| With waveform JSON | waveform.json written with sampleRate, duration, peaks |
| With transcript file | transcript.csv saved; has_transcript: true |
| With speaker samples | samples/<id>.wav saved; has_sample: true in speakers.json |
| Missing audio field | 400 |
| Invalid JSON in metadata | 500 |
| Speaker ID with special chars in samples | Sanitized to alphanumeric+_; ValueError on empty result → 500 |
| File exceeds MAX_AUDIO_SIZE_MB | 413 with custom error message |
GET /api/projects/<id>
| Scenario | Expected |
|---|---|
| No token | 401 |
| Existing project | 200 with all fields |
| Non-existent ID | 404 |
| After adding transcript via PUT | has_transcript: true |
PUT /api/projects/<id>
| Scenario | Expected |
|---|---|
| No token | 401 |
| Non-existent project | 404 |
| Update name only | 200; GET /<id> reflects new name; modified timestamp updated |
| Update speakers | speakers.json updated |
| Add transcript | transcript.csv saved; has_transcript: true |
| Add sample for existing speaker | has_sample flag updated in speakers.json |
| metadata: null | No name change; only modified updated |
| Partial update (speakers only) | Only speakers.json touched; audio.wav unchanged |
DELETE /api/projects/<id>
| Scenario | Expected |
|---|---|
| No token | 401 |
| Existing project | 200 {ok: true}; GET /<id> returns 404; directory removed from disk |
| Non-existent ID | 404 |
| Delete twice | First 200; second 404 |
| List after delete | GET /api/projects no longer includes deleted project |
POST /api/projects/<id>/duplicate
| Scenario | Expected |
|---|---|
| No token | 401 |
| Non-existent source | 404 |
| Valid project | 201; new id differs from source; name has " (copy)" suffix |
| Timestamps | created and modified reset to duplication time |
| File completeness | Duplicate directory contains audio.wav, transcript.csv, samples/ matching source |
| Independence | Updating original name does not affect duplicate |
| Both in list | GET /api/projects returns both |
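
Below is a hypothetical, file-level mirror of duplicate_project that follows the expectations in the table: a new UUID, the " (copy)" suffix, reset timestamps, and a full directory copy. The id field in project.json and the use of shutil.copytree are assumptions, not confirmed details of application/projects.py.

```python
import json
import shutil
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

def duplicate_project(projects_dir, project_id):
    """Hypothetical mirror of duplicate_project (returns the new metadata)."""
    src = Path(projects_dir) / project_id
    if not (src / "project.json").exists():
        raise FileNotFoundError(project_id)          # the route would answer 404
    new_id = str(uuid.uuid4())
    dst = Path(projects_dir) / new_id
    shutil.copytree(src, dst)                        # audio.wav, transcript.csv, samples/, ...
    meta = json.loads((dst / "project.json").read_text())
    now = datetime.now(timezone.utc).isoformat()
    meta.update(id=new_id, name=meta["name"] + " (copy)", created=now, modified=now)
    (dst / "project.json").write_text(json.dumps(meta, indent=2))
    return meta

def test_duplicate_project():
    root = Path(tempfile.mkdtemp())
    src = root / "src"
    (src / "samples").mkdir(parents=True)
    (src / "audio.wav").write_bytes(b"audio")
    (src / "samples" / "SPEAKER_1.wav").write_bytes(b"sample")
    old = "2020-01-01T00:00:00+00:00"
    (src / "project.json").write_text(
        json.dumps({"id": "src", "name": "Talk", "created": old, "modified": old}))
    dup = duplicate_project(root, "src")
    assert dup["id"] != "src" and dup["name"] == "Talk (copy)"
    assert dup["created"] == dup["modified"] and dup["created"] > old
    assert (root / dup["id"] / "samples" / "SPEAKER_1.wav").read_bytes() == b"sample"
    # Independence: renaming the source leaves the duplicate untouched.
    (src / "project.json").write_text(json.dumps({"id": "src", "name": "Renamed"}))
    assert json.loads((root / dup["id"] / "project.json").read_text())["name"] == "Talk (copy)"
```
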
File serving
GET /api/projects/<id>/audio
| Scenario | Expected |
|---|---|
| No token | 401 |
| No audio file | 404 |
| Full download | 200; Content-Type: audio/wav; Content-Length set; body matches uploaded file |
| Range request (interior) | 206; Content-Range header; body is correct byte slice |
| Range request (open end) | 206; returns from offset to EOF |
| Range beyond file size | 206; end clamped to file_size - 1 |
| Malformed Range header | 416 |
| Accept-Ranges header | Present on all responses |
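
The Range scenarios map onto a small function over a file path. This sketch of stream_file's behaviour is hypothetical:

- It handles a single bytes=start-end range.
- It answers 416 for malformed ranges, and also for a start at or past EOF (an assumption not stated above).
- It uses an explicit extension-to-MIME map rather than guessing.

Suffix ranges such as bytes=-500 are valid per RFC 9110 but are treated as malformed here.

```python
import os
import re
import tempfile

RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")              # single range only
MIME_TYPES = {".wav": "audio/wav", ".csv": "text/csv"}  # assumed extension map

def serve_file(path, range_header=None):
    """Hypothetical mirror of stream_file; returns (status, headers, body)."""
    size = os.path.getsize(path)
    ext = os.path.splitext(path)[1].lower()
    headers = {"Accept-Ranges": "bytes",
               "Content-Type": MIME_TYPES.get(ext, "application/octet-stream")}
    if range_header is None:
        start, end, status = 0, size - 1, 200
    else:
        m = RANGE_RE.fullmatch(range_header)
        if m:
            start = int(m.group(1))
            # Open ends ("bytes=90-") and ends past EOF are both clamped to size - 1.
            end = min(int(m.group(2)) if m.group(2) else size - 1, size - 1)
        if not m or start > end:                         # malformed, reversed, or start past EOF
            headers["Content-Range"] = f"bytes */{size}"
            return 416, headers, b""
        status = 206
        headers["Content-Range"] = f"bytes {start}-{end}/{size}"
    with open(path, "rb") as f:
        f.seek(start)
        body = f.read(end - start + 1)
    headers["Content-Length"] = str(len(body))
    return status, headers, body

def test_serve_file():
    data = bytes(range(100))
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(data)
    path = f.name
    status, headers, body = serve_file(path)
    assert (status, body, headers["Content-Length"]) == (200, data, "100")
    assert headers["Content-Type"] == "audio/wav"
    status, headers, body = serve_file(path, "bytes=10-19")
    assert (status, body, headers["Content-Range"]) == (206, data[10:20], "bytes 10-19/100")
    assert serve_file(path, "bytes=90-")[2] == data[90:]                  # open end
    assert serve_file(path, "bytes=50-500")[1]["Content-Range"] == "bytes 50-99/100"
    assert serve_file(path, "bytes=abc")[0] == 416                         # malformed
    for rng in (None, "bytes=0-0", "bytes=abc"):
        assert serve_file(path, rng)[1]["Accept-Ranges"] == "bytes"
    os.remove(path)
```
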
GET /api/projects/<id>/transcript
| Scenario | Expected |
|---|---|
| No token | 401 |
| No transcript | 404 |
| Full download | 200; Content-Type: text/csv; body matches uploaded CSV |
| Range request | 206; partial CSV bytes returned |
GET /api/projects/<id>/samples/<speaker_id>
| Scenario | Expected |
|---|---|
| No token | 401 |
| Valid sample | 200; Content-Type: audio/wav; body matches uploaded WAV |
| Non-existent sample | 404 |
| Speaker ID with special chars | Sanitized; resolves to sanitized filename |
| Path traversal attempt (../../../etc/passwd) | 404; sanitization strips / and . (leaving etcpasswd, not an empty string), so no file outside samples/ is read |
| Underscore preserved | speaker_1 → samples/speaker_1.wav |
| Range request | 206; partial WAV bytes returned |
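
Applying the "strip everything except alphanumerics and _" rule to the table's inputs shows what the route actually resolves to. Below is a hypothetical mirror of that sanitisation, assuming ASCII-only alphanumerics. ../../../etc/passwd collapses to etcpasswd rather than to an empty string. Only inputs made entirely of stripped characters, such as .., reach the ValueError path.

```python
import re
from pathlib import PurePosixPath

def sanitize_speaker_id(speaker_id):
    """Hypothetical mirror of the speaker_id sanitisation described for save_sample.

    Keeps ASCII letters, digits and "_" (ASCII-only is an assumption) and raises
    ValueError when nothing survives.
    """
    clean = re.sub(r"[^A-Za-z0-9_]", "", speaker_id)
    if not clean:
        raise ValueError(f"invalid speaker id: {speaker_id!r}")
    return clean

def test_sanitize_speaker_id():
    assert sanitize_speaker_id("speaker_1") == "speaker_1"            # underscore preserved
    assert sanitize_speaker_id("SPEAKER 01!") == "SPEAKER01"
    assert sanitize_speaker_id("../../../etc/passwd") == "etcpasswd"  # not empty
    samples = PurePosixPath("/projects/p1/samples")
    # Whatever survives sanitisation stays a plain filename inside samples/.
    assert (samples / f"{sanitize_speaker_id('../../x')}.wav").parent == samples
    for bad in ("..", "/", "", "!!"):
        try:
            sanitize_speaker_id(bad)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")
```
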
GET /api/projects/<id>/speakers
| Scenario | Expected |
|---|---|
| No token | 401 |
| Non-existent project | 404 |
| No speakers created | 200 {} |
| With speakers, no samples | has_sample: false for each |
| With speakers and samples | has_sample: true for sampled speakers |
| After PUT update | Reflects updated speaker data |
GET /api/projects/<id>/waveform
| Scenario | Expected |
|---|---|
| No token | 401 |
| Non-existent project | 404 |
| No waveform uploaded | 200 {} |
| With waveform | 200; has sampleRate, duration, peaks |
Cross-cutting
@login_required decorator
| Scenario | Expected |
|---|---|
| Token in X-Auth-Token header | Accepted |
| Token in ?token query param | Accepted |
| No token anywhere | 401 {error: "Authentication required"} |
| Invalid token | 401 |
| Token invalidated by logout | 401 on next use |
Request size limit
| Scenario | Expected |
|---|---|
| Upload within MAX_AUDIO_SIZE_MB | 201 |
| Upload exceeding limit | 413 {error: "Request too large…"} |
Error response format
| Status | Body shape |
|---|---|
| 400 | {error: <message>} |
| 401 | {error: <message>} |
| 404 | {error: <message>} |
| 413 | {error: "Request too large — audio file exceeds the configured limit"} |
| 500 | {error: "Internal server error"} (real error in logs only) |