UNIT_TESTING

Unit Testing Recommendations

Frontend

Functions across static/js that should have unit tests, organized by file.


utilities/tools.js

Function | Why test
generateId() | No DOM dependencies; uniqueness/collision avoidance is critical
formatTime(s) | Pure time formatter; edge cases (0, large values, non-integers)
formatTimeMs(s) | Same edge cases as formatTime, plus the fractional-seconds output
hexToRgb(hex) | Pure color parser; input validation and parsing correctness
hslToHex(h, s, l) | Color space conversion; correctness depends on the math
hexToHue(hex) | Recovers the hue only (a partial inverse of hslToHex); test round-trip consistency
colorToHSV(color) | Multi-format color parsing; multiple code paths to test
parseCSV(text) | Critical parsing logic; RFC 4180 quoting, escaped quotes, embedded newlines
farthestColor(colors) | Delegates to farthestAngle; integration-level test
farthestAngle(angles) | Circular geometry; edge cases (empty, one, wrap-around at 360)
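For the parseCSV() edge cases, it can help to pin down the expected rows with an independent, well-tested parser before writing the JavaScript assertions. The sketch below uses Python's standard csv module as that oracle; the fixture names and the start,end,speaker header are illustrative assumptions, not the project's real transcript schema.

```python
from __future__ import annotations

import csv
import io

# Hypothetical fixture table: raw CSV text -> rows an RFC 4180 parser should return.
# The "start,end,speaker" header is illustrative only.
CASES = {
    "plain": "start,end,speaker\r\n0.0,1.5,SPEAKER_0\r\n",
    "quoted_comma": 'text\r\n"hello, world"\r\n',
    "escaped_quote": 'text\r\n"she said ""hi"""\r\n',
    "embedded_newline": 'text\r\n"line one\nline two"\r\n',
}


def reference_rows(text: str) -> list[list[str]]:
    """Parse CSV text with the standard-library reader as an independent oracle."""
    # newline="" disables line-ending translation, so \r\n and newlines
    # embedded inside quoted fields reach the csv reader untouched.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    for name, text in CASES.items():
        print(name, reference_rows(text))
```

If parseCSV() follows the same RFC 4180 rules, it should return these rows for the same strings, and re-serialising them with compileCSV() and parsing again should reproduce them.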

utilities/audio.js

Function | Why test
encodeMonoWav(samples, sampleRate) | Binary format encoding; the WAV header bytes must be exact
loadAudioFile(file) | Returns structured data; testable with a mock AudioContext
decodeSampleArrayBuffer(arrayBuffer) | Structured data transform; testable with a mock AudioContext
extractPeaks(channelBuffers, peakCount) | Downsampling logic; results depend on the math
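Because the WAV header has a fixed byte layout, one way to test encodeMonoWav(samples, sampleRate) is to compare its first 44 bytes against independently generated reference bytes. This Python sketch assumes the encoder writes 16-bit integer PCM with the canonical 44-byte RIFF header; if it writes 32-bit float instead, the format code, bit depth and possibly the chunk layout change. Both function names are hypothetical.

```python
import io
import struct
import wave


def mono_pcm16_wav_header(num_samples: int, sample_rate: int) -> bytes:
    """Canonical 44-byte RIFF/WAVE header for 16-bit mono PCM (assumed format)."""
    channels, bits = 1, 16
    block_align = channels * bits // 8          # bytes per sample frame
    byte_rate = sample_rate * block_align       # bytes per second
    data_size = num_samples * block_align       # size of the data chunk payload
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",                    # little-endian, no padding
        b"RIFF", 36 + data_size, b"WAVE",        # RIFF size = file size - 8
        b"fmt ", 16, 1, channels, sample_rate,   # fmt chunk, PCM format code 1
        byte_rate, block_align, bits,
        b"data", data_size,
    )


def wave_module_header(num_samples: int, sample_rate: int) -> bytes:
    """Independent reference: let the standard wave module write the header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                        # 2 bytes = 16 bits
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * num_samples) # silent samples
    return buf.getvalue()[:44]
```

The two functions cross-check each other; the resulting bytes can then be dumped as a hex fixture for the JavaScript test.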

project.js: Transcript class

Function | Why test
#buildParagraphs(segments) | Core grouping logic (private, so exercise it through the public API); gap threshold behavior has multiple code paths
#buildSpeakerBlocks(paragraphs) | Groups consecutive same-speaker paragraphs (private; test through the public API)
compileCSV() | Serialization; output must round-trip through parseCSV()
splitSegment(idx, a, b) | Data mutation; the result must preserve all data correctly
mergeSegments(idxA, idxB) | Data mutation; text concatenation and index bookkeeping
changeSpeaker(segIdx, newId) | Reassigns a data field and triggers a rebuild
segmentAtTime(time) | Binary search; edge cases (before start, after end, in gap)
clone(deep) | Deep vs. shallow copy correctness
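The segmentAtTime(time) edge cases can be written down as a table-driven reference before testing the JavaScript method. This sketch assumes sorted, non-overlapping segments, half-open [start, end) intervals, and -1 for "no segment"; the real method may return the nearest segment or treat boundaries differently, in which case only the expected values change. The name segment_at_time is hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right


def segment_at_time(starts: list[float], ends: list[float], t: float) -> int:
    """Index of the segment whose [start, end) interval contains t, else -1.

    Assumes segments are sorted by start time and do not overlap.
    """
    i = bisect_right(starts, t) - 1   # last segment starting at or before t
    if i >= 0 and t < ends[i]:
        return i
    return -1                          # before the first segment, in a gap, or past the end


# Three segments, with gaps at 1.5-2.0 and 4.0-5.0.
STARTS = [0.0, 2.0, 5.0]
ENDS = [1.5, 4.0, 6.0]
EXPECTED = {-1.0: -1, 0.0: 0, 1.7: -1, 2.0: 1, 4.5: -1, 5.5: 2, 6.0: -1}
```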

project.js: Project class

Function | Why test
loadTranscriptCSV(text, local) | Parses CSV and auto-creates speakers; complex orchestration
addSpeaker(...) | Auto-generates a hue using farthestAngle; color uniqueness
reassignSegments(fromId, toId) | Bulk mutation; all segments must be reassigned
metadata() | Serialization for server upload; field correctness
isDirty() | Returns true only when the expected dirty flags are set
markAllDirty(dirty) / markTranscriptDirty() etc. | Flag state transitions
packageProject(filepath) | ZIP content validation (what's included, filenames)
unpackageProject(zipFile) | Inverse of packageProject; round-trip integrity

workspace_panels/waveform_panel.js

Function | Why test
clientXToTime(clientX) | Pixel-to-time coordinate conversion; math-heavy
basePxPerSec() | Scaling factor calculation; affects all coordinate math
regionIndexAtX(clientX) | Viewport coordinate to segment index
regionIndexAtTime(time) | Time to segment index
skipN(n) | Seek clamping to [0, duration]
zoomIn(amount) / zoomOut(amount) / applyZoom(level) | Zoom bounds clamping to [MIN_ZOOM, MAX_ZOOM]

workspace_panels/speakers_panel.js

Function | Why test
addSpeaker(id) | Auto-generates SPEAKER_N IDs with collision avoidance
speakerDisplayName(id) | Falls back to the ID when no name is set; simple but branching
sliceSegmentAsSample(id, seg) | Audio extraction and sample rate handling

server.js

Function | Why test
audioUrl(id) | URL construction; must match server expectations
transcriptUrl(id) | URL construction; must match server expectations
sampleUrl(id, spkId) | URL construction; must match server expectations

Priority

Highest priority — pure logic, no DOM, high business impact:

  1. parseCSV / compileCSV round-trip
  2. Transcript grouping (#buildParagraphs, #buildSpeakerBlocks)
  3. farthestAngle (used for speaker color assignment globally)
  4. segmentAtTime (used by waveform sync)
  5. encodeMonoWav (WAV binary format correctness)
  6. formatTime / formatTimeMs

Medium priority — data integrity, some setup required:

  • addSpeaker with hue auto-generation
  • mergeSegments / splitSegment
  • packageProject / unpackageProject round-trip
  • clientXToTime and coordinate conversion functions
  • colorToHSV, hslToHex, hexToHue color pipeline

Lower priority — require mocking AudioContext or Web Workers:

  • extractPeaks, loadAudioFile, decodeSampleArrayBuffer
  • sliceSegmentAsSample

Backend

Functions across app.py and application/ that should have unit tests, organized by file.


application/auth.py

Function | Why test
check_password(password) | Security-critical; the timing-safe comparison must reject wrong passwords without leaking timing information
handle_login() | Multiple paths: a correct password generates a token and adds it to the valid set; a wrong password is rejected
login_required(f) | Decorator gate; token from header vs. query param, valid vs. invalid token; all paths must be covered
handle_logout() | The token must be removed from the valid set; a missing token must not raise an error
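A minimal sketch of the behaviour the auth tests should pin down, assuming check_password compares against a single configured password with hmac.compare_digest and that valid tokens live in an in-memory set. APP_PASSWORD, login() and logout() are hypothetical stand-ins for the module's configuration, handle_login() and handle_logout(), not its actual code.

```python
from __future__ import annotations

import hmac
import secrets

# Hypothetical stand-ins for application/auth.py state and configuration.
APP_PASSWORD = "correct horse"
valid_tokens: set[str] = set()


def check_password(password: str) -> bool:
    """Compare in constant time so response timing does not reveal matches."""
    return hmac.compare_digest(password.encode(), APP_PASSWORD.encode())


def login(password: str) -> str | None:
    """On success, issue a fresh random token and register it as valid."""
    if not check_password(password):
        return None
    token = secrets.token_urlsafe(32)
    valid_tokens.add(token)
    return token


def logout(token: str | None) -> None:
    """Forget the token; a missing or unknown token is a silent no-op."""
    if token:
        valid_tokens.discard(token)
```

Timing leaks are hard to assert reliably in a unit test, so the tests focus on accept/reject correctness, token uniqueness and the logout no-op; a test could additionally patch hmac.compare_digest to confirm it is actually called.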

application/files.py

Function | Why test
project_dir(project_id) | Pure path construction; verify the correct path for various project ID formats
read_project_json(project_id) | Raises FileNotFoundError if missing; verify the error path vs. the success path
write_project_json(project_id, data) | JSON serialization; verify indent, file creation, content correctness
read_speakers_json(project_id) | Returns {} if missing (does not raise); verify this graceful fallback, which differs from read_project_json
read_waveform_json(project_id) | Same fallback pattern as read_speakers_json; verify independently
write_speakers_json(project_id, data) | Data persistence; verify serialization and logging
write_waveform_json(project_id, data) | Unlike the other writers, uses no indent; verify the format difference
save_sample(project_id, speaker_id, file_storage) | Sanitizes speaker_id (strips everything except alphanumerics and _); raises ValueError if the result is empty
list_project_dirs() | Filters by is_dir() AND project.json existence; verify both conditions and the empty-directory case
stream_file(path) | High priority. Complex Range header parsing, partial vs. full content (206 vs. 200), MIME type mapping; has a known bug (BUG_REPORT #20)
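The save_sample() sanitization rule is easy to table-test in isolation. This sketch assumes "alphanumeric" means ASCII letters and digits; if the real code uses str.isalnum(), non-ASCII letters would survive, which is itself worth a test case. sanitize_speaker_id is a hypothetical helper name.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]")


def sanitize_speaker_id(speaker_id: str) -> str:
    """Strip everything except ASCII alphanumerics and underscore.

    Raises ValueError when nothing is left, matching the described
    save_sample() behaviour.
    """
    cleaned = _DISALLOWED.sub("", speaker_id)
    if not cleaned:
        raise ValueError(f"speaker id {speaker_id!r} is empty after sanitization")
    return cleaned
```

Under this rule ../../../etc/passwd becomes etcpasswd rather than an empty string: the traversal is neutralised because dots and slashes are removed, not because the ID is rejected.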

application/projects.py

Function | Why test
_now() | Clock-dependent rather than pure; verify ISO 8601 UTC format
_project_exists(project_id) | Simple path existence check
list_projects() | Exception handling (a bad project is skipped, not fatal), sorting by modified date, data transformation
get_project(project_id) | Enriches data with the has_transcript flag; verify the existence check and flag logic
create_project(metadata, audio_file, waveform, transcript_file, sample_files) | High priority. UUID generation, directory setup, metadata normalization, conditional file saving for all optional params
update_project(project_id, metadata, speakers, transcript_file, sample_files) | High priority. Three distinct conditional branches: metadata update, speakers update (with/without full dict), transcript/samples
_normalise_speakers(speakers, project_id, sample_files, existing) | High priority. has_sample precedence: new uploads > client value > existing stored value > false; verify all combinations
duplicate_project(project_id) | New UUID, copied files, updated name and timestamps; verify the metadata is fully correct
get_speakers(project_id) / get_waveform(project_id) | Existence check + file read; verify both
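The has_sample precedence in _normalise_speakers can be expressed as a small decision table. In this sketch, "client value" is assumed to mean an explicit has_sample key in the incoming speaker dict, so an explicit false from the client overrides a stored true; whether the real function behaves that way is exactly what a test should confirm. resolve_has_sample is hypothetical.

```python
from __future__ import annotations


def resolve_has_sample(speaker_id: str, client: dict, existing: dict,
                       uploaded_ids: set[str]) -> bool:
    """Precedence: new upload > client value > stored value > False."""
    if speaker_id in uploaded_ids:      # a sample uploaded in this request wins
        return True
    if "has_sample" in client:          # explicit value sent by the client
        return bool(client["has_sample"])
    if "has_sample" in existing:        # value already stored in speakers.json
        return bool(existing["has_sample"])
    return False                        # nothing known: no sample


# (uploaded?, client value or None, stored value or None, expected)
CASES = [
    (True, False, False, True),
    (False, True, False, True),
    (False, False, True, False),
    (False, None, True, True),
    (False, None, None, False),
]


def run_cases() -> None:
    for uploaded, client_val, stored_val, expected in CASES:
        client = {} if client_val is None else {"has_sample": client_val}
        existing = {} if stored_val is None else {"has_sample": stored_val}
        uploads = {"S0"} if uploaded else set()
        assert resolve_has_sample("S0", client, existing, uploads) is expected
```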

application/transcription/transcribe.py

Function | Why test
load_audio_dict(filepath) | Mono vs. stereo branch (unsqueeze vs. transpose), conditional resampling to 16 kHz; shape handling is critical
assign_speaker(segment_start, segment_end, annotation) | Overlap calculation; edge cases: no overlap, multiple speakers, boundary-exact matches
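A reference for the overlap logic, assuming the annotation can be flattened into (start, end, label) turns (for a pyannote.core Annotation, itertracks(yield_label=True) yields segment, track and label), that overlap is summed per speaker, and that no overlap yields None. All three are assumptions about assign_speaker; assign_speaker_ref is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def assign_speaker_ref(seg_start: float, seg_end: float,
                       turns: list[tuple[float, float, str]]) -> str | None:
    """Label with the most total overlap with [seg_start, seg_end], or None."""
    overlap: dict[str, float] = defaultdict(float)
    for start, end, label in turns:
        o = min(seg_end, end) - max(seg_start, start)
        if o > 0:                      # touching at a boundary counts as no overlap
            overlap[label] += o
    if not overlap:
        return None
    return max(overlap, key=overlap.get)   # ties go to the first label seen


TURNS = [(0.0, 2.0, "A"), (2.0, 5.0, "B"), (4.0, 6.0, "A")]
```

The (0.0, 6.0) case tells summing per speaker (A: 4.0 s vs. B: 3.0 s) apart from picking the single longest turn (B's 3.0 s), and (0.0, 2.0) covers a segment that ends exactly where B begins. Ties, such as (1.5, 2.5) here, deserve an explicit test once the intended tie-break is known.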

Priority

Highest priority — security-critical or complex branching logic:

  1. stream_file — known bug, complex Range header parsing
  2. create_project — core creation, many optional parameters
  3. update_project — three distinct conditional update paths
  4. _normalise_speakers — precedence rules for has_sample
  5. check_password / handle_login — authentication gate

Medium priority — data integrity and error handling:

  • login_required decorator
  • list_projects — exception handling and sorting
  • save_sample — speaker ID sanitization
  • duplicate_project — UUID and metadata correctness
  • read_project_json / read_speakers_json — different error handling patterns
  • load_audio_dict / assign_speaker — audio processing logic

Lower priority — simple but worth covering:

  • _now, _project_exists, project_dir, list_project_dirs
  • get_project, get_speakers, get_waveform
  • handle_logout
  • write_* persistence functions

Integration

Flask route integration tests. Each test exercises auth → business logic → file I/O as a full stack, using a test client against a temp PROJECTS_DIR.


Auth routes

POST /auth/login

Scenario | Expected
Correct password | 200 + {token} in response; token accepted by authenticated routes
Wrong password | 401 {error}
Missing password field | 401
Non-JSON body | 401
Two sequential logins | Each returns a unique token; both tokens valid
Login → logout → login | New token works; old token rejected

POST /auth/logout

Scenario | Expected
Logout with valid token | 200; token rejected on next authenticated request
Logout without token | 200 (no error)
Logout twice | Both return 200; second is a no-op

Project CRUD

GET /api/projects

Scenario | Expected
No token | 401
No projects on disk | 200 []
One project | 200; array containing that project
Multiple projects | 200; sorted by modified, descending
Project created with transcript | has_transcript: true
Project created without transcript | has_transcript: false
Corrupted project.json | That project skipped; others returned normally
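The GET /api/projects scenarios need a temp PROJECTS_DIR seeded with valid, transcript-bearing and corrupted projects. The sketch below shows one way to seed it with the standard library, plus a reference for the documented listing behaviour (skip corrupt entries, add has_transcript, sort by modified descending). seed_project and expected_listing are hypothetical; how the app is pointed at the temp directory depends on its configuration and is not shown.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def seed_project(root: Path, pid: str, modified: str,
                 transcript: bool = False, raw: str | None = None) -> Path:
    """Create a project directory; `raw` overrides project.json (e.g. corrupt JSON)."""
    d = root / pid
    d.mkdir()
    body = raw if raw is not None else json.dumps({"id": pid, "modified": modified})
    (d / "project.json").write_text(body)
    if transcript:
        (d / "transcript.csv").write_text("placeholder\n")
    return d


def expected_listing(root: Path) -> list[dict]:
    """Reference for the documented behaviour of GET /api/projects."""
    projects = []
    for d in root.iterdir():
        meta = d / "project.json"
        if not (d.is_dir() and meta.exists()):
            continue                     # stray files and bare dirs are ignored
        try:
            data = json.loads(meta.read_text())
        except (OSError, ValueError):
            continue                     # corrupted project: skipped, not fatal
        data["has_transcript"] = (d / "transcript.csv").exists()
        projects.append(data)
    # ISO 8601 strings in the same format and UTC offset sort chronologically as text.
    return sorted(projects, key=lambda p: p.get("modified", ""), reverse=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        seed_project(root, "old", "2024-01-01T00:00:00+00:00")
        seed_project(root, "new", "2024-06-01T00:00:00+00:00", transcript=True)
        seed_project(root, "bad", "", raw="{not json")
        print([p["id"] for p in expected_listing(root)])  # ['new', 'old']
```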

POST /api/projects

Scenario | Expected
No token | 401
Audio only | 201; directory created with audio.wav, project.json, speakers.json
With metadata {"name":"X"} | Project name is "X"
With waveform JSON | waveform.json written with sampleRate, duration, peaks
With transcript file | transcript.csv saved; has_transcript: true
With speaker samples | samples/<id>.wav saved; has_sample: true in speakers.json
Missing audio field | 400
Invalid JSON in metadata | 500
Speaker ID with special chars in samples | Sanitized to alphanumeric + _; ValueError on empty result → 500
File exceeds MAX_AUDIO_SIZE_MB | 413 with custom error message

GET /api/projects/<id>

Scenario | Expected
No token | 401
Existing project | 200 with all fields
Non-existent ID | 404
After adding transcript via PUT | has_transcript: true

PUT /api/projects/<id>

Scenario | Expected
No token | 401
Non-existent project | 404
Update name only | 200; GET /api/projects/<id> reflects the new name; modified timestamp updated
Update speakers | speakers.json updated
Add transcript | transcript.csv saved; has_transcript: true
Add sample for existing speaker | has_sample flag updated in speakers.json
metadata: null | No name change; only modified updated
Partial update (speakers only) | Only speakers.json touched; audio.wav unchanged

DELETE /api/projects/<id>

Scenario | Expected
No token | 401
Existing project | 200 {ok: true}; GET /api/projects/<id> returns 404; directory removed from disk
Non-existent ID | 404
Delete twice | First 200; second 404
List after delete | GET /api/projects no longer includes the deleted project

POST /api/projects/<id>/duplicate

Scenario | Expected
No token | 401
Non-existent source | 404
Valid project | 201; new id differs from source; name has " (copy)" suffix
Timestamps | created and modified reset to duplication time
File completeness | Duplicate directory contains audio.wav, transcript.csv, samples/ matching source
Independence | Updating the original's name does not affect the duplicate
Both in list | GET /api/projects returns both

File serving

GET /api/projects/<id>/audio

Scenario | Expected
No token | 401
No audio file | 404
Full download | 200; Content-Type: audio/wav; Content-Length set; body matches uploaded file
Range request (interior) | 206; Content-Range header; body is the correct byte slice
Range request (open end) | 206; returns from offset to EOF
Range beyond file size | 206; end clamped to file_size - 1
Malformed Range header | 416
Accept-Ranges header | Present on all responses
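The expected Range outcomes above can be captured in a small reference parser for single byte ranges. It is not a fix for the known stream_file bug (BUG_REPORT #20), whose details are not described here. parse_range, content_range and RangeNotSatisfiable are hypothetical names. Suffix ranges (bytes=-500) and multi-range requests are deliberately rejected, and RFC 9110 also permits a server to ignore an invalid Range header and answer 200, so the app's actual choice should decide the expectations.

```python
from __future__ import annotations

import re


class RangeNotSatisfiable(Exception):
    """Malformed or unsatisfiable Range header; the route would answer 416."""


# Single range only: "bytes=START-" or "bytes=START-END".
_RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")


def parse_range(header: str, file_size: int) -> tuple[int, int]:
    """Return inclusive (start, end) offsets, clamping end to file_size - 1."""
    m = _RANGE_RE.fullmatch(header.strip())
    if m is None:
        raise RangeNotSatisfiable(header)
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else file_size - 1   # open end: to EOF
    end = min(end, file_size - 1)                             # past EOF: clamp
    if start > end:                                           # e.g. start >= size
        raise RangeNotSatisfiable(header)
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Value for the Content-Range header of a 206 response."""
    return f"bytes {start}-{end}/{file_size}"
```

In an integration test, the body of a 206 response should equal data[start:end + 1] for the offsets this returns.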

GET /api/projects/<id>/transcript

Scenario | Expected
No token | 401
No transcript | 404
Full download | 200; Content-Type: text/csv; body matches uploaded CSV
Range request | 206; partial CSV bytes returned

GET /api/projects/<id>/samples/<speaker_id>

Scenario | Expected
No token | 401
Valid sample | 200; Content-Type: audio/wav; body matches uploaded WAV
Non-existent sample | 404
Speaker ID with special chars | Sanitized; resolves to the sanitized filename
Path traversal attempt (../../../etc/passwd) | Traversal characters removed by sanitization; no file outside samples/ is reachable → 404
Underscore preserved | speaker_1 resolves to samples/speaker_1.wav
Range request | 206; partial WAV bytes returned

GET /api/projects/<id>/speakers

Scenario | Expected
No token | 401
Non-existent project | 404
No speakers created | 200 {}
With speakers, no samples | has_sample: false for each
With speakers and samples | has_sample: true for sampled speakers
After PUT update | Reflects updated speaker data

GET /api/projects/<id>/waveform

Scenario | Expected
No token | 401
Non-existent project | 404
No waveform uploaded | 200 {}
With waveform | 200; has sampleRate, duration, peaks

Cross-cutting

@login_required decorator

Scenario | Expected
Token in X-Auth-Token header | Accepted
Token in ?token query param | Accepted
No token anywhere | 401 {error: "Authentication required"}
Invalid token | 401
Token invalidated by logout | 401 on next use
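The decorator scenarios reduce to a small decision table over where the token is found and whether it is still valid. This sketch assumes the header is checked before the query parameter and that invalid tokens get the same error body as missing ones; it also uses plain dicts, whereas Flask's request.headers is case-insensitive. extract_token and authorize are hypothetical, not the decorator's code.

```python
from __future__ import annotations

from collections.abc import Mapping


def extract_token(headers: Mapping[str, str], args: Mapping[str, str]) -> str | None:
    """Return the X-Auth-Token header if present, else the ?token query param."""
    return headers.get("X-Auth-Token") or args.get("token")


def authorize(headers: Mapping[str, str], args: Mapping[str, str],
              valid_tokens: set[str]) -> tuple[int, dict | None]:
    """Mimic the gate: (200, None) when allowed, (401, error body) otherwise."""
    token = extract_token(headers, args)
    if not token or token not in valid_tokens:
        # Assumption: missing and invalid tokens share one error body.
        return 401, {"error": "Authentication required"}
    return 200, None
```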

Request size limit

Scenario | Expected
Upload within MAX_AUDIO_SIZE_MB | 201
Upload exceeding limit | 413 {error: "Request too large…"}

Error response format

Status | Body shape
400 | {error: <message>}
401 | {error: <message>}
404 | {error: <message>}
413 | {error: "Request too large — audio file exceeds the configured limit"}
500 | {error: "Internal server error"} (the real error is written to the server logs only)