4.4.1
This minor upgrade adds support for image-, audio-, and video-linked corpora and Whisper transcription, improves AI model management, and fixes several bugs.
### New Features
- Image/audio/video (IAV) corpus support has been added. Corpus databases can now store media asset metadata, document-to-media mappings, and timed segment alignments using the new `iav_assets`, `iav_doc_mapping`, and `iav_segments` tables.
- Raw file corpus building can now auto-detect and attach media files to text documents. Text files can be matched with image, audio, and video files by stem, and SRT subtitle files can be converted into text documents with timed media alignment.
- YouTube-backed and local video/audio SRT workflows have been added, including support for sidecar metadata and helper scripts for building IAV-compatible corpora from text, image, audio, video, SRT, and YouTube transcript sources.
- Whisper transcription support has been added for audio-only corpus creation. Audio files can be transcribed through the bundled Whisper runtime and converted into aligned text corpora.
- KWIC results can now expose linked media for compatible corpora. Image, audio, local video, and YouTube media dialogs have been added, along with image gallery support for image-linked corpora.
- Downloadable Whisper models have been added to the AI Model Manager, including English and multilingual model options in several sizes.
### Improvements
- Whisper model entries in the AI Model Manager now use clearer display names and include use-case hints in the interface.
- AI model selection and synchronization behavior has been improved, including separate selection handling for local language models and Whisper models.
- AI model display grouping has been improved so language models and Whisper models are shown in clearer sections.
- The bundled local AI runtime has been updated, and bundled runtime resource initialization has been improved for different Windows distribution types.
- Corpus file operation errors now provide clearer feedback.
- Corpus creation and media matching now provide clearer warnings when files cannot be matched or when stems are ambiguous.
- Metadata filtering and document filter integration have been improved across tools, including target/reference role handling and filtered document counts.
- Statistical functions have been refactored to handle zero counts, empty expected values, and invalid contingency tables more safely.
- Search and result generation internals have been updated across KWIC, Plot, File View, Cluster, N-Gram, Collocate, Word List, and Keyword List to work better with filtered/subcorpus workflows and the newer query planning code.
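
The stem-based media matching and its warnings for unmatched or ambiguous files can be pictured with a minimal sketch. This is not the application's actual code: the function name `match_media_by_stem`, the extension sets, and the rule that a stem is "ambiguous" when more than one text file shares it are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension sets; the real lists are not given in these notes.
TEXT_EXTS = {".txt"}
MEDIA_EXTS = {".jpg", ".png", ".mp3", ".wav", ".mp4"}


def match_media_by_stem(filenames):
    """Pair each text file with the media files that share its stem.

    Returns (matches, warnings). `matches` maps a text filename to a sorted
    list of media filenames; `warnings` is a list of readable messages.
    """
    texts = defaultdict(list)
    media = defaultdict(list)
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in TEXT_EXTS:
            texts[path.stem].append(name)
        elif suffix in MEDIA_EXTS:
            media[path.stem].append(name)

    matches, warnings = {}, []
    for stem, text_files in sorted(texts.items()):
        # Assumption: several text files with one stem cannot be resolved.
        if len(text_files) > 1:
            warnings.append(f"ambiguous stem '{stem}': {sorted(text_files)}")
            continue
        text_file = text_files[0]
        matches[text_file] = sorted(media.get(stem, []))
        if not matches[text_file]:
            warnings.append(f"no media found for '{text_file}'")

    # Media files whose stem has no text document are reported as well.
    for stem in sorted(set(media) - set(texts)):
        warnings.append(f"no text document for media stem '{stem}'")
    return matches, warnings


if __name__ == "__main__":
    found, problems = match_media_by_stem(
        ["talk.txt", "talk.mp3", "talk.jpg", "notes.txt", "clip.mp4"]
    )
    print(found)
    for message in problems:
        print("warning:", message)
```

Collecting warnings in a list rather than raising on the first problem lets a corpus build finish and report every unmatched or ambiguous file at once.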
### Bug Fixes
- Normalized value display issues in the Plot and Cluster tools were fixed.
- Cluster tool norm range handling now correctly uses the document filter manager and filtered file counts.
- Legacy caching logic that no longer matched the current subcorpus architecture was removed.
- Several word list corpus builder workflows now provide better validation and warning messages for missing, ambiguous, or invalid source files.
- SRT processing and conversion handling have been improved for more reliable transcript text generation.
- Path handling for writable resources, media files, and bundled runtime files has been improved.
- Release packaging scripts and generated release metadata have been updated for the 4.4.1 build flow.
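
To illustrate the SRT workflow mentioned in these notes, where subtitle files become text documents with timed media alignment, here is a minimal sketch of how cues might be parsed into timed segments. The function name `parse_srt` and the `(start_seconds, end_seconds, text)` tuple layout are assumptions for illustration and do not reflect the application's internal format or the `iav_segments` table layout.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    segments = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        for index, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match is None:
                continue  # skips the numeric cue counter before the timing line
            fields = match.groups()
            cue_text = " ".join(lines[index + 1:])
            if cue_text:  # drop cues that carry no text
                segments.append(
                    (_to_seconds(*fields[:4]), _to_seconds(*fields[4:]), cue_text)
                )
            break
    return segments


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n\n"
        "2\n00:00:03,000 --> 00:00:04,250\nSecond line\ncontinues here.\n"
    )
    cues = parse_srt(sample)
    # Joining the cue texts gives the plain transcript for the text document.
    print(" ".join(cue[2] for cue in cues))
```

Keeping the start and end times alongside each cue's text is what makes it possible to jump from a search hit to the matching point in the audio or video.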