This document describes the most important data structures for the distillery component.
A file in Clojure EDN syntax with the extension .cfg. The content is a map with the Configuration structure.
The configuration structure controls the processing of the speech recognition results and the creation of the output.
[0..1] the minimal recognition confidence for a word to be used in the statistical analysis
[0..1] the minimal relative appearance (max-appearance / appearance) for correction candidates
[0..1] the minimal matching score for associating a medium with a category
:not-short, :noun, :min-confidence, :good-confidence, :no-punctuation, :not-in-blacklist
{ :parallel-proc true
:blacklist-resource (resource "blacklist.txt")
:blacklist-max-size 1000
:min-confidence 0.4
:good-confidence 0.7
:min-match-score 0.02
:index-filter [:not-short :noun :min-confidence :no-punctuation]
:skip-media-copy false
:skip-word-includes false
:skip-match-includes false
:main-cloud { :width 640
:height 400
:color [0.0 0.8 0.2]
... }
:category-cloud { ... }
:medium-cloud { ... }
:matrix { ... } }
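How the entries of :index-filter act on recognized words can be illustrated with a small Python sketch. It is not the distillery implementation: the predicates below are guesses based only on the filter names, EDN keywords are modeled as plain strings, and `make_filters` and `apply_index_filter` are hypothetical helper names. The filters :noun and :not-short are left out because their exact rules are not described here.

```python
import string

def make_filters(config, blacklist):
    """Build one predicate per filter name (assumed semantics, see lead-in)."""
    return {
        "min-confidence": lambda w: w["confidence"] >= config["min-confidence"],
        "good-confidence": lambda w: w["confidence"] >= config["good-confidence"],
        "no-punctuation": lambda w: not any(ch in string.punctuation for ch in w["text"]),
        "not-in-blacklist": lambda w: w["lexical-form"].lower() not in blacklist,
    }

def apply_index_filter(words, config, blacklist=frozenset()):
    """Keep only the words that pass every active filter."""
    filters = make_filters(config, blacklist)
    # Filter names without a predicate in this sketch (e.g. "noun") are skipped.
    active = [filters[name] for name in config["index-filter"] if name in filters]
    return [w for w in words if all(check(w) for check in active)]

config = {
    "min-confidence": 0.4,
    "good-confidence": 0.7,
    "index-filter": ["not-short", "noun", "min-confidence", "no-punctuation"],
}
words = [
    {"text": "Hello", "lexical-form": "hello", "confidence": 0.9544},
    {"text": "um", "lexical-form": "um", "confidence": 0.21},
    {"text": ".", "lexical-form": ".", "confidence": 0.88},
]
kept = apply_index_filter(words, config)
print([w["text"] for w in kept])
```

With the sample configuration, only words that reach :min-confidence and contain no punctuation remain.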
A cloud configuration controls the creation of a word cloud.
[0..1] the precision for finding a place for a word in the word cloud
[0..1] the priority of building an alphabetic order in the word cloud
:bold, :italic
[10..] the font size of the smallest words in pixels
[10..] the font size of the largest words in pixels
[0..1] for red, green, blue, and alpha
[0..1] for red, green, blue, and alpha
{ :width 540
:height 300
:precision :medium
:order-priority 0.6
:font-family "Segoe UI"
:font-style [:bold]
:min-font-size 13
:max-font-size 70
:color [0.0 0.3 0.8 1.0]
:background-color [0.0 0.0 0.0 0.0] }
A waveform configuration controls the creation of waveform visualizations for a medium.
{ :width 640
:height 80 }
The matrix configuration controls the creation of the match matrix between categories and media.
[0..1] for red, green, blue, and alpha
{ :color [0.0 0.3 0.8 1.0] }
A file in Clojure EDN syntax with the file extension .saj. It is the input for the speech recognition result analysis and contains a Job Description structure. A job consists of a name, a number of categories, a number of media, and additional parameters such as the output directory.
A job description contains all information necessary to perform the analysis and create the analysis result representation.
{ :media-categorizer-version "1.0.0"
:job-name "Archive 001"
:job-description "The first part of the media archive."
:output-dir "C:\\Media\\Result"
:result-file "result.xml"
:configuration { ... }
:categories [ ... ]
:media [ ... ] }
Defines a category and all associated resources.
{ :id "comb"
:name "Combination"
:resources [ {:type :wikipedia, :url "http://en.wikipedia.org/wiki/Combination"}
{:type :html, :url "http://mathworld.wolfram.com/Combination.html" :file "D:\\cache\\combination.html" }
{:type :plain, :file "D:\\text\\combination.txt"} ] }
The reference to a category resource.
:plain | :html | :wikipedia the text type of the resource
Defines a medium and all associated resources.
:audio | :video | :unknown the type of the medium
.wav (PCM 16-bit mono), used for speech recognition
*.srr
{ :id "C1-P3-Intro"
:name "Introduction"
:medium-file "D:\\media\\c1\\p3_introduction.mp4"
:medium-type :video
:encoded-media-files [ {:mime-type "video/mp4" :path "D:\\media\\c1\\p3_introduction.mp4"} ]
:recognition-profile ""
:recognition-profile-name "en-US_female_03"
:audio-file "D:\\media\\proc\\audio\\p3_introduction.wav"
:waveform-file "D:\\media\\proc\\waveform\\p3_introduction.png"
:waveform-file-bg "D:\\media\\proc\\waveform\\p3_introduction_2.png"
:results-file "D:\\media\\proc\\transcript\\p3_introduction.srr" }
The reference to an encoded media file. Encoded media files are prepared to be played with HTML5 video and audio elements inside a web browser.
{ :path "D:\\media\\c1\\p3_introduction.mp4"
:mime-type "video/mp4" }
A file in Clojure EDN syntax with the file extension .srr. The content is a vector of Speech Recognition Results.
[ { :no 0
:start 0.3
:duration 2.712
:confidence 0.5651
:text "Hello and welcome"
:words [ { :no 0 :confidence 0.9544 :text "Hello" :lexical-form "hello" :pronunciation "həˈləʊ̯" }
{ :no 1 :confidence 0.8234 :text "and" :lexical-form "and" :pronunciation "ænd" }
{ :no 2 :confidence 0.8602 :text "welcome" :lexical-form "welcome" :pronunciation "ˈwɛl.kəm" } ]
:alternates [ { :no 0
:confidence 0.3521
:text "Hello and welcome"
:words [ ... ] }
... ] }
... ]
A phrase is a sequence of recognized words.
[0..1] describing the overall confidence of this phrase
A speech recognition result describes the result yielded by the speech recognition engine when analyzing a section of an audio stream. The analyzed section is typically selected by an algorithm that considers, among other values, the length of silence, background noise, and the maximal length of a section. The speech recognition of an audio section yields a number of alternative phrases. The phrase with the highest confidence is typically used as the recognized phrase for the audio section. A speech recognition result is a Phrase as well.
[0..n] identifying the result in the context of a medium
[0..1] describing the overall confidence of the recognized phrase for the audio section
{ :no 0
:start 24.35
:duration 4.267
:confidence 0.7885
:text "a brown fox jumped over the messy hill."
:words [ ... ]
:alternates [ ... ] }
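The rule that the phrase with the highest confidence is typically used as the recognized phrase can be sketched in a few lines of Python. This is only an illustration: `best_phrase` is a hypothetical helper, the phrases are plain dictionaries that mirror the EDN maps, and the list of candidates is assembled by hand from the examples in this document.

```python
def best_phrase(phrases):
    """Return the candidate phrase with the highest confidence value."""
    return max(phrases, key=lambda phrase: phrase["confidence"])

# Candidate phrases for one audio section (values taken from the examples).
candidates = [
    {"no": 0, "confidence": 0.7885, "text": "a brown fox jumped over the messy hill."},
    {"no": 4, "confidence": 0.48992, "text": "a brown fox run about the messy mill."},
]

recognized = best_phrase(candidates)
print(recognized["text"])
```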
An alternate sequence of recognized words for an audio section. An alternate sequence is an extension of the Phrase structure.
[0..n] identifying the phrase in the context of a Speech Recognition Result
[0..1] describing the overall confidence of this phrase
{ :no 4
:confidence 0.48992
:text "a brown fox run about the messy mill."
:words [ ... ] }
A recognized word is a word in the context of a recognition result. Every word is recognized with a certain confidence. The confidence values of the words in a phrase can be combined into an overall confidence for the phrase.
[0..n] identifying the word in a phrase
[0..1] describing the confidence for the recognition of this word
{ :no 2
:confidence 0.6443
:text "brown"
:lexical-form "brown"
:pronunciation "braʊn" }
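The document does not state how word confidences are combined into a phrase confidence. As a minimal sketch, the Python snippet below uses the arithmetic mean, which is purely an assumption; `mean_word_confidence` is a hypothetical name. Note that the sample result shown earlier reports a phrase confidence of 0.5651 for these three words, so the recognition engine evidently applies a different rule.

```python
def mean_word_confidence(words):
    """Combine word confidences by arithmetic mean (an assumed rule)."""
    if not words:
        return None  # no words, no combined confidence
    return sum(word["confidence"] for word in words) / len(words)

# Words taken from the "Hello and welcome" example.
phrase_words = [
    {"no": 0, "confidence": 0.9544, "text": "Hello"},
    {"no": 1, "confidence": 0.8234, "text": "and"},
    {"no": 2, "confidence": 0.8602, "text": "welcome"},
]

combined = mean_word_confidence(phrase_words)
print(round(combined, 4))
```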
Reverse indexing adds numerical references that point upwards in the hierarchy.
Reverse indexing adds one additional slot to the Words in a recognized phrase of a result:
[0..n] identifying the result containing this word
Reverse indexing adds one additional slot to Alternate Phrases in a result:
[0..n] identifying the result containing this phrase
Reverse indexing adds two additional slots to Words in Alternate Phrases of a result:
[0..n] identifying the result containing the phrase
[0..n] identifying the alternate phrase in the result
The analysis step in distillery.core/prepare-and-analyze takes a Job Description, loads the necessary resources, and generates additional data. This additional data is attached to the Job Description structure and its children.
Because of the large amount of data accumulated during the analysis process, the extended job description structure is not written out unchanged as an EDN file. Instead, the most important parts of the analysis result are written out as the XML result file.
During analysis, the Job Description is extended with the slot :words, holding an index of words. A word index points from words to a number of occurrences in media phrases. It is encoded as a map with the lexical form of a word as the key and a structure with the properties of the word as the value.
The following properties of a word are possible.
[1..n] the number of occurrences (cached for easy accessibility)
[0..1] mean recognition confidence
{ "Grammar" { :id "grammar"
:lexical-form "Grammar"
:pronunciation "..."
:occurrences [ { :medium-id "video1" :result-no 20 :word-no 3 :confidence 0.679 } ... ]
:occurrence-count 3
:mean-confidence 0.7533 }
"Method" { :id "method"
:lexical-form "Method"
:pronunciation "..."
:occurrences [ ... ]
:occurrence-count 1
:mean-confidence 0.895 } }
For computational reasons, an index map can be supported by some statistical values.
[0..n] the number of words in the index
[1..n] the number of occurrences of the most frequent word
{ :count 233
:max-occurrence-count 35 }
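To make the relation between the word index and its statistics concrete, here is a minimal Python sketch that builds both from recognition results. It is an illustration under assumptions, not the distillery code: `build_word_index` and `index_stats` are hypothetical names, EDN keywords are modeled as string keys, and the :id and :pronunciation slots are omitted for brevity.

```python
def build_word_index(media):
    """Map each lexical form to its occurrences across all media.

    media: dict of medium id -> list of results, each with "no" and "words".
    """
    index = {}
    for medium_id, results in media.items():
        for result in results:
            for word in result["words"]:
                entry = index.setdefault(
                    word["lexical-form"],
                    {"lexical-form": word["lexical-form"], "occurrences": []},
                )
                entry["occurrences"].append({
                    "medium-id": medium_id,
                    "result-no": result["no"],
                    "word-no": word["no"],
                    "confidence": word["confidence"],
                })
    # Cache the derived values, as the index structure does.
    for entry in index.values():
        occurrences = entry["occurrences"]
        entry["occurrence-count"] = len(occurrences)
        entry["mean-confidence"] = (
            sum(o["confidence"] for o in occurrences) / len(occurrences)
        )
    return index

def index_stats(index):
    """Compute the number of words and the highest occurrence count."""
    counts = [entry["occurrence-count"] for entry in index.values()]
    # An empty index has no most frequent word; 0 is used as a placeholder.
    return {"count": len(index), "max-occurrence-count": max(counts, default=0)}

media = {
    "video1": [
        {"no": 0, "words": [
            {"no": 0, "lexical-form": "hello", "confidence": 0.9},
            {"no": 1, "lexical-form": "world", "confidence": 0.6},
        ]},
        {"no": 1, "words": [
            {"no": 0, "lexical-form": "hello", "confidence": 0.7},
        ]},
    ],
}

word_index = build_word_index(media)
stats = index_stats(word_index)
print(stats)
```

Caching :occurrence-count and :mean-confidence in each entry avoids recomputing them from the occurrence list every time they are needed.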
A medium word occurrence is the address of a recognized word in a medium.
[0..1] recognition confidence
{ :medium-id "video1"
:result-no 20
:word-no 3
:confidence 0.679 }
The analysis result for the recognized phrases of a medium extends the Medium Description structure. The original Medium Description from the vector in the :media slot of a Job Description structure is extended by the following slots:
[1..n] the number of recognized phrases (cached for easy accessibility)
[1..n] the number of recognized words
[0..1] the maximal score between the medium and a category
{ :id "video01"
:name "The first medium"
...
:results [ ... ]
:phrase-count 311
:word-count 4977
:index { "alpha" { :mean-confidence 0.793
:occurrences [ ... ]
:occurrence-count 9
:match-value 0.104 }
"beta" { ... }
... }
:index-stats { :count 288
:max-occurrence-count 19 }
:matches { "info" { :category-id "info"
:word-scores { ... }
:score 0.7133 }
"math" { ... }
... }
:max-score 0.9323 }
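The :max-score slot and the configuration value :min-match-score can both be derived from or applied to the :matches map. The Python sketch below shows one way to do this; `max_score` and `matching_categories` are hypothetical helpers, and the scores for "math" and "misc" are invented for illustration because the example above elides them.

```python
def max_score(matches):
    """Return the highest score over all category matches, or None if empty."""
    return max((match["score"] for match in matches.values()), default=None)

def matching_categories(matches, min_match_score):
    """Return the ids of categories whose score reaches the minimal match score."""
    return sorted(
        category_id
        for category_id, match in matches.items()
        if match["score"] >= min_match_score
    )

matches = {
    "info": {"category-id": "info", "score": 0.7133},
    "math": {"category-id": "math", "score": 0.9323},   # illustrative value
    "misc": {"category-id": "misc", "score": 0.01},     # illustrative value
}

print(max_score(matches))
print(matching_categories(matches, min_match_score=0.02))
```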
A medium word is a word that occurs at least once in the recognized phrases of a medium. Every medium word is represented by some statistical values and a list of Medium Word Occurrences.
[1..n] the number of occurrences (cached for easy accessibility)
[0..1] mean recognition confidence
[0..1] the weight of this word in the context of the medium
{ :id "bioinformatik"
:lexical-form "Bioinformatik"
:pronunciation "biːoːiːnfɔ͡ɐmaːˈtʰɪk"
:occurrences [ {:medium-id "ABC"
:result-no 3
:word-no 10
:confidence 0.689}
... ]
:occurrence-count 21
:mean-confidence 0.8234
:match-value 0.01522 }
The category match describes the matching result between a medium and a category.
[0..1] the matching score between the medium and the category
{ :category-id "math"
:word-scores { "algorithm" 0.00315
"linear" 2.4552E-4
... } }
The analysis result for the words of a category extends the Category Description structure. The original Category Description from the vector in the :categories slot of a Job Description structure is extended by the following slots:
[0..1] the maximal score between the category and a medium
{ :id "math"
:name "Mathematics"
...
:words [ ... ]
:index { "gamma" { :mean-confidence 1
:occurrences [ ... ]
:occurrence-count 4
:match-value 0.0281 }
"kappa" { ... }
... }
:index-stats { :count 1022
:max-occurrence-count 56 }
:matches { "video01" { :medium-id "video01"
:word-scores { ... }
:score 0.3991 }
"video02" { ... }
... }
:max-score 0.8554 }
A descriptive word is a word that describes a category as part of the category resources. The category resources are filtered to sequences of tokens and then concatenated to one token sequence per category. A descriptive word is a token in this sequence. The structure of a descriptive word is somewhat similar to the Recognized Word structure.
[0..n] the position of the word in the token sequence
1 because a descriptive word is part of the category resources and therefore has no recognition confidence
{ :no 455
:text "way"
:lexical-form "way"
:confidence 1 }
A category word is a word that occurs at least once in the category resources. Every category word is represented by some statistical values and a list of Category Word Occurrences.
[1..n] the number of occurrences (cached for easy accessibility)
1 because all category word occurrences have the confidence of 1
[0..1] the weight of this word in the context of the category
{ :id "bioinformatik"
:lexical-form "Bioinformatik"
:occurrences [ {:category-id "biology"
:no 466
:confidence 1}
... ]
:occurrence-count 21
:mean-confidence 1
:match-value 0.01522 }
The medium match describes the matching result between a category and a medium.
[0..1] the matching score between the category and the medium
{ :medium-id "video08"
:word-scores { "hair" 0.00023
"tube" 1.6882E-4
... } }