MediaCategorizer

Intermediate File Formats

Author Tobias Kiertscher kiertscher@fh-brandenburg.de Brandenburg University of Applied Sciences
Date 2014-04-29
Version 0.6.0

This document describes the most important data structures for the distillery component.

Configuration File

File in Clojure EDN syntax with the extension .cfg. The content is a map with the Configuration structure.

Configuration

The configuration structure controls the processing of the speech recognition results and the creation of the output.

Slots

Example

{ :parallel-proc true
  :blacklist-resource (resource "blacklist.txt")
  :blacklist-max-size 1000
  :min-confidence 0.4
  :good-confidence 0.7
  :min-match-score 0.02
  :index-filter [:not-short :noun :min-confidence :no-punctuation]
  :skip-media-copy false
  :skip-word-includes false
  :skip-match-includes false
  :main-cloud { :width 640
                :height 400
                :color [0.0 0.8 0.2]
                ... }
  :category-cloud { ... }
  :medium-cloud { ... }
  :matrix { ... } }
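
A configuration of this shape can be read with the EDN reader that ships with Clojure. The following is a minimal sketch, not the loader used by distillery: the function name read-config and the default values are assumptions made for illustration.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical defaults; the real defaults of distillery are not documented here.
(def default-config
  {:parallel-proc true
   :min-confidence 0.4
   :good-confidence 0.7})

(defn read-config
  "Parses EDN text and merges it over the assumed defaults."
  [text]
  (merge default-config (edn/read-string text)))

(def config
  (read-config "{ :min-confidence 0.5
                  :blacklist-resource (resource \"blacklist.txt\")
                  :index-filter [:not-short :noun] }"))

(:min-confidence config)      ; value from the file overrides the default
(:good-confidence config)     ; falls back to the assumed default
(:blacklist-resource config)  ; read as a plain list, not evaluated
```

Note that the EDN reader returns the (resource "blacklist.txt") form as data. Resolving it to an actual resource is left to the consuming code.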

Cloud Configuration

A cloud configuration controls the creation of a word cloud.

Slots

Example

{ :width 540
  :height 300
  :precision :medium
  :order-priority 0.6
  :font-family "Segoe UI"
  :font-style [:bold]
  :min-font-size 13
  :max-font-size 70
  :color [0.0 0.3 0.8 1.0]
  :background-color [0.0 0.0 0.0 0.0] }
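
One plausible use of :min-font-size and :max-font-size is to scale the font size of a word by its relative weight. The sketch below assumes a linear scale and a weight between 0.0 and 1.0. The function name font-size is hypothetical, and the actual cloud renderer may work differently.

```clojure
(def cloud-config {:min-font-size 13 :max-font-size 70})

(defn font-size
  "Linearly interpolates a font size for a weight between 0.0 and 1.0.
   Assumption: linear scaling; the actual cloud renderer may differ."
  [{:keys [min-font-size max-font-size]} weight]
  (let [w (-> weight (max 0.0) (min 1.0))]   ; clamp the weight to [0.0, 1.0]
    (+ min-font-size (* w (- max-font-size min-font-size)))))

(font-size cloud-config 0.0)   ; smallest size
(font-size cloud-config 1.0)   ; largest size
```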

Waveform Configuration

A waveform configuration controls the creation of waveform visualizations for a medium.

Slots

Example

{ :width 640
  :height 80 }

Matrix Configuration

The matrix configuration controls the creation of the match matrix between categories and media.

Slots

Example

{ :color [0.0 0.3 0.8 1.0] }

Job File

File in Clojure EDN syntax with the file extension .saj. It is the input for the analysis of speech recognition results and contains a Job Description structure. A job consists of a name, a number of categories, a number of media, and additional parameters such as the output directory.

Job Description

A job description contains all information necessary to perform the analysis and create the analysis result representation.

Slots

Example

{ :media-categorizer-version "1.0.0"
  :job-name "Archive 001"
  :job-description "The first part of the media archive."
  :output-dir "C:\\Media\\Result"
  :result-file "result.xml"
  :configuration { ... } 
  :categories [ ... ] 
  :media [ ... ] }
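
Before a job is processed, it can be useful to check that the expected slots are present. The following sketch reads a job from EDN text and reports missing slots. The slot names are taken from the example above; treating all of them as required, and the name missing-slots, are assumptions.

```clojure
(require '[clojure.edn :as edn])

;; Slot names from the example; which of them are mandatory is an assumption.
(def required-slots
  [:job-name :output-dir :result-file :configuration :categories :media])

(defn missing-slots
  "Returns the slots from required-slots that are absent in the job map."
  [job]
  (remove #(contains? job %) required-slots))

(def job
  (edn/read-string
    "{ :job-name \"Archive 001\"
       :output-dir \"C:\\\\Media\\\\Result\"
       :result-file \"result.xml\"
       :categories []
       :media [] }"))

(missing-slots job)   ; => (:configuration)
```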

Category Description

Defines a category and all associated resources.

Slots

Example

{ :id "comb"
  :name "Combination"
  :resources [ {:type :wikipedia, :url "http://en.wikipedia.org/wiki/Combination"}
               {:type :html, :url "http://mathworld.wolfram.com/Combination.html" :file "D:\\cache\\combination.html" }
               {:type :plain, :file "D:\\text\\combination.txt"} ] }
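
A resource may carry a :url, a :file, or both. The sketch below picks the location to load from, assuming that a local :file takes precedence over the remote :url. This precedence and the name resource-location are assumptions, not documented behavior.

```clojure
(defn resource-location
  "Prefers the local :file of a resource and falls back to its :url.
   Assumption: a cached :file takes precedence over the remote :url."
  [resource]
  (or (:file resource) (:url resource)))

(def category
  {:id "comb"
   :name "Combination"
   :resources [{:type :wikipedia :url "http://en.wikipedia.org/wiki/Combination"}
               {:type :plain :file "D:\\text\\combination.txt"}]})

(map resource-location (:resources category))
```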

Category Resource Description

The reference to a category resource.

Slots

Medium Description

Defines a medium and all associated resources.

Slots

Example

{ :id "C1-P3-Intro"
  :name "Introduction"
  :medium-file "D:\\media\\c1\\p3_introduction.mp4"
  :medium-type :video
  :encoded-media-files [ {:mime-type "video/mp4" :path "D:\\media\\c1\\p3_introduction.mp4"} ]
  :recognition-profile ""
  :recognition-profile-name "en-US_female_03"
  :audio-file "D:\\media\\proc\\audio\\p3_introduction.wav"
  :waveform-file "D:\\media\\proc\\waveform\\p3_introduction.png"
  :waveform-file-bg "D:\\media\\proc\\waveform\\p3_introduction_2.png"
  :results-file "D:\\media\\proc\\transcript\\p3_introduction.srr" }

Medium File

The reference to an encoded media file. Encoded media files are prepared to be played with HTML5 video and audio elements inside a web browser.

Slots

Example

{ :path "D:\\media\\c1\\p3_introduction.mp4"
  :mime-type "video/mp4" }

Speech Recognition Result File

File in Clojure EDN syntax with file extension .srr. The content is a vector of Speech Recognition Results.

Example

[ { :no 0
    :start 0.3
    :duration 2.712
    :confidence 0.5651
    :text "Hello and welcome"
    :words [ { :no 0 :confidence 0.9544 :text "Hello" :lexical-form "hello" :pronunciation "həˈləʊ̯" }
             { :no 1 :confidence 0.8234 :text "and" :lexical-form "and" :pronunciation "ænd" }
             { :no 2 :confidence 0.8602 :text "welcome" :lexical-form "welcome" :pronunciation "ˈwɛl.kəm" } ]
    :alternates [ { :no 0
                    :confidence 0.3521
                    :text "Hello and welcome"
                    :words [ ... ] }
                  ... ] }
  ... ]
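
Since the file content is a plain EDN vector, it can be read and queried with standard Clojure functions. The following sketch collects all words whose confidence reaches a threshold, similar in spirit to the :min-confidence configuration slot. The sample data is shortened and partly invented, and the name confident-words is hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Shortened sample; the confidence of "and" is made up for the illustration.
(def results
  (edn/read-string
    "[ { :no 0 :start 0.3 :duration 2.712 :confidence 0.5651
         :text \"Hello and welcome\"
         :words [ { :no 0 :confidence 0.9544 :text \"Hello\" :lexical-form \"hello\" }
                  { :no 1 :confidence 0.3 :text \"and\" :lexical-form \"and\" } ] } ]"))

(defn confident-words
  "Returns the words of all results whose confidence reaches min-confidence."
  [results min-confidence]
  (for [result results
        word (:words result)
        :when (>= (:confidence word) min-confidence)]
    word))

(map :text (confident-words results 0.4))   ; => ("Hello")
```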

Phrase

A phrase is a sequence of recognized words.

Slots

Speech Recognition Result

A speech recognition result describes the result yielded by the speech recognition engine when analyzing a section of an audio stream. The analyzed section is typically selected by an algorithm that considers, among other things, the length of silence, background noise, and the maximum length of a section. Recognizing an audio section yields a number of alternative phrases. The phrase with the highest confidence is typically used as the recognized phrase for the audio section. A speech recognition result is a Phrase as well.

Slots

Example

{ :no 0
  :start 24.35
  :duration 4.267
  :confidence 0.7885
  :text "a brown fox jumped over the messy hill."
  :words [ ... ]
  :alternates [ ... ] }
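
Because a result is itself a Phrase, selecting the phrase with the highest confidence amounts to comparing the result with its alternates. A minimal sketch, with the hypothetical function name best-phrase and shortened sample data:

```clojure
(defn best-phrase
  "Returns the phrase with the highest :confidence among the result itself
   and its :alternates."
  [result]
  (apply max-key :confidence result (:alternates result)))

(def result
  {:no 0
   :confidence 0.7885
   :text "a brown fox jumped over the messy hill."
   :alternates [{:no 0 :confidence 0.6 :text "..."}
                {:no 4 :confidence 0.48992
                 :text "a brown fox run about the messy mill."}]})

(:text (best-phrase result))   ; => "a brown fox jumped over the messy hill."
```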

Alternate Phrase

An alternate sequence of recognized words for an audio section. An alternate sequence is an extension of the Phrase structure.

Slots

Example

{ :no 4
  :confidence 0.48992
  :text "a brown fox run about the messy mill."
  :words [ ... ] }

Recognized Word

A recognized word is a word in the context of a recognition result. Every word is recognized with a certain confidence. The confidence values of the words in a phrase can be combined into an overall confidence for the phrase.

Slots

Example

{ :no 2
  :confidence 0.6443
  :text "brown"
  :lexical-form "brown"
  :pronunciation "braʊn" }
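
One simple way to combine word confidences into a phrase confidence is the arithmetic mean. The document does not state which combination the recognition engine or distillery uses, so the following is only an illustration with the hypothetical name mean-confidence.

```clojure
(defn mean-confidence
  "Arithmetic mean of the word confidences; nil for a phrase without words.
   Assumption: the mean is only one of several possible combinations."
  [phrase]
  (let [confidences (map :confidence (:words phrase))]
    (when (seq confidences)
      (/ (reduce + confidences) (count confidences)))))

(mean-confidence {:words [{:no 0 :confidence 0.5}
                          {:no 1 :confidence 1.0}]})   ; => 0.75
```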

Reverse Indexed Structures

Reverse indexing adds numerical references, pointing upwards in the hierarchy.

Reverse Indexed Word

The reverse indexing adds one additional slot to the Words in a recognized phrase of a result:

Reverse Indexed Alternate Phrase

The reverse indexing adds one additional slot to Alternate Phrases in a result:

Reverse Indexed Phrase Word

The reverse indexing adds two additional slots to Words in Alternate Phrases of a result:
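
The reverse indexing described in this section can be sketched as follows. The slot names :result-no and :alternate-no are assumptions chosen for the illustration, as are the function names index-words and reverse-index.

```clojure
(defn index-words
  "Adds the given parent references to every word of a phrase."
  [phrase refs]
  (update phrase :words (fn [words] (mapv #(merge % refs) words))))

(defn reverse-index
  "Adds upward references to the words and alternates of one result.
   Assumption: the added slots are named :result-no and :alternate-no."
  [result]
  (let [result-no (:no result)]
    (-> result
        ;; words of the recognized phrase: one additional slot
        (index-words {:result-no result-no})
        (update :alternates
                (fn [alternates]
                  (mapv (fn [alt]
                          (-> alt
                              ;; alternate phrase: one additional slot
                              (assoc :result-no result-no)
                              ;; words of an alternate phrase: two additional slots
                              (index-words {:result-no result-no
                                            :alternate-no (:no alt)})))
                        alternates))))))

(def indexed
  (reverse-index {:no 7
                  :words [{:no 0 :text "a"}]
                  :alternates [{:no 2 :words [{:no 0 :text "b"}]}]}))

(get-in indexed [:alternates 0 :words 0])
```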

Analysis Results

The analysis steps in distillery.core/prepare-and-analyze take a Job Description, load the necessary resources, and generate additional data. This additional data is attached to the Job Description structure and its children.

Because of the large amount of data accumulated during the analysis, the extended Job Description structure is not written out as an EDN file. Instead, the most important parts of the analysis result are written to the XML result file.

Word Index

During analysis, the Job Description is extended with the slot :words, holding an index of words. A word index points from words to a number of occurrences in media phrases. It is encoded as a map with the lexical form of a word as the key and a structure with properties of the word as the value.

Value Slots

The following properties of a word are possible.

Example

{ "Grammar" { :id "grammar"
              :lexical-form "Grammar"
              :pronunciation "..."
              :occurrences [ { :medium-id "video1" :result-no 20 :word-no 3 :confidence 0.679 } ... ]
              :occurrence-count 3
              :mean-confidence 0.7533 }
  "Method"  { :id "method"
              :lexical-form "Method"
              :pronunciation "..."
              :occurrences [ ... ] 
              :occurrence-count 1
              :mean-confidence 0.895 } }
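
An index of this shape can be derived from recognition results by grouping word occurrences by lexical form. The sketch below omits the :id and :pronunciation slots for brevity. The function names occurrences and word-index are hypothetical and do not refer to the distillery implementation.

```clojure
(defn occurrences
  "Lists one occurrence map per recognized word of a medium."
  [medium-id results]
  (for [result results
        word (:words result)]
    {:lexical-form (:lexical-form word)
     :medium-id medium-id
     :result-no (:no result)
     :word-no (:no word)
     :confidence (:confidence word)}))

(defn word-index
  "Groups occurrences by lexical form and derives count and mean confidence."
  [occs]
  (into {}
        (for [[lexical-form group] (group-by :lexical-form occs)]
          [lexical-form
           {:lexical-form lexical-form
            :occurrences (mapv #(dissoc % :lexical-form) group)
            :occurrence-count (count group)
            :mean-confidence (/ (reduce + (map :confidence group))
                                (count group))}])))

(def index
  (word-index
    (occurrences "video1"
                 [{:no 0 :words [{:no 0 :lexical-form "hello" :confidence 0.5}]}
                  {:no 1 :words [{:no 0 :lexical-form "hello" :confidence 1.0}
                                 {:no 1 :lexical-form "world" :confidence 0.8}]}])))

(get-in index ["hello" :occurrence-count])   ; => 2
```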

Index Statistics

To avoid recomputing them, an index map can be accompanied by a few precomputed statistical values.

Slots

Example

{ :count 233
  :max-occurrence-count 35 }
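
Both values can be computed from an index map in a single pass. The sketch assumes that :count is the number of distinct words in the index; the function name index-stats is hypothetical.

```clojure
(defn index-stats
  "Computes the statistics for an index map.
   Assumption: :count is the number of distinct words in the index."
  [index]
  {:count (count index)
   :max-occurrence-count (reduce max 0 (map :occurrence-count (vals index)))})

(index-stats {"alpha" {:occurrence-count 3}
              "beta"  {:occurrence-count 35}})
;; => {:count 2, :max-occurrence-count 35}
```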

Medium Word Occurrence

A medium word occurrence is the address of a recognized word in a medium.

Slots

Example

{ :medium-id "video1" 
  :result-no 20 
  :word-no 3 
  :confidence 0.679 }
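
An occurrence can be resolved back to the recognized word it addresses by following the three identifiers. The sketch assumes that the media are available as a sequence of maps with :id and :results slots; the name lookup-word is hypothetical.

```clojure
(defn lookup-word
  "Resolves an occurrence to the recognized word it points at, or nil.
   Assumes media is a sequence of medium maps with :id and :results."
  [media {:keys [medium-id result-no word-no]}]
  (let [medium (first (filter #(= medium-id (:id %)) media))
        result (first (filter #(= result-no (:no %)) (:results medium)))]
    (first (filter #(= word-no (:no %)) (:words result)))))

(def media
  [{:id "video1"
    :results [{:no 20
               :words [{:no 3 :text "grammar" :confidence 0.679}]}]}])

(lookup-word media {:medium-id "video1" :result-no 20 :word-no 3 :confidence 0.679})
;; => {:no 3, :text "grammar", :confidence 0.679}
```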

Medium Result

The analysis result for the recognized phrases of a medium extends the Medium Description structure. The original Medium Description from the vector in the :media slot of a Job Description structure is extended by the following slots:

Slots

Example

{ :id "video01"
  :name "The first medium"
  ...
  :results [ ... ]
  :phrase-count 311
  :word-count 4977
  :index { "alpha" { :mean-confidence 0.793
                     :occurrences [ ... ]
                     :occurrence-count 9
                     :match-value 0.104 }
           "beta"  { ... }
           ... }
  :index-stats { :count 288
                 :max-occurrence-count 19 } 
  :matches { "info" { :category-id "info"
                      :word-scores { ... }
                      :score 0.7133 }
             "math" { ... }
             ... }
  :max-score 0.9323 }
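
The :matches slot can be used to list the categories that fit a medium. The sketch below treats a threshold such as :min-match-score as a lower bound on :score, which is an assumption about how that configuration slot is applied. The name matching-categories and the sample scores are hypothetical.

```clojure
(defn matching-categories
  "Returns the ids of categories whose :score reaches min-match-score,
   best match first."
  [medium min-match-score]
  (->> (vals (:matches medium))
       (filter #(>= (:score %) min-match-score))
       (sort-by :score >)
       (map :category-id)))

(def medium
  {:id "video01"
   :matches {"info" {:category-id "info" :score 0.7133}
             "math" {:category-id "math" :score 0.9323}
             "art"  {:category-id "art"  :score 0.01}}})

(matching-categories medium 0.02)   ; => ("math" "info")
```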

Medium Word

A medium word is a word that occurs at least once in the recognized phrases of a medium. Every medium word is represented by some statistical values and a list of Medium Word Occurrences.

Slots

Example

{ :id "bioinformatik"
  :lexical-form "Bioinformatik"
  :pronunciation "biːoːiːnfɔ͡ɐmaːˈtʰɪk"
  :occurrences [ {:medium-id "ABC"
                  :result-no 3
                  :word-no 10
                  :confidence 0.689}
                 ... ]
  :occurrence-count 21
  :mean-confidence 0.8234
  :match-value 0.01522 }

Category Match

The category match describes the matching result between a medium and a category.

Slots

Example

{ :category-id "math"
  :word-scores { "algorithm" 0.00315
                 "linear" 2.4552E-4
                 ... } }
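
The :word-scores map makes it possible to see which words contribute most to a match. A minimal sketch with the hypothetical function name top-words:

```clojure
(defn top-words
  "Returns the n words with the highest score of a match, best first."
  [match n]
  (->> (:word-scores match)
       (sort-by val >)
       (take n)
       (map key)))

(top-words {:category-id "math"
            :word-scores {"algorithm" 0.00315
                          "linear" 2.4552E-4}}
           1)
;; => ("algorithm")
```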

Category Result

The analysis result for the words of a category extends the Category Description structure. The original Category Description from the vector in the :categories slot of a Job Description structure is extended by the following slots:

Slots

Example

{ :id "math"
  :name "Mathematics"
  ...
  :words [ ... ]
  :index { "gamma" { :mean-confidence 1
                     :occurrences [ ... ]
                     :occurrence-count 4
                     :match-value 0.0281 }
           "kappa"  { ... }
           ... }
  :index-stats { :count 1022
                 :max-occurrence-count 56 } 
  :matches { "video01" { :medium-id "video01"
                         :word-scores { ... }
                         :score 0.3991 }
             "video02" { ... }
             ... }
  :max-score 0.8554 }
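
The :matches slots of all categories contain the data needed for a match matrix between categories and media. The sketch below builds one row per category and one column per medium. The row and column layout, the default of 0.0 for missing matches, and the name match-matrix are assumptions.

```clojure
(defn match-matrix
  "Builds one row per category with one score per medium id.
   Missing matches are represented by 0.0 (an assumption)."
  [categories medium-ids]
  (mapv (fn [category]
          (mapv #(get-in category [:matches % :score] 0.0) medium-ids))
        categories))

(match-matrix
  [{:id "math" :matches {"video01" {:medium-id "video01" :score 0.3991}}}
   {:id "info" :matches {}}]
  ["video01" "video02"])
;; => [[0.3991 0.0] [0.0 0.0]]
```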

Descriptive Word

A descriptive word is a word that describes a category as part of the category resources. The category resources are filtered to sequences of tokens and then concatenated to one token sequence per category. A descriptive word is a token in this sequence. The structure of a descriptive word is somewhat similar to the Recognized Word structure.

Slots

Example

{ :no 455
  :text "way"
  :lexical-form "way"
  :confidence 1 }

Category Word

A category word is a word that occurs at least once in the category resources. Every category word is represented by some statistical values and a list of Category Word Occurrences.

Slots

Example

{ :id "bioinformatik"
  :lexical-form "Bioinformatik"
  :occurrences [ {:category-id "biology"
                  :no 466
                  :confidence 1}
                 ... ]
  :occurrence-count 21
  :mean-confidence 0.8234
  :match-value 0.01522 }

Medium Match

The medium match describes the matching result between a category and a medium.

Slots

Example

{ :medium-id "video08"
  :word-scores { "hair" 0.00023
                 "tube" 1.6882E-4
                 ... } }