Pathways to SEASR Workshop 2009 - Day 1 - Textual analysis with UIMA and SEASR
One of the most interesting examples today was the textual analysis exercise using UIMA and SEASR. Mike Haberman presented some tools for doing analyses and visualizations of sentiment in a book, using thesauri and a controlled vocabulary of emotions. It wasn't clear to me where UIMA ends and SEASR begins. My first question was: what is SEASR doing that one couldn't, or wouldn't want to, do with existing tools? Since then, I've had some time to play with the Meandre Workbench, which is the software that drives SEASR tools. It seems obvious now: the issue isn't what SEASR tools can do, it is how readily the activities of scholarship can be shared and reused in other contexts.
Raw notes from the UIMA portion:
Examples: SEASR and UIMA:
UIMA: Unstructured Information Management Architecture
Example: Use UIMA to analyze part-of-speech information.
• Uses CAS (Common Analysis Structure) to serialize data and pass it from one analysis component to the next, creating a chain of analytical components.
• UIMA is accessed through Eclipse as a plugin:
o XML description of the UIMA chain that will be run.
o Choose document analyzer
o Choose directory of files to analyze.
• Result is the original data source (i.e. the text) plus its annotations (parts of speech, terms, names, etc.)
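To make the idea of a chain of components sharing one analysis structure concrete, here is a minimal Python sketch. It is not the UIMA API (UIMA itself is a Java framework configured through XML descriptors); the names `Cas`, `Annotation`, `tokenizer`, `pos_tagger`, `run_chain` and the tiny `TOY_LEXICON` are all made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    begin: int       # character offset where the annotated span starts
    end: int         # character offset where the annotated span ends
    type: str        # kind of annotation, e.g. "Token" or "PartOfSpeech"
    value: str = ""  # optional payload, e.g. the part-of-speech tag


@dataclass
class Cas:
    """Toy stand-in for a CAS: the original text plus accumulated annotations."""
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(cas):
    # First component: mark each run of letters as a Token.
    for match in re.finditer(r"[A-Za-z']+", cas.text):
        cas.annotations.append(Annotation(match.start(), match.end(), "Token"))
    return cas


# Hypothetical four-word lexicon; a real tagger would use a trained model.
TOY_LEXICON = {"the": "DET", "white": "ADJ", "whale": "NOUN", "swam": "VERB"}


def pos_tagger(cas):
    # Second component: reads the Token annotations left by the tokenizer
    # and adds a PartOfSpeech annotation over the same span.
    tokens = [a for a in cas.annotations if a.type == "Token"]
    for tok in tokens:
        word = cas.text[tok.begin:tok.end].lower()
        tag = TOY_LEXICON.get(word, "UNK")
        cas.annotations.append(Annotation(tok.begin, tok.end, "PartOfSpeech", tag))
    return cas


def run_chain(text, components):
    # Pass one shared structure through every component in order.
    cas = Cas(text)
    for component in components:
        cas = component(cas)
    return cas


cas = run_chain("The white whale swam", [tokenizer, pos_tagger])
pos_tags = [(cas.text[a.begin:a.end], a.value)
            for a in cas.annotations if a.type == "PartOfSpeech"]
print(pos_tags)
```

The point of the design is that the text itself is never modified: each component only adds annotations, so later components can build on earlier ones and the result is the original text plus everything that was learned about it.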
SEASR application:
• Pattern analysis – discover how characters in a novel are mentioned in relation to commonly occurring nouns.
• Problem: handling sparse data sets – many possible nouns, very few in a given sentence.
• Result: create a matrix of characters and nouns, with confidence ranking.
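A rough sketch of how such a character/noun matrix might be built follows. The notes do not say how the confidence ranking was computed, so this assumes an association-rule style confidence: the share of sentences mentioning a character that also mention the noun. The function name `character_noun_matrix` and the sample sentences are invented for the example, and whitespace splitting stands in for real tokenization and part-of-speech tagging.

```python
from collections import Counter, defaultdict


def character_noun_matrix(sentences, characters, nouns):
    mentions = Counter()               # sentences mentioning each character
    pair_counts = defaultdict(Counter)  # character -> noun -> co-occurrences

    for sentence in sentences:
        words = set(sentence.lower().split())
        present_chars = [c for c in characters if c.lower() in words]
        present_nouns = [n for n in nouns if n.lower() in words]
        for c in present_chars:
            mentions[c] += 1
            for n in present_nouns:
                pair_counts[c][n] += 1

    # Store only the pairs that actually occurred: with many possible nouns
    # and few per sentence, most cells of the full matrix would be empty.
    matrix = {}
    for c in characters:
        ranked = [(n, pair_counts[c][n] / mentions[c]) for n in pair_counts[c]]
        matrix[c] = sorted(ranked, key=lambda item: (-item[1], item[0]))
    return matrix


sentences = [
    "ahab gripped the harpoon",
    "ahab watched the sea",
    "ishmael watched the sea",
    "ahab raised the harpoon on the sea",
]
matrix = character_noun_matrix(
    sentences,
    characters=["ahab", "ishmael"],
    nouns=["harpoon", "sea", "ship"],
)
for character, ranked in matrix.items():
    print(character, ranked)
```

Keeping a per-character dictionary of observed nouns, rather than a dense grid, is one simple way to cope with the sparse data problem mentioned above.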
Example: Use UIMA to analyze sentiment information.
• Look at adjectives within body of text.
• Use a thesaurus to chart path from an adjective to one of a predetermined set of emotional terms.
• Rank the closeness (number of steps) of adjectives to a given emotional term.
• Problems: different thesauri result in very different analyses.
• The SEASR visualization is based on Flare, an open source ActionScript library.
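The thesaurus step can be read as a shortest-path search: treat each word as a node, each synonym link as an edge, and count the steps from an adjective to each emotional term. The sketch below does this with a breadth-first search. It is an illustration only, not the presenter's code; `TOY_THESAURUS`, `EMOTIONS` and `steps_to_emotions` are hypothetical, and the synonym links are invented.

```python
from collections import deque

# Invented synonym links; a real run would load an actual thesaurus.
TOY_THESAURUS = {
    "gloomy": ["dismal", "dark"],
    "dismal": ["sad", "gloomy"],
    "dark": ["gloomy", "eerie"],
    "eerie": ["scary"],
    "scary": ["afraid"],
    "sad": ["unhappy", "dismal"],
    "cheerful": ["happy", "bright"],
    "bright": ["cheerful"],
    "happy": ["cheerful"],
}

# The predetermined set of emotional terms to measure against.
EMOTIONS = {"sad", "happy", "afraid"}


def steps_to_emotions(adjective, thesaurus, emotions):
    """Breadth-first search from an adjective; returns (emotion, steps)
    pairs for every reachable emotion, closest first."""
    distances = {}
    seen = {adjective}
    queue = deque([(adjective, 0)])
    while queue:
        word, steps = queue.popleft()
        if word in emotions and word not in distances:
            distances[word] = steps
        for synonym in thesaurus.get(word, []):
            if synonym not in seen:
                seen.add(synonym)
                queue.append((synonym, steps + 1))
    return sorted(distances.items(), key=lambda item: (item[1], item[0]))


gloomy_ranking = steps_to_emotions("gloomy", TOY_THESAURUS, EMOTIONS)
cheerful_ranking = steps_to_emotions("cheerful", TOY_THESAURUS, EMOTIONS)
print(gloomy_ranking)
print(cheerful_ranking)
```

Because the ranking depends entirely on which links exist, swapping in a different thesaurus changes the paths and therefore the scores, which fits the problem noted above about different thesauri giving very different analyses.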