Archive for January, 2009

Joining the Brown University Library…

It’s official: we (the Scholarly Technology Group and Women Writers Project) are moving to the Brown University Library.  Having come from the library world, I’m pretty excited to be returning as part of a group that explores areas of digital scholarship critical to the future of libraries.  More than that, we’ll have the opportunity to work even more closely with the wonderful and talented people at Brown’s Center for Digital Initiatives.  All changes can be stressful, but we have a lot of reason to be excited about this move, and we’re looking forward to it.

Pathways to SEASR Workshop 2009 - Day 2 & Wrap up

Day 2 at the SEASR Workshop focused more on potential implementations of SEASR tools in existing projects (e.g., MONK, DISCUS, Vue). Most of the presentations were short on particulars, but talked a lot about expanding current and planned functionality with SEASR tools. To SEASR’s credit, none of the speculative talk seemed far-fetched - a lot of big ideas suddenly seem much closer to realization. But the strength of SEASR lies in the ability to share code (‘components’, in SEASR lingo), and the ability to share is contingent on the emergence of a lively community around these tools. The SEASR folks reported on a Community hub initiative, though it wasn’t ready yet.

There has been a lot of discussion among the participants about whether SEASR does anything that other pipeline languages, Java, or existing frameworks don’t already do. I think there is more to hear from the SEASR folks about why they developed yet another framework and language to support the real meat of the project: the repository of shared DH-centric code. Some participants voiced concerns and/or hopes that Mellon will mutate SEASR into the as-yet-unrealized software layer for the Bamboo Project. Whether or not this happens, there is a solid sandbox here. If the SEASR folks maintain momentum by holding more workshops, actively making their team available to practitioners, and making support for an active user community paramount, there is a ton of potential.

Pathways to SEASR Workshop 2009 - Day 1, Cont’d

John Unsworth just mentioned that the Google Books data will likely be housed at UIUC as part of the HathiTrust.  Since Google settled their lawsuit with publishers, they are required to provide computational access to the material.  He sees SEASR as an infrastructure for beginning that analysis.  If it turns out to be as accessible as we hope, it could be an incredibly valuable resource.

Pathways to SEASR Workshop 2009 - Day 1 - Textual analysis with UIMA and SEASR

One of the most interesting examples today was the textual analysis exercise using UIMA and SEASR. Mike Haberman presented some tools for doing analyses and visualizations of sentiment in a book, using thesauri and a controlled vocabulary of emotions. It isn’t clear where UIMA ends and SEASR begins. My first question was, what is SEASR doing that one couldn’t - or wouldn’t want to - do with existing tools? Since then, I’ve had some time to play with the Meandre Workbench, which is the software that drives SEASR tools. It seems obvious now: the issue isn’t what SEASR tools can do, it is how readily the activities of scholarship can be shared and reused in other contexts. Raw notes from the UIMA portion:
Examples:  SEASR and UIMA:

UIMA: Unstructured Information Management Architecture

Example: Use UIMA to analyze Part-of-speech information.
•    Uses CAS (Common Analysis Structure) to serialize data and pass it from one analysis component to the next, creating a chain of analytical components.
•    UIMA is accessed through Eclipse as a plugin:

o    XML description of the UIMA chain that will be run.
o    Choose document analyzer
o    Choose directory of files to analyze.

•    Result is original datasource (i.e. text) + annotated parameters (parts of speech/terms/names, etc.)
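To make the chain idea in these notes concrete, here is a minimal Python sketch. It does not use UIMA or its API; the Document class is only a stand-in for the CAS, and the names Annotation, Document, tokenize, tag_parts_of_speech, run_chain and the toy lexicon are all invented for illustration. Each component receives the shared structure, adds annotations, and hands it on, so the result is the original text plus layered annotations.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token" or "pos"
    begin: int  # character offset where the annotated span starts
    end: int    # character offset where the annotated span ends
    value: str


@dataclass
class Document:
    """Hypothetical stand-in for the CAS: original text plus annotations."""
    text: str
    annotations: list = field(default_factory=list)


def tokenize(doc):
    # First component: mark every word as a token annotation.
    for m in re.finditer(r"[A-Za-z']+", doc.text):
        doc.annotations.append(Annotation("token", m.start(), m.end(), m.group()))
    return doc


# Toy lexicon, an assumption for the example; a real tagger would be a model.
TOY_LEXICON = {"the": "DET", "white": "ADJ", "whale": "NOUN", "swam": "VERB"}


def tag_parts_of_speech(doc):
    # Second component: reads the token annotations left by the first one.
    tokens = [a for a in doc.annotations if a.kind == "token"]
    for tok in tokens:
        pos = TOY_LEXICON.get(tok.value.lower(), "UNK")
        doc.annotations.append(Annotation("pos", tok.begin, tok.end, pos))
    return doc


def run_chain(text, components):
    # Pass one shared structure through each component in order.
    doc = Document(text)
    for component in components:
        doc = component(doc)
    return doc


doc = run_chain("The white whale swam.", [tokenize, tag_parts_of_speech])
pos_tags = [(doc.text[a.begin:a.end], a.value)
            for a in doc.annotations if a.kind == "pos"]
print(pos_tags)
```

The original text is never modified; annotations only point into it by offset, which is what lets later components build on earlier ones.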

SEASR application:
•    Pattern analysis – discover how characters in a novel are mentioned in relation to commonly occurring nouns.
•    Problem: handling sparse data sets – many possible nouns, very few in a given sentence.
•    Result: create a matrix of characters and nouns, with confidence ranking.
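A rough sketch of how such a character/noun matrix might be built, in plain Python. The notes do not say how the confidence ranking was computed, so this assumes the simplest option: the share of a character's sentences that also contain the noun. The function name cooccurrence and the sample sentences are invented for the example. Only observed pairs are stored, which is one common way to cope with sparse data.

```python
import re
from collections import defaultdict


def cooccurrence(sentences, characters, nouns):
    """Rank (character, noun) pairs by an assumed confidence score:
    sentences containing both / sentences containing the character."""
    char_counts = defaultdict(int)
    pair_counts = defaultdict(int)  # sparse: only observed pairs are stored
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        present_chars = [c for c in characters if c.lower() in words]
        present_nouns = [n for n in nouns if n.lower() in words]
        for c in present_chars:
            char_counts[c] += 1
            for n in present_nouns:
                pair_counts[(c, n)] += 1
    ranked = [(c, n, count / char_counts[c])
              for (c, n), count in pair_counts.items()]
    # Highest confidence first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda row: (-row[2], row[0], row[1]))
    return ranked


sentences = [
    "Ahab paced the deck.",
    "Ahab watched the sea from the deck.",
    "Ishmael loved the sea.",
    "Ahab slept.",
]
ranked = cooccurrence(sentences, ["Ahab", "Ishmael"], ["deck", "sea", "ship"])
for character, noun, confidence in ranked:
    print(f"{character:8} {noun:5} {confidence:.2f}")
```

Note that "ship" never appears in the output: pairs that never co-occur take up no space.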

Example: Use UIMA to analyze sentiment information.
•    Look at adjectives within body of text.
•    Use a thesaurus to chart path from an adjective to one of a predetermined set of emotional terms.
•    Rank the closeness (number of steps) of adjectives to a given emotional term.
•    Problems: different thesauri result in very different analyses.
•    SEASR visualization based on open source ActionScript library: Flare.
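The path-charting step can be sketched as a breadth-first search over a thesaurus treated as a graph. This is a guess at the general technique, not the code that was demoed; the function name steps_to_emotion, the two toy thesauri, and the emotion set are all made up for illustration. Running the same adjective through two different thesauri also shows the problem noted above.

```python
from collections import deque


def steps_to_emotion(thesaurus, adjective, emotions):
    """Breadth-first search from an adjective through synonym links.
    Returns (emotion, steps) for the nearest emotional term, or None."""
    seen = {adjective}
    queue = deque([(adjective, 0)])
    while queue:
        word, steps = queue.popleft()
        if word in emotions:
            return word, steps
        for synonym in thesaurus.get(word, ()):
            if synonym not in seen:
                seen.add(synonym)
                queue.append((synonym, steps + 1))
    return None


EMOTIONS = {"sadness", "fear", "joy"}

# Two toy thesauri (assumptions for the example) that disagree about "gloomy".
THESAURUS_A = {
    "gloomy": ["somber", "dark"],
    "somber": ["sad"],
    "sad": ["sadness"],
    "dark": ["ominous"],
    "ominous": ["threatening"],
    "threatening": ["fear"],
}
THESAURUS_B = {
    "gloomy": ["menacing"],
    "menacing": ["fear"],
}

print(steps_to_emotion(THESAURUS_A, "gloomy", EMOTIONS))  # ('sadness', 3)
print(steps_to_emotion(THESAURUS_B, "gloomy", EMOTIONS))  # ('fear', 2)
```

Breadth-first search visits words in order of distance, so the first emotional term reached is also the closest one in number of steps.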

Pathways to SEASR Workshop 2009 - Day 1 - Zotero and SEASR

The first example was using SEASR to analyze Zotero collection metadata.  SEASR is accessible through a Firefox plugin in the contextual menu.  This was just a proof-of-concept, and maybe not the most compelling example, but since Zotero is so popular it made sense.  Raw notes:

Integrating SEASR analytics with common scholarly tools

Examples: Zotero and SEASR:

•    Send collections to SEASR and get some analytics.
•    Examples: Authorship analysis

o    Author centrality analysis
o    Author Degree analysis
o    Author HITS analysis

•    Able to point the Zotero/SEASR plugin to your institutional SEASR services to get the kinds of analytics you want.
•    This is purely a service – no indexing to reuse previously analyzed data.
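As a rough idea of what an author degree analysis involves, here is a small Python sketch that builds a co-authorship graph and counts each author's distinct co-authors. It does not touch Zotero or SEASR; the item format (a dict with an "authors" list) and the function name author_degrees are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def author_degrees(items):
    """Return (author, degree) pairs, where degree is the number of
    distinct co-authors across all items, highest degree first."""
    neighbors = defaultdict(set)
    for item in items:
        authors = set(item["authors"])
        for author in authors:
            neighbors[author]  # touch the key so solo authors get degree 0
        for a, b in combinations(sorted(authors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(((a, len(n)) for a, n in neighbors.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical collection; real Zotero exports have a richer structure.
collection = [
    {"title": "A", "authors": ["Smith", "Jones"]},
    {"title": "B", "authors": ["Smith", "Lee", "Park"]},
    {"title": "C", "authors": ["Chen"]},
]
degrees = author_degrees(collection)
print(degrees)
```

Because nothing is indexed or cached here, every call recomputes the graph from scratch, which matches the "purely a service" behavior described in the notes.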

Example 1:
•    SEASR will only do the analysis.  Analytic results will be stored in the Zotero database as an attachment.
•    SEASR plug-in in context menu – Option to compute Flesch-Kincaid readability on a Zotero entry via Project Gutenberg.
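For reference, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below computes it in Python. The syllable counter is a crude vowel-group heuristic chosen for brevity (it overcounts words with a silent final "e"), and the function names are invented; how the SEASR component actually counts syllables was not covered.

```python
import re


def count_syllables(word):
    # Crude assumption: one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level using a heuristic syllable count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


grade = flesch_kincaid_grade("The cat sat on the mat.")
print(round(grade, 2))  # -1.45
```

Very short, simple sentences can score below zero, as this one does; longer sentences and longer words push the grade up.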

Example 2:
•    Look for author centrality in a collection of articles saved in Zotero.  Returns centrality ranking and saves analysis as a Zotero attachment.

Example 3:
•    Export collections from Zotero to Fedora via SEASR.  (Is Fedora a useful way to share personal collections?).

Pathways to SEASR Workshop 2009 - Day 1

The NCSA has Supercomputing coloring books.  That should tell you all you need to know.

The SEASR workshop began with some general introductions from NCSA, Christopher Mackie from the Mellon Foundation, and Michael Welge from the NCSA/SEASR Project. A lot of people here came directly from Bamboo workshop #3 in Tucson (which is sort of cruel, considering it is -25F with the windchill here). I’m curious to hear about Bamboo - at first glance it seems that SEASR is an instantiation of some of Bamboo’s goals, but that may be a gross misrepresentation. About to see some examples of the SEASR software in action. Raw notes from the introductions:

SEASR:
•    SEASR: Semantic Web driven SOA interoperability
•    Modular
•    Enable mashups
•    Rely heavily on RDF
•    Search & browse Fedora
•    Export from Zotero to Fedora
•    Simile Timeline interface

Model:
•    Meandre infrastructure (ZigZag scripting language)
•    Layered architecture with component repository service layer
•    Knowledge Discovery model: Data selection & cleaning -> Data prep (create an example) -> Transformation (munging) -> Data mining/pattern Discovery -> Interpretation/Knowledge.

Participant Project Plan Guide:
•    Research objective
•    Data sources
•    Transformations
•    Query/Descriptive/Analysis
•    Evaluation
•    Interaction
•    Outcome
