January 30, 2009 at 10:22 am
· Filed under Uncategorized
It’s official: we (the Scholarly Technology Group and Women Writers Project) are moving to the Brown University Library. Having come from the library world, I’m pretty excited to be returning as part of a group that explores areas of digital scholarship critical to the future of libraries. More than that, we’ll have the opportunity to work even more closely with the wonderful and talented people at Brown’s Center for Digital Initiatives. All changes can be stressful, but we have a lot of reason to be excited about this move, and we’re looking forward to it.
January 16, 2009 at 4:58 pm
· Filed under Uncategorized
Day 2 at the SEASR Workshop focused more on potential implementations of SEASR tools in existing projects (e.g., MONK, DISCUS, Vue). Most of the presentations were short on particulars, but talked a lot about expanding on current & planned functionality with SEASR tools. To SEASR’s credit, none of the speculative talk seemed far-fetched - a lot of big ideas suddenly seem much closer to realization. But the strength of SEASR lies in the ability to share code (‘components’, in SEASR lingo), and the ability to share is contingent on the emergence of a lively community around these tools. The SEASR folks reported on a Community hub initiative, though it wasn’t ready.
There has been a lot of discussion among the participants about whether SEASR does anything that other pipeline languages/Java/frameworks don’t already do. And I think there is more to hear from the SEASR folks regarding why they developed yet another framework and language to support the real meat of the project: the repository of shared DH-centric code. Some participants voiced concerns and/or hopes that Mellon will mutate SEASR into the as-of-yet unrealized software layer for the Bamboo Project. Whether or not this happens, there is a solid sandbox here, and if the SEASR folks maintain some momentum with more forthcoming workshops, actively making their team available to practitioners, and making their support for an active user community paramount, there is a ton of potential here.
January 15, 2009 at 5:43 pm
· Filed under Uncategorized
John Unsworth just mentioned that the Google Books data will likely be housed at UIUC as part of the HathiTrust. Since Google settled their lawsuit with publishers, they are required to provide computational access to the material. He sees SEASR as an infrastructure for beginning that analysis. If it turns out to be as accessible as we hope, it could be an incredibly valuable resource.
January 15, 2009 at 4:18 pm
· Filed under Uncategorized
One of the most interesting examples today was the textual analysis exercise using UIMA and SEASR. Mike Haberman presented some tools for doing analyses & visualizations of sentiment in a book, using thesauri and a controlled vocabulary of emotions. It isn’t clear where UIMA ends and SEASR begins. My first question was, what is SEASR doing that one couldn’t - or wouldn’t want to - do with existing tools? Since then, I’ve had some time to play with the Meandre Workbench, which is the software that drives SEASR tools. It seems obvious now: the issue isn’t what SEASR tools can do, it is how readily the activities of scholarship can be shared and reused in other contexts. Raw notes from the UIMA portion:
Examples: SEASR and UIMA:
UIMA: Unstructured Information Management Architecture
Example: Use UIMA to analyze Part-of-speech information.
• Uses CAS (Common Analysis Structure) to serialize data and pass it from analytical structure to analytical structure. Creates a chain of analytical components.
• UIMA is accessed through Eclipse as a plugin:
o XML description of the UIMA chain that will be run.
o Choose document analyzer
o Choose directory of files to analyze.
• Result is original datasource (i.e. text) + annotated parameters (parts of speech/terms/names, etc.)
SEASR application:
• Pattern analysis – discover how characters in a novel are mentioned in relation to commonly occurring nouns.
• Problem: handling sparse data sets – many possible nouns, very few in a given sentence.
• Result: create a matrix of characters and nouns, with confidence ranking.
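The character/noun matrix in these notes can be sketched in a few lines. This is only my own illustration, not the SEASR component: the sentence splitting is naive, the noun list is supplied by hand rather than by a part-of-speech tagger, and the "confidence" is simply the fraction of a character's sentences that also contain the noun (my assumption, since the presentation didn't spell out the ranking). Storing only the pairs that actually occur is one simple way to cope with the sparse data problem.

```python
import re
from collections import defaultdict

def cooccurrence(text, characters, nouns):
    """Count sentences in which a character and a noun appear together.

    Returns a sparse matrix as {character: {noun: confidence}}, where
    confidence = (sentences with both) / (sentences with the character).
    Nouns are expected in lower case.
    """
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    char_counts = defaultdict(int)
    pair_counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for character in characters:
            if character.lower() in words:
                char_counts[character] += 1
                for noun in nouns:
                    if noun in words:
                        pair_counts[character][noun] += 1
    # Only pairs that were actually seen are stored (sparse representation).
    return {
        character: {noun: count / char_counts[character]
                    for noun, count in row.items()}
        for character, row in pair_counts.items()
    }

# Toy text, invented for illustration.
sample = ("Ahab gripped the harpoon. Ishmael watched the sea. "
          "Ahab cursed the whale. Ahab stared at the sea!")
matrix = cooccurrence(sample, ["Ahab", "Ishmael"], ["harpoon", "sea", "whale"])
```

A real pipeline would take the character and noun lists from the UIMA annotations rather than from hand-typed lists.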
Example: Use UIMA to analyze sentiment information.
• Look at adjectives within body of text.
• Use a thesaurus to chart path from an adjective to one of a predetermined set of emotional terms.
• Rank the closeness (number of steps) of adjectives to a given emotional term.
• Problems: different thesauri result in very different analyses.
• SEASR visualization based on open source ActionScript library: Flare.
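The "number of steps" idea reads to me like a shortest-path search over synonym links. Here is a minimal sketch of that reading, assuming the thesaurus is available as a plain dictionary of word -> synonyms. The two toy thesauri are invented for illustration, and the function name is mine, not anything from SEASR or UIMA.

```python
from collections import deque

def steps_to_emotion(adjective, emotions, thesaurus):
    """Breadth-first search through synonym links.

    Returns (emotion, steps) for the nearest emotional term, or None if
    no emotional term can be reached from the adjective.
    """
    targets = set(emotions)
    seen = {adjective}
    queue = deque([(adjective, 0)])
    while queue:
        word, steps = queue.popleft()
        if word in targets:
            return word, steps
        for synonym in thesaurus.get(word, []):
            if synonym not in seen:
                seen.add(synonym)
                queue.append((synonym, steps + 1))
    return None

# Two toy thesauri (invented) that disagree about the same adjective.
thesaurus_a = {"gloomy": ["dark", "sad"], "dark": ["black"]}
thesaurus_b = {
    "gloomy": ["dark", "dismal"],
    "dark": ["sinister"],
    "dismal": ["dreary"],
    "sinister": ["threatening"],
    "threatening": ["fear"],
}

emotions = ["sad", "fear", "joy"]
result_a = steps_to_emotion("gloomy", emotions, thesaurus_a)
result_b = steps_to_emotion("gloomy", emotions, thesaurus_b)
```

With the first thesaurus "gloomy" lands on "sad" in one step; with the second it never reaches "sad" at all and ends up at "fear" four steps away, which is the "different thesauri, very different analyses" problem in miniature.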
January 15, 2009 at 12:35 pm
· Filed under Uncategorized
The first example was using SEASR to analyze Zotero collection metadata. SEASR is accessible through a Firefox plugin in the contextual menu. This was just a proof-of-concept, and maybe not the most compelling example, but since Zotero is so popular it made sense. Raw notes:
Integrating SEASR analytics with common scholarly tools
Examples: Zotero and SEASR:
• Send collections to SEASR and get some analytics.
• Examples: Authorship analysis
o Author centrality analysis
o Author Degree analysis
o Author HITS analysis
• Able to point the Zotero/SEASR plugin to your institutional SEASR services to get the kinds of analytics you want.
• This is purely a service – no indexing to reuse previously analyzed data.
Example 1:
• SEASR will only do the analysis. Analytic results will be stored in the Zotero database as an attachment.
• SEASR plug-in in context menu – Option to compute Flesch-Kincaid readability on a Zotero entry via Project Gutenberg.
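For reference, the Flesch-Kincaid grade level is just a weighted sum of words-per-sentence and syllables-per-word. A rough sketch follows. The formula constants are the standard published ones, but the syllable counter is a crude vowel-group heuristic of my own, so scores will drift from whatever the SEASR service actually returns.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level for a block of English text.

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

grade = flesch_kincaid_grade("The cat sat on the mat. The dog ran.")
```

Very short, simple sentences can score below zero, which is expected for this formula.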
Example 2:
• Look for author centrality in a collection of articles saved in Zotero. Returns centrality ranking and saves analysis as a Zotero attachment.
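I didn't catch which centrality measure the demo used, so as a stand-in here is the simplest one: degree centrality on a co-authorship graph. This is a sketch under my own assumptions. The input is a list of author lists (as you might pull from a Zotero export), the author names are placeholders, and HITS or other measures would need a proper graph library.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(papers):
    """Rank authors by normalized degree in a co-authorship graph.

    `papers` is a list of author lists. Two authors are linked if they
    share at least one paper. Degree is divided by (n - 1), the largest
    possible number of distinct co-authors.
    """
    neighbors = defaultdict(set)
    for authors in papers:
        for author in authors:
            neighbors[author]  # touch the key so solo authors still appear
        for a, b in combinations(set(authors), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return [(author, 0.0) for author in neighbors]
    scores = {author: len(linked) / (n - 1)
              for author, linked in neighbors.items()}
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Placeholder collection, not real bibliographic data.
collection = [
    ["Author A", "Author B"],
    ["Author A", "Author C"],
    ["Author B", "Author A", "Author D"],
    ["Author E"],
]
ranking = degree_centrality(collection)
```

Here "Author A" comes out on top because A has co-written with three of the four other authors, while the solo "Author E" scores zero.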
Example 3:
• Export collections from Zotero to Fedora via SEASR. (Is Fedora a useful way to share personal collections?).
January 15, 2009 at 11:34 am
· Filed under Uncategorized
The NCSA has Supercomputing coloring books. That should tell you all you need to know.
The SEASR workshop began with some general introductions from NCSA, Christopher Mackie from the Mellon Foundation, and Michael Welge from NCSA/SEASR Project. A lot of people here came directly from the Bamboo workshop #3 in Tucson (which is sort of cruel considering it is -25F with the windchill here). I’m curious to hear about Bamboo - at first glance it seems that SEASR is an instantiation of some of Bamboo’s goals, but that may be a gross misrepresentation. About to see some examples of the SEASR software in action. Raw notes from the introductions:
SEASR:
• SEASR: Semantic Web driven SOA interoperability
• Modular
• Enable mashups
• Rely heavily on RDF
• Search & browse Fedora
• Export from Zotero to Fedora
• Simile Timeline interface
Model:
• Meandre infrastructure (ZigZag scripting language)
• Layered architecture with component repository service layer
• Knowledge Discovery model: Data selection & cleaning -> Data prep (create an example) -> Transformation (munging) -> Data mining/pattern Discovery -> Interpretation/Knowledge.
Participant Project Plan Guide:
• Research objective
• Data sources
• Transformations
• Query/Descriptive/Analysis
• Evaluation
• Interaction
• Outcome
November 26, 2008 at 10:57 am
· Filed under Uncategorized
A contributor on Code4Lib recently posted a request for folks to fill out his survey about the future of libraries. Without directly addressing the question about how libraries approach technologies going forward, Code4Lib-ers started an interesting, somewhat barbed discussion about MLS/MLIS degrees being required for library technologist positions. I realized after hitting the send button that my comments sounded a little like a dig at librarians. I didn’t intend it that way. I think that librarians have tough jobs and a lot of competing demands, and the generally poor quality of many MLS programs doesn’t prepare students for the issues that they can and should be tackling in their libraries.
My initial email:
The discussion of the value of the MLIS/MLS is interesting, and familiar. It is a discussion that always seems to go in one direction: namely, why do library technologists need MLS degrees? There are some pretty compelling arguments that they don’t, but I’m curious what that means for librarians going forward.
I went to library school during what I consider to be the Great Delusion of the Late Nineties. There was a palpable sense among MLS students and librarians that we were about to find our groove in the proto-Google web world. My intro MLS courses were chock full of readings about librarians being hired away by Fortune 500 companies to help them make sense of Information, and about these mystical skills that librarians possessed that allowed us some insight into Information that others could not possess without an MLS.
What happened, of course, was that things changed quicker than MLS programs could adapt, and whether we liked it or not, our culture had moved beyond the need for librarians as gatekeepers. In the meantime, these amazing things are happening with open repositories, web services, and resource-oriented systems - things that should be front-and-center for emerging librarians, but often are skimmed because of the technical knowledge required. The result is that a lot of smaller academic libraries need to choose between enacting a really ambitious and forward-looking technology strategy, and protecting their MLS faculty lines. It seems like a doomed strategy in the long-run, but for a library director, I don’t think there is an easy answer. So a lot of places try to have it both ways and fish for skilled technologists with MLS degrees.
In my case, I went the other direction, currently working in a non-Library (but closely affiliated) technology group that is under the IT umbrella, despite having an MLS. So go figure…
November 19, 2008 at 8:46 am
· Filed under Uncategorized
I’m finally getting a chance to comment on last week’s DLF Fall Forum. Since I’m not technically a librarian any more, I probably wouldn’t have gone. But seeing as it was in Providence, it was a great excuse.
The most compelling new thing that I saw there was the Djatoka (not Djakota, as Birkin Diana pointed out) JPEG 2000 Image server. Since I’m not really an image person, my simplistic impression of JPEG 2000 is that it is full of potential as a high-quality and scalable image format, but that it has lacked accessible and affordable software support. The folks at Los Alamos have now released Djatoka, which seems to be … pardon me here … a game changer.
In practice, Djatoka shares some features with Google Maps - images can be delivered as AJAXy tilesets, which can be dynamically loaded in the browser as requested by the user. But the really cool feature is URI addressability of any region of an image. So, you want to study Mona Lisa’s nosehairs? Here is a URL. Not only that, but the API lets you pass a URI for any image, which the Djatoka server compresses on-the-fly, and then delivers in JPEG2000 format.
One of the controversial (in a nerdy way) features of Djatoka is its heavy use of OpenURL to reference & deliver parts of the image. There has been some discussion on Code4Lib about whether there is a better way to do it. I’d say there probably are better ways, but OpenURL is a way to get the server out there quickly and get people using it quickly. Pretty much any transfer format would get somebody’s hackles up, so you might as well build some momentum early by using a nearly-universal format (nearly universal for academic institutions that is, the rest of you are on your own). And hey, it is open source, so if you don’t like OpenURL, hack away.
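To make the region addressing concrete, here is roughly what building one of those OpenURL requests looks like. Treat this as a sketch from memory rather than a reference: the parameter names (`svc_id`, `svc.region`, `svc.level`, and so on) are as I recall them from the Djatoka documentation and should be checked against your own install, and the host and image URI are made up.

```python
from urllib.parse import urlencode

def region_url(resolver, image_uri, y, x, height, width, level=3):
    """Build an OpenURL query asking for one rectangular region of an image.

    The key names follow my recollection of Djatoka's getRegion service;
    verify them against the server's documentation before relying on them.
    """
    params = [
        ("url_ver", "Z39.88-2004"),                        # OpenURL version
        ("rft_id", image_uri),                             # the image being referenced
        ("svc_id", "info:lanl-repo/svc/getRegion"),        # region extraction service
        ("svc_val_fmt", "info:ofi/fmt:kev:mtx:jpeg2000"),
        ("svc.format", "image/jpeg"),                      # format of the returned region
        ("svc.level", str(level)),                         # resolution level
        ("svc.region", f"{y},{x},{height},{width}"),       # top, left, height, width
    ]
    return resolver + "?" + urlencode(params)

url = region_url(
    "http://example.org/adore-djatoka/resolver",  # hypothetical host
    "http://example.org/images/mona-lisa.jp2",    # hypothetical image URI
    y=1200, x=900, height=256, width=256,
)
```

The whole request is one long key/value query string, which is pretty much what the Code4Lib grumbling is about. It is verbose, but any resolver that already speaks OpenURL can parse it.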
March 5, 2008 at 10:32 am
· Filed under libraries, technology
This past week, a bunch of smart folks came out with a preliminary specification for integrating diverse scholarly digital objects across repositories. Check out the announcement here.
The Object Reuse and Exchange (ORE) spec is, in my opinion, an enormous development. Scholarly technologists, librarians, and researchers have been circling around this idea of a truly semantic, services-based environment for a long time. It is great to see an architectural model that people can begin to discuss, rather than seeing more ad hoc development and tepid experiments by technology vendors.
The ORE working group describes their results like this:
“ORE will develop specifications that allow distributed repositories to exchange information about their constituent digital objects. These specifications will include approaches for representing digital objects and repository services that facilitate access and ingest of these representations. The specifications will enable a new generation of cross-repository services that leverage the intrinsic value of digital objects beyond the borders of hosting repositories. “
They also recommend an Atom-based model for packaging and delivering these representations of digital objects via syndication:
“These specifications describe a data model to identify and describe aggregations of web resources, and the encoding of the data model in the XML-based Atom syndication format. ”
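To give a feel for what "an aggregation encoded in Atom" might look like, here is a loose sketch that writes one Atom entry with a link per aggregated resource. The spec is still a draft and the exact mapping may change, so the element choices are my own assumptions: I'm using the `http://www.openarchives.org/ore/terms/aggregates` relation as the link `rel`, all URIs are placeholders, and the entry leaves out other elements a complete Atom entry needs (an author, for instance).

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumed relation for "this aggregation contains that resource".
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

def aggregation_entry(agg_uri, title, updated, resource_uris):
    """Describe an aggregation of web resources as a single Atom entry."""
    ET.register_namespace("", ATOM)  # serialize Atom as the default namespace
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = agg_uri
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    for uri in resource_uris:
        # One link per aggregated resource.
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      {"rel": ORE_AGGREGATES, "href": uri})
    return ET.tostring(entry, encoding="unicode")

entry_xml = aggregation_entry(
    "http://example.org/aggregations/1",       # placeholder URIs throughout
    "A scanned letter and its transcription",
    "2008-03-05T10:32:00Z",
    ["http://example.org/letter.jp2", "http://example.org/letter.xml"],
)
```

The appeal is that any feed reader or harvester that understands Atom can at least follow the links, even if it knows nothing about ORE.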
Incidentally, my forthcoming article, “Syndicating Rich Bibliographic Metadata Using MODS and RSS”, Journal of Web Librarianship, Vol. 2 Issue 1, 2008, explores some very similar ideas, but as a proof-of-concept exercise applied to objects in library collections. Either way, it is really exciting to see the same vein of investigation happening at a much more prominent level.
It is pretty clear to most everyone that the old model of digital repositories - silos of data waiting for serendipitous discovery - is played out. Dan Cohen says it much more eloquently than I can, but suffice it to say that technology is just beginning to allow digital scholarship to more closely model the actual process of scholarship, in all of its complexity, nuance, and imprecision. It is pretty awesome.
January 10, 2008 at 12:24 pm
· Filed under libraries, obsolescence, technology
Dealing with ILS (Integrated Library Systems) is probably the most drab and tedious part of my job, and, unfortunately, fairly integral (ha ha). I find discussions about ILS issues to be generally uninteresting, hence a lengthy blog post about them.
Marshall Breeding, of http://www.librarytechnology.org, just posted the results of a survey he conducted about the level of satisfaction among ILS customers in regard to their respective systems. The report is available at http://www.librarytechnology.org/perceptions2007.pl. The most sad/interesting thing from my perspective is that Voyager, which is the system I work with, is waaaaaaaay down at the bottom of the list - as is its sister product, ALEPH. It’s no surprise to me - Voyager is a poorly designed, poorly supported, and generally crappy product. What is interesting too is that the most enthusiastic supporters of open-source ILS projects are those from libraries running these crappy systems.
Part of the problem, in my humble opinion, is that librarians still have a very consumerist attitude when it comes to their technology. The catalog and the technologies that support it are products that you buy and then you make an effort to live with them. Ten years later, you repeat the process. In my mind, this mentality is akin to our culture’s adherence to the gas-combustion engine and coal-derived electric power for most of our infrastructure needs. In the days of nanotechnology, ultra-efficient electronics, and ubiquitous computing, it is absurd that we cling to century-old technologies for our most fundamental needs. But we do. It’s also absurd that libraries - institutions that should be much more agile - still cling to this notion that their core technologies should take the form of large, unwieldy, local databases provided at enormous expense by private companies who really have no financial interest in improving their ILS products for more than half of their life-cycle. Once a product is 5 years old, the number of new customers dwindles and cash flow becomes scarce until the next generation comes out 5 years later. It would be very nice if libraries, collectively, put an end to this industry for good, and embraced systems that could be developed continuously, for the common good.
I’m a bit of a hypocrite, because I don’t know if I’d be able to sell that idea to our administration when the time comes to ditch Voyager, but it’s something to shoot for I guess…