(no subject)

From: Ian Benest (idb@cs.york.ac.uk)
Date: Tue Oct 02 2001 - 11:44:25 EDT


Subject: Summarization

Dear Dr Radev

I met Bill Byrne at Eurospeech in Denmark in September and we
got to talking through a problem that I have. I got in touch
with him afterwards and he suggested that I contact you.

I'm not sure just how much I need to say to get your interest,
so here goes. I have devised this idea of on-line lectures
which are electronic slides with a voice-over synchronised with
any necessary animation. So each slide has a number of speech
fragments and each fragment has its transcription. I have a whole
lecture module done in this way and retain most of the semantics
including things like the semantics of the diagrams.

I want to investigate the user-interface to two future tools
and I'm not sure just what they're going to "look-and-feel"
like - I suppose that's all part of the research. One tool
will take the author of a lecture through what they have
produced, critically assessing it as it goes. The other will
help people making a search to find the slides that best
match their query. In both cases speech will be used to
augment the graphical output. I am already quite advanced
as far as the evaluation of the lectures and the search
engine are concerned.

In both cases I want the system to seem as though it knows
what the slide/lecture is about. I have used the Unix utility
"style" to extract nouns from the transcripts. The utility
doesn't do a good job, so I have to apply a fairly big stop list
to its output. I hope to have access soon to some software that
is supposed to get all the nouns with no errors, and I should be
moving over to it shortly.

At the moment the summarization amounts to: "this slide is about
..." followed by a list of nouns (perhaps with supporting
adjectives). When you see the slide in action and read this
"synopsis" you think "that's about right". When you read the
"synopsis" without seeing the slide, though, you can't imagine
what the slide is all about. Clearly I am losing vital information and
that's where summarization comes in. I can, and do, extract slide
titles which might be more enlightening and I have access to the
text on the slide.
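
To make the current approach concrete, here is a rough sketch of
the synopsis step. It assumes the nouns have already been pulled
out of the transcript by some other tool; the function name
build_synopsis, the max_nouns limit and the stop list entries are
all made up for illustration and are not the real ones.

```python
# Hypothetical stop list: the real one is only described as "fairly big".
STOP_LIST = {"thing", "way", "lot", "bit"}


def build_synopsis(candidate_nouns, stop_list=STOP_LIST, max_nouns=5):
    """Filter extracted nouns and fill the "this slide is about ..." template.

    candidate_nouns is assumed to be the (noisy) output of a noun extractor.
    Returns an empty string when nothing survives the filtering.
    """
    seen = set()
    kept = []
    for noun in candidate_nouns:
        key = noun.lower()
        # Drop stop-listed words and case-insensitive repeats.
        if key in stop_list or key in seen:
            continue
        seen.add(key)
        kept.append(noun)
    kept = kept[:max_nouns]
    if not kept:
        return ""
    return "this slide is about " + ", ".join(kept)


if __name__ == "__main__":
    print(build_synopsis(["stack", "thing", "Stack", "pointer"]))
```

A bare noun list like this drops the relations between the nouns,
which is probably why the synopsis only makes sense once you have
seen the slide.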

The scripts for each slide are quite short and I probably want
one or two sentences only. Any more and I'll bore the listener
to death. I also expect that the summary should be less than
fluent and more spontaneous.
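
As an illustration of the kind of thing I mean by "one or two
sentences", here is a minimal sketch of a simple extractive
approach: score each sentence of a script by the average frequency
of its words within that script and keep the best one or two. This
is only an assumed baseline, not a tool I have; the function name
summarize and its parameters are hypothetical.

```python
import re
from collections import Counter


def summarize(script, max_sentences=2, stop_words=frozenset()):
    """Return up to max_sentences sentences from script, in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    sentences = [s.strip() for s in parts if s.strip()]

    def words(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return [w for w in tokens if w not in stop_words]

    # Word frequencies over the whole script, ignoring stop words.
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Rank sentence indices by score, keep the top few, restore script order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    text = ("A stack stores items. Push adds an item to the stack. "
            "The weather is nice.")
    print(summarize(text, stop_words={"a", "an", "the", "to", "is"}))
```

Something this crude only picks out existing sentences, so it would
not by itself give the more spontaneous style I am after.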

So what I'm after is software on Linux that can summarize small blocks
of text to see if this solves my problem. I'm not sure what I can
give in return. I think it unlikely that I would be able to help with
summarization development, though the particular niche of possibly
requiring more spontaneous speech might have some novelty.

Can you help?

Many thanks,
Ian Benest


