Fixed the problem

From: Abhishek J Ratani (ARatani@odu.edu)
Date: Fri Feb 11 2005 - 16:47:45 EST


Dear Dragomir,

We worked with the files and fixed some of the problems.

These are the changes we made, which fix some of the problems and improve
the summary.

These are the changes to MEAD_ADDONS_UTIL.pm:

1. We added a function called removejavascript, since many of the summaries
that were returned had JavaScript in them: the existing method only removed
the JavaScript from the <head> tag, and not from the <body>.

sub removejavascript {

 my $html = shift;

 # The /s flag lets "." match newlines, so multi-line scripts are removed too.
 $html =~ s/<script.*?<\/script>//gis;

 return $html;
}

2. We changed the sanitize sub so that only a proper HTML entity is left
unescaped (the pattern recognizes numeric entities such as &#38; or &#x26;).
For example, && will be changed to &amp;&amp;. This is our change:

$html =~ s/&(?!(#(([xX][0-9a-fA-F]+)|([0-9]+));))/&amp;/g;

3. We changed the extract_text_from_html sub to call the new function by
adding this line:

$html = &removejavascript ($html);
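
As a rough, self-contained sketch of how changes 1 and 2 behave on a small
input (this is not the attached file): the sample page and the sub names
escape_non_numeric and escape_bare_ampersands are made up for illustration,
and escape_bare_ampersands is a hypothetical variant that also leaves named
entities such as &lt; alone, which the pattern in change 2 does not.

```perl
use strict;
use warnings;

# Strip <script> elements anywhere in the document. The /s flag lets "."
# match newlines, so scripts that span several lines are removed as well.
sub removejavascript {
    my $html = shift;
    $html =~ s/<script\b.*?<\/script\s*>//gis;
    return $html;
}

# The substitution from change 2: escape every "&" that does not start a
# numeric character reference (decimal or hexadecimal).
sub escape_non_numeric {
    my $text = shift;
    $text =~ s/&(?!(#(([xX][0-9a-fA-F]+)|([0-9]+));))/&amp;/g;
    return $text;
}

# Hypothetical variant (an assumption, not part of the change described):
# also leave anything shaped like a named entity, e.g. &lt;, untouched.
sub escape_bare_ampersands {
    my $text = shift;
    $text =~ s/&(?!(?:#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);)/&amp;/g;
    return $text;
}

# Made-up sample page with a multi-line script in the <body>.
my $page = "<body><p>a && b</p><script type=\"text/javascript\">\n"
         . "alert(1);\n</script><p>&#38; &lt;</p></body>";

my $clean = removejavascript($page);
print escape_non_numeric($clean), "\n";
print escape_bare_ampersands($clean), "\n";
```

With the pattern from change 2, a named entity that is already present, such
as &lt;, is escaped a second time and becomes &amp;lt;. Whether that matters
depends on whether the text reaching sanitize can already contain named
entities.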

I have attached the modified file. Please recommend any changes to this if
you can.

As you might know, we are one of eight research centers of the College of
Engineering at ODU. We are developing a research prototype of an integrated
Q/A search system.

Right now, we are in the process of adding a summarization module using
MEAD to our Intelligent Question/Answering Search System. We will give full
acknowledgement to MEAD on every summary that is produced.

I hope that helps.

Thank you

Abhishek (AJ) Ratani
Information Technology Specialist
Center for Advanced Engineering Environments <http://www.aee.odu.edu>
Old Dominion University

Phone: (757) 766-5248
Fax: (757) 766-5246
Email: aratani@odu.edu
Webpage: http://www.cise.ufl.edu/~aratani


