July 26, 2006

The Orange County Register

UCI computer researchers test new technology

By Marla Jo Fisher

"Text mining" will allow analysis of huge volumes of documents

IRVINE - UC Irvine researchers have used new computer technology to sort the topics in 330,000 stories published by The New York Times, completing the job in only a few hours and proving that "text mining" technology can be used on huge document collections, officials announced today.

The tool is expected to help medical researchers, lawyers and others who wade through reams of documents looking for connections.

The analysis of newspaper stories published between 2000 and 2002 would have taken a team of librarians months to complete, UCI said.

Researchers identified topics in the news and charted their course over time, determining the months in which they were most popular.

In one example, they generated the words "rider," "bike," "race," "Lance Armstrong" and "Jan Ullrich" to determine that during coverage of the Tour de France bicycle race, Lance Armstrong was written about seven times more often than Ullrich.

They also found that coverage of the Tour de France peaked in the summer months and decreased year to year.
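The month-by-month comparison described above can be illustrated with simple counting. The sketch below is not the researchers' method; it is a minimal example that assumes a hypothetical list of (month, story text) pairs and a made-up helper named `mentions_by_month`, and it tallies how many stories mention a name in each month.

```python
from collections import Counter

# Hypothetical toy data: (month, story text) pairs, not real articles.
stories = [
    ("2001-07", "Lance Armstrong wins the stage as the race climbs"),
    ("2001-07", "Armstrong and Ullrich duel in the mountains"),
    ("2001-12", "Ullrich plans his next season"),
]

def mentions_by_month(stories, term):
    """Count how many stories in each month mention the term (case-insensitive)."""
    counts = Counter()
    for month, text in stories:
        if term.lower() in text.lower():
            counts[month] += 1
    return counts

armstrong = mentions_by_month(stories, "Armstrong")
ullrich = mentions_by_month(stories, "Ullrich")
```

Comparing the two tallies month by month gives the kind of peak-and-decline picture the researchers reported, though a real analysis would work from topics rather than single names.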

That information could help advertisers who want to pinpoint peak interest in an event, for example.

The team's analysis used a text model developed at UC Berkeley in 2003. Computer modeling of topics looks for words that tend to occur together and groups them into topics.
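The idea of seeking words that occur together can be sketched with plain pair counting. This is a deliberate simplification and not the 2003 model the team used; the documents and the helper name `cooccurrence_counts` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents, each reduced to a few keywords.
docs = [
    "rider bike race",
    "bike race rider tour",
    "court judge trial",
    "judge trial lawyer",
]

def cooccurrence_counts(docs):
    """Count, for every pair of distinct words, how many documents contain both."""
    pairs = Counter()
    for doc in docs:
        # Sort the unique words so each pair is always stored in the same order.
        words = sorted(set(doc.split()))
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
    return pairs

pairs = cooccurrence_counts(docs)
```

Pairs that show up together in many documents, such as "bike" and "race" here, hint at a shared topic, while pairs that never co-occur, such as "bike" and "judge", suggest separate ones.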

Findings were presented recently at the Intelligence and Security Informatics Conference in San Diego.

"We have shown in a very practical way how a new text-mining technique makes understanding huge volumes of text quicker and easier," David Newman, a computer scientist at UCI's Bren School of Information and Computer Sciences, said in the statement announcing the research's conclusions. "To put it simply, text mining has made an evolutionary jump. In just a few short years, it could become a common and useful tool for everyone from medical doctors to advertisers; publishers to politicians."