As any Internet surfer knows, a keyword search via Google or Yahoo doesn’t always yield the desired results. A new book co-authored by a University of Missouri-Rolla computer scientist outlines a method for managing Internet data that could improve and simplify web searches.
Web Data Management: A Warehouse Approach, co-written by Dr. Sanjay Madria, an assistant professor of computer science at UMR, will be published in mid-October by the scientific publishing company Springer as part of its Professional Computing series. It is the first book to focus on the warehouse approach to web data management.
Because web data is irregular and lacks a predefined structure, searching the Internet can be difficult, Madria says. "Internet users must rely on search engines like Google or Yahoo, which retrieve data based on a keyword search," he says. "This data is often unreliable because the mere presence of a keyword on a webpage does not mean that page belongs to the domain of interest."
These search engines are centralized repositories of data and leave users no way to manage the data as they wish.
By using the warehouse technology detailed in Madria’s book, however, a user can perform a web search with specified conditions on hyperlinks or page content, then store the results in a local warehouse. "A user can then use this warehouse data for future queries with no need for another Internet search," Madria says. "Such a system is very useful for individual users and companies, who often search for data in a related domain."
The book also discusses how to store the data and how to maintain the warehouse as webpages are updated.
Madria co-wrote the book with Dr. Sourav Bhowmick and Dr. Wee Keong Ng, two members of the information systems faculty at Nanyang Technological University in Singapore. It is the result of Madria’s research over the past five years.
Madria has taught at UMR since 2000. He specializes in web and mobile data management and teaches courses on database systems, web data management and XML, and mobile computing.