Text Mining Search Engine by Patrick Herron

School of Library and Information Science, The University of North Carolina, Chapel Hill, North Carolina

Master's candidate in Information Science (M.S.I.S.), with a Computer Science minor and a Bioinformatics Certificate; degree anticipated August 2006

As of November 2006, I have completed my master's thesis on text mining adoption and innovation at a large pharmaceutical company (full text). The thesis has three main components: (a) a theoretical treatment of text mining; (b) a review of business and scientific applications of text mining; and (c) a case study of text mining adoption for pharmacogenomics (PGx) drug discovery. In it I have developed a quality model for evaluating novel drug discovery information that is generated (rather than merely extracted) from multiple literature and data inputs. I have also proposed a new definition of text mining that distinguishes it from data mining, information retrieval, and information extraction. My thesis advisor is Dr. Stephanie Haas.

From 2004 to 2006, I experimented with different concept-based feature representations for automatic text classification and clustering, using the NC Health Info community health website collection as a corpus. The goal of the experiments was to automate the generation of both index and topic term sets for information architecture and cataloging tasks. NC Health Info is a joint project of SILS and the UNC Health Sciences Library, funded by a grant from the National Library of Medicine.