Yuen-Hsien Tseng

Advanced Information Retrieval Technology for Online Public Access Catalog System (新一代資訊檢索技術在圖書館OPAC系統的應用)

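No. 6
Journal 大學圖書館
Year 1997
Month July
Volume/Issue Vol.1 No.3
Author Yuen-Hsien Tseng (曾元顯)
Author's title Associate Professor, Department of Library and Information Science, Fu Jen Catholic University
Abstract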

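The On-Line Public Access Catalog (OPAC) system is an important tool for readers to search a library's holdings, and it was also a principal application area for early information retrieval technology. Taking the Fu Jen Catholic University Library as an example, this paper describes the current state of integrating new-generation information retrieval technology into an OPAC system. Using an advanced search engine available to non-profit organizations, supplemented by keyword extraction technology we developed ourselves, we built an OPAC system that offers relevance ranking, approximate string matching, fuzzy search, related-term feedback, and query strings in quasi-natural language. A preliminary evaluation found that such a system helps users considerably in simplifying query conditions, expanding their search vocabulary, and improving retrieval effectiveness.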

Keywords Fuzzy search, Related-term feedback, On-line public access catalog, Information retrieval, Quasi-natural language
Pages 82-93
Title Advanced Information Retrieval Technology for Online Public Access Catalog System
Author Yuen-Hsien Tseng
Author's title Associate Professor, Department of Library & Information Science, Fu Jen Catholic University
Abstract

The On-Line Public Access Catalog (OPAC) system is an important tool for helping users retrieve information from the large volume of data stored in a library. However, recent advances in information retrieval technologies have not yet been applied to most OPAC systems. This paper introduces the development of a novel OPAC system that combines a free advanced search engine available from the Internet with an automatic keyword extraction function developed by the Department of Library and Information Science. The resulting features of the OPAC system include relevance ranking, fuzzy search, relevance feedback, and query by quasi-natural language. Results show that such a system greatly simplifies query formulation and automatically suggests more relevant search vocabulary to users, thus improving the performance of the OPAC system.

Keywords Fuzzy search, Information retrieval, Natural language, OPAC, Relevance feedback

A Quasi-Natural Language Searchable OPAC System Based on WWW Technology and Z39.50 Protocol (架構在WWW與Z39.50上的近似自然語言OPAC 檢索系統)

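No. 9
Journal 大學圖書館
Year 1998
Month October
Volume/Issue Vol.2 No.4
Author Yuen-Hsien Tseng (曾元顯)
Author's title Associate Professor, Department of Library and Information Science, Fu Jen Catholic University
Abstract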

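In recent years, libraries' OPAC services have let readers connect and search without constraints of time or place. However, different libraries adopt different systems, so users must face different interfaces and slight differences in usage, and must connect to each system one by one in order to search. To remedy this shortcoming, this paper uses a system implementation to explore the process of developing, under the WWW architecture and with free software, an OPAC system that conforms to the Z39.50 information retrieval standard and supports quasi-natural language queries. It also describes the difficulties that may be encountered and the problems implicit in the process, as a reference for other institutions. It is hoped that this paper will help promote the use of, and services based on, the Z39.50 information retrieval standard.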

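Keywords Z39.50 information retrieval standard, World Wide Web, Multi-database search, On-line public access catalog system, Quasi-natural language retrieval
Pages 128-148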
Title A Quasi-Natural Language Searchable OPAC System Based on WWW Technology and Z39.50 Protocol
Author Yuen-hsien Tseng
Author's title Associate Professor, Dept. of Library and Information Science, Fu Jen Catholic University
Abstract

Recent developments in OPAC systems have made it possible to search and retrieve library catalogs anywhere, anytime. However, because of discrepancies among different systems, users have to face different interfaces and usage. Furthermore, searching several OPAC systems requires connecting to each one in turn. To overcome this problem, this article discusses an implementation of an OPAC system based on the WWW and Z39.50 protocols. The implemented OPAC system allows multi-database search and natural language-like queries, thus reducing the tedious work and difficulties in query formulation. Possible problems encountered in developing such a system are discussed. It is hoped that the implementation experience will help other libraries establish a similar Z39.50 service.

Keywords Multi-database search, On-line public access catalog system, Quasi-natural language retrieval, World Wide Web, Z39.50 information retrieval standard

The Effect of Inconsistency in Training Data on Automatic Text Categorization (分類不一致對文件自動分類效果之影響)

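No. 1
Journal 大學圖書館
Year 2005
Month March
Volume/Issue Vol.9 No.1
Author Yuen-Hsien Tseng (曾元顯)
Author's title Department of Library and Information Science, Fu Jen Catholic University
Abstract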

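This paper examines how classification inconsistency affects the effectiveness of automatic classification. Through automatic detection of near-duplicate documents and comparative experiments with two classification methods on two test collections, this paper finds that inconsistency in the labeling of training data, even when as high as 34%, has almost no effect on classifier effectiveness. The important implication is that the conclusions of past studies remain valid even if their experiments used test collections with low consistency. Of course, data with high classification inconsistency yield lower classification performance after training, regardless of the quality of the classifier. Besides these findings, this paper introduces a Chinese test collection for classification, offered free of charge for research use. The author also proposes a reliable method for detecting duplicate or similar documents; compared with previous methods, it can detect similar documents that those methods miss.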

Keywords Consistency, Subject analysis, Test collection for categorization, Document classification, Duplicate detection
Pages 2-19
Title The Effect of Inconsistency in Training Data on Automatic Text Categorization
Author Yuen-Hsien Tseng
Author's title Department of Library and Information Science, Fu-Jen University
Abstract

This article discusses the effect of inconsistency in training data on the performance of text classifiers. Our experiments show that inconsistency, even at a level as high as 34%, hardly affects the effectiveness of the classifiers. Better classifiers perform better independent of duplicates and label inconsistency. The implication is that past experiments (especially on the Reuters-21578 collection) remain valid. In the course of the experiments, the author proposes a duplicate detection technique that is far more effective than previous ones. A new Chinese test collection for text categorization is also introduced and made available for free download.

Keywords Consistency, Document classification, Duplicate detection, Subject analysis, Test collection for categorization