Residential College | false |
Status | Published |
Title | Extracting company information from the web |
Authors | Man I Lam; Zhiguo Gong; Jingzhi Guo |
Date | 2009-12-04 |
Conference Name | IEEE International Conference on Systems, Man and Cybernetics |
Source Publication | 2009 IEEE International Conference on Systems, Man and Cybernetics |
Pages | 3640-3645 |
Conference Date | 11-14 Oct. 2009 |
Conference Place | San Antonio, TX, USA |
Abstract | As the World Wide Web becomes the most important information repository, an increasing amount of information is available. Currently, web search engines can only provide document-oriented searches. To make full use of information from the web, effective and efficient extraction algorithms are desirable. In this paper, existing work is surveyed first. Then our current technique for web information extraction is discussed in detail. In our approach, rules and patterns are extracted from sample pages through a training process with human involvement. We use both keywords and regular expressions to represent rules and patterns in our system. The keywords work as anchors to locate the positions of potential information, and the regular expressions validate the values. In our system, all the extracted information is represented in XML format. |
DOI | 10.1109/ICSMC.2009.5346863 |
Indexed By | CPCI-S |
Language | English |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Cybernetics ; Computer Science, Information Systems |
WOS ID | WOS:000279574602024 |
Scopus ID | 2-s2.0-74849128289 |
Document Type | Conference paper |
Collection | DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE |
Affiliation | Faculty of Science and Technology, University of Macau, Macao, PRC |
First Author Affiliation | Faculty of Science and Technology |
Recommended Citation GB/T 7714 | Man I Lam, Zhiguo Gong, Jingzhi Guo. Extracting company information from the web[C], 2009, 3640-3645. |
APA | Man I Lam, Zhiguo Gong, & Jingzhi Guo. (2009). Extracting company information from the web. 2009 IEEE International Conference on Systems, Man and Cybernetics, 3640-3645. |
Files in This Item: | There are no files associated with this item. |
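
The abstract describes keywords acting as anchors and regular expressions validating the values found near them, with results represented in XML. The following is a minimal Python sketch of that general idea, not the authors' implementation. The rule table `RULES`, the field names, the `window` size, the `<company>` element, and the sample text are all assumptions made for illustration; in the paper, rules are learned from sample pages rather than written by hand.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hand-written rules: each field maps to anchor keywords
# and a regular expression that validates the candidate value.
RULES = {
    "phone": (["Tel", "Phone"], re.compile(r"\+?\d[\d\s()-]{6,}\d")),
    "email": (["Email", "E-mail"], re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
}


def extract(text, rules=RULES, window=60):
    """Locate each anchor keyword, then validate the text that follows it."""
    found = {}
    for field, (keywords, pattern) in rules.items():
        for keyword in keywords:
            pos = text.find(keyword)
            if pos == -1:
                continue  # anchor not present; try the next keyword
            start = pos + len(keyword)
            # Only look a short distance past the anchor (assumed window).
            match = pattern.search(text[start:start + window])
            if match:
                found[field] = match.group().strip()
                break
    return found


def to_xml(record):
    """Serialize the extracted fields as a small XML fragment."""
    root = ET.Element("company")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


sample = "Acme Ltd. Tel: +853 1234 5678 Email: info@acme.example"
record = extract(sample)
xml_out = to_xml(record)
print(xml_out)
```

Restricting the regular-expression search to a short window after the anchor is one simple way to keep a value tied to its keyword; a real system would likely work on the page's HTML structure instead of flat text.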
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.