Status: Published
Extracting company information from the web
Man I Lam; Zhiguo Gong; Jingzhi Guo
2009-12-04
Conference Name: IEEE International Conference on Systems, Man and Cybernetics
Source Publication: 2009 IEEE International Conference on Systems, Man and Cybernetics
Pages: 3640-3645
Conference Date: 11-14 Oct. 2009
Conference Place: San Antonio, TX, USA
Abstract

As the World Wide Web becomes the most important information repository, an increasing amount of information is becoming available. Current web search engines, however, provide only document-oriented search. To make full use of the information on the web, effective and efficient extraction algorithms are highly desirable. In this paper, existing work is first surveyed, and our current web information extraction technique is then discussed in detail. In our approach, rules and patterns are extracted from sample pages through a training process with human involvement. Our system represents rules and patterns with both keywords and regular expressions: the keywords act as anchors that locate the positions of potential information, and the regular expressions validate the values found there. All extracted information is represented in XML format.
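The anchor-and-validate idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' system: the paper learns its rules from sample pages with human involvement, whereas here the rules are written by hand. The field names (`phone`, `email`, `founded`), the anchor keywords, the validation patterns, the requirement that a value appear on the same line as its anchor (within 80 characters), and the XML element names are all hypothetical. A keyword match only proposes a location; a value is accepted only if the field's regular expression matches there.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, hand-written rules standing in for the rules the paper learns
# from sample pages: keyword anchors locate a candidate value, and a regular
# expression validates it.
RULES = {
    "phone": {"anchors": ["Phone", "Tel"], "pattern": r"\+?\d[\d -]{6,}\d"},
    "email": {"anchors": ["E-mail", "Email"],
              "pattern": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"},
    "founded": {"anchors": ["Founded", "Established"],
                "pattern": r"\b(?:1[89]|20)\d{2}\b"},
}


def extract_fields(text, rules=RULES, window=80):
    """Return {field: value} for each field whose anchor is followed by a valid value."""
    found = {}
    for field, rule in rules.items():
        for anchor in rule["anchors"]:
            # Whole-word, case-insensitive keyword match, so "Tel" does not hit "Hotel".
            pattern = r"\b" + re.escape(anchor) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                # Assumption: the value sits on the same line, within `window` chars.
                segment = text[m.end():].split("\n", 1)[0][:window]
                value = re.search(rule["pattern"], segment)
                if value:  # regex validation passed; keep the first valid value
                    found[field] = value.group(0).strip()
                    break
            if field in found:
                break
    return found


def to_xml(company, fields):
    """Serialize extracted fields as a flat XML record (element names are assumed)."""
    root = ET.Element("company", name=company)
    for field, value in fields.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample page text (already stripped of HTML markup).
page = (
    "Acme Trading Ltd.\n"
    "Phone: call our hotline during office hours\n"
    "Tel: +853 2888 1234\n"
    "Email: info@acme.example.com\n"
    "Founded in 1998 in Macau.\n"
)

fields = extract_fields(page)
print(to_xml("Acme Trading Ltd.", fields))
```

In the sample page the `Phone` anchor is found first but rejected, because nothing phone-like follows it on that line, so the extractor falls back to the `Tel` anchor. The script prints `<company name="Acme Trading Ltd."><phone>+853 2888 1234</phone><email>info@acme.example.com</email><founded>1998</founded></company>`.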

DOI: 10.1109/ICSMC.2009.5346863
Indexed By: CPCI-S
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Cybernetics; Computer Science, Information Systems
WOS ID: WOS:000279574602024
Scopus ID: 2-s2.0-74849128289
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation: Faculty of Science and Technology, University of Macau, Macao, PRC
First Author Affiliation: Faculty of Science and Technology
Recommended Citation
GB/T 7714
Man I Lam, Zhiguo Gong, Jingzhi Guo. Extracting company information from the web[C], 2009, 3640-3645.
APA
Lam, M. I., Gong, Z., & Guo, J. (2009). Extracting company information from the web. In 2009 IEEE International Conference on Systems, Man and Cybernetics (pp. 3640-3645).
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.