Residential College | false |
Status | Published |
Trails of Data: Three Cases for Collecting Web Information for Social Science Research | |
Li, Fumin; Zhou, Yisu; Cai, Tianji | |
2021-11 | |
Source Publication | Social Science Computer Review |
ISSN | 0894-4393 |
Volume | 39, Issue: 5, Pages: 922–942 |
Abstract | As the availability of online data grows rapidly, researchers are confronted with a pressing question: How should social scientists collect Internet data for research? This study focuses on one of the most commonly used data collection techniques: web scraping. Going beyond canned approaches by leveraging a general framework of data communication, this study illustrates how online information can be systematically queried and fetched for reproducible research. To generalize our approaches, we additionally explore the variations in site security and architecture that analysts may encounter during the scraping process before they are given access to the desired data. The approaches we introduce do not rely on any proprietary software and can be easily implemented on any computing platform with programming languages such as Python or R. The methodological discussion in this study is meant to be applicable to current web-based research efforts. We include three examples with complete Python implementation. We also present an integrated workflow that enables researchers to produce analytical data sets that are traceable and thus verifiable for analysis or replication. Lastly, options related to the validity and efficiency of data are discussed, and we highlight the ongoing debate surrounding the ethics of online data collection, ultimately advocating for the fair use of online data. |
Keyword | Data Collection; Reproducible Research; Web Scraping; Headless Browser; APIs; Python |
DOI | 10.1177/0894439319886019 |
Indexed By | SCIE ; SSCI |
Language | English |
WOS Research Area | Computer Science ; Information Science & Library Science ; Social Sciences - Other Topics |
WOS Subject | Computer Science, Interdisciplinary Applications ; Information Science & Library Science ; Social Sciences, Interdisciplinary |
WOS ID | WOS:000496062500001 |
Publisher | SAGE PUBLICATIONS INC |
Scopus ID | 2-s2.0-85075010416 |
Document Type | Journal article |
Collection | DEPARTMENT OF ECONOMICS |
Corresponding Author | Cai,Tianji |
Affiliation | Department of Sociology, University of Macau, Taipa, Macau SAR, China |
First Author Affiliation | University of Macau |
Corresponding Author Affiliation | University of Macau |
Recommended Citation GB/T 7714 | Li, Fumin, Zhou, Yisu, Cai, Tianji. Trails of Data: Three Cases for Collecting Web Information for Social Science Research[J]. Social Science Computer Review, 2021, 39(5): 922–942. |
APA | Li, F., Zhou, Y., & Cai, T. (2021). Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review, 39(5), 922–942. |
MLA | Li, Fumin, et al. "Trails of Data: Three Cases for Collecting Web Information for Social Science Research." Social Science Computer Review 39.5 (2021): 922–942. |
Files in This Item: |
File Name/Size | Publications | Version | Access | License |
Li, Zhou, & Cai_2019 (467KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA |