This DATSETNAMEreadme.txt file was generated on 20211217 by Laura Medlock


-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset - GCHQ Gendered Narratives Project, Case 1

2. Author Information

   Principal Investigator Contact Information
      Name: Harmonie Toros
      Institution: iCSS, University of Kent
      Address: Institute of Cyber Security for Society (iCSS) & School of Politics and International Relations, University of Kent, Keynes Building, Canterbury, Kent, CT2 7NP, UK
      Email: h.toros@kent.ac.uk

   Associate or Co-investigator Contact Information
      Name: Caio Ribeiro
      Institution: School of Computing, University of Kent
      Address:
      Email:

   Alternate Contact Information
      Name: Laura Medlock
      Institution: University of Kent
      Address: Institute of Cyber Security for Society (iCSS) & School of Politics and International Relations, University of Kent, Keynes Building, Canterbury, Kent, CT2 7NP, UK
      Email: l.medlock@kent.ac.uk

3. Date of data collection (single date, range, approximate date): December 2020 to February 2021

4. Geographic location of data collection (where was data collected?): Kent, UK

5. Information about funding sources that supported the collection of the data: GCHQ Research Fellowship in National Resilience


---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: 4chan dataset - case1 (Brenton Tarrant) - Part 1 of 4
      Short description:

   B. Filename: 4chan dataset - case1 (Brenton Tarrant) - Part 2 of 4
      Short description:

   C. Filename: 4chan dataset - case1 (Brenton Tarrant) - Part 3 of 4
      Short description:

   D. Filename: 4chan dataset - case1 (Brenton Tarrant) - Part 4 of 4
      Short description:

   E. Filename: DailyMail dataset - case1 (Brenton Tarrant)
      Short description:

   F. Filename: TheIndependent dataset - case1 (Brenton Tarrant)
      Short description:

2. Relationship between files:

3. Additional related data collected that was not included in the current data package: GCHQ Gendered Narratives Project, Case 2

4. Are there multiple versions of the dataset? no

   If yes, list versions:
   Name of file that was updated:
      i. Why was the file updated?
      ii. When was the file updated?
   Name of file that was updated:
      i. Why was the file updated?
      ii. When was the file updated?


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

   The Independent and DailyMail datasets:
   Each dataset contains the comments posted on manually selected articles related to each of the two case studies, preserving the tree-like structure of the comments (through the ReplyTo and OPNumber attributes). The articles were selected from Google search results using search phrases such as:

      Case 1 search string example: “site:dailymail.co.uk after:2019/02/01 before:2020/08/30 brenton tarrant christchurch”
      Case 2 search string example: “site:independent.co.uk after:2019/02/01 before:2020/08/30 shamima begum”

   The comments on the selected articles were extracted using the Python requests library, and the datasets were preprocessed and saved as csv files. Each line of these files is one text comment, with metadata contextualizing it: the timestamp, the title and id of the article it was posted on, the username provided by the poster (anonymised as a numeric ID), and information on the structure of the conversation (which comment it is replying to, and how many replies that comment received).

   4chan datasets:
   Each dataset contains every post in threads that were considered relevant for the case study, based on keyword search. To build the dataset for each case study from the unfiltered 4plebs data archive (an online archive of the politics board of 4chan, /pol/) for the relevant time period, we performed a simple keyword search over the text of all comments. If any comment posted on a thread contained any of the target keywords or key phrases, the entire thread was kept in the dataset. Otherwise, the whole thread was removed, as it was deemed irrelevant for that case study.

   The keyword search for Case 1 used the following strings: “Brenton”, “Tarrant”, and “Christchurch Mosque”. For Case 2, the strings were “Shamima”, “Begum” and any combination of the words “Jihadi, Beheader, Isis” and “wife, wives, bride, brides” (as these terms were used to refer to the case in the forums).

   The Case 1 dataset was too big to be kept in a single csv file, so it was divided into 4 consecutive parts.

2. Methods for processing the data:

3. Instrument- or software-specific information needed to interpret the data: Microsoft Excel, or similar

4. Standards and calibration information, if appropriate: .csv files

5. Environmental/experimental conditions:

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: [FILENAME]
-----------------------------------------

1. Number of variables:

2. Number of cases/rows:

3. Variable List

   A. Name: [variable name]
      Description: [description of the variable]
      Value labels if appropriate

   B. Name: [variable name]
      Description: [description of the variable]
      Value labels if appropriate

4. Missing data codes:

   Code/symbol    Definition
   Code/symbol    Definition

5. Specialized formats or other abbreviations used:
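
Illustrative sketch (not the project's actual code): the thread-level keyword filter described under METHODOLOGICAL INFORMATION, item 1, might look roughly like the following in Python. The column names thread_id and text are hypothetical placeholders, and case-insensitive substring matching is an assumption; this readme does not specify either.

```python
# Case 1 search strings as listed in this readme.
CASE1_KEYWORDS = ["Brenton", "Tarrant", "Christchurch Mosque"]


def filter_threads(posts, keywords, thread_key="thread_id", text_key="text"):
    """Keep every post of any thread in which at least one post matches a keyword.

    `posts` is a list of dicts, one per post. `thread_key` and `text_key`
    are hypothetical column names, not those of the released csv files.
    Matching is a case-insensitive substring search (an assumption).
    """
    lowered = [k.lower() for k in keywords]

    # First pass: collect the ids of threads containing at least one match.
    relevant = set()
    for post in posts:
        text = (post.get(text_key) or "").lower()
        if any(k in text for k in lowered):
            relevant.add(post[thread_key])

    # Second pass: keep whole threads, including posts without a keyword.
    return [p for p in posts if p[thread_key] in relevant]
```

The two passes reflect the rule stated above: a single matching comment keeps the entire thread, so posts that do not themselves mention a keyword remain in the dataset when they belong to a relevant thread.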