This STRUCPRIME_README.txt file was generated on 2019/12/21 by Christina Kim.


-------------------
GENERAL INFORMATION
-------------------


1. Structural Convergence in Dialogues between Native and Non-native Speakers 


2. Author Information

  Principal Investigator Contact Information
        Name: Christina S. Kim
        Institution: University of Kent
        Address: English Language and Linguistics, Cornwallis North West, CT2 7NF
        Email: c.s.kim@kent.ac.uk

  Co-investigator Contact Information
        Name: Gloria Chamorro
        Institution: University of Kent
        Address: English Language and Linguistics, Cornwallis North West, CT2 7NF
        Email: g.chamorro@kent.ac.uk


3. Dates of data collection: 

2018/04/11 - 2019/07/03 


4. Geographic location of data collection: 

Canterbury, UK 


5. Information about funding sources that supported the collection of the data:

British Academy grant (SG143355) to Christina S. Kim (CSK); School of European Culture and Languages research grant to CSK and Gloria Chamorro (GC)


---------------------
DATA & FILE OVERVIEW
---------------------


1. File List

   A. Filename: SP1.txt       
      Short description: Tab-delimited text file. Coded responses of British English speakers describing illustrations depicting ditransitive events.      
        
   B. Filename: SP2.txt       
      Short description: Tab-delimited text file. Coded responses of British English speakers describing illustrations depicting ditransitive events.                

   C. Filename: SP2_posttest.txt       
      Short description: Tab-delimited text file. Anonymised demographic information about participants, and their responses to post-test questions administered after the test sessions that generated the data in SP2.txt.
        
   D. Filename: SP3.txt       
      Short description: Tab-delimited text file. Coded responses of British English speakers describing illustrations depicting ditransitive events. Also includes data from a pre-test survey in which participants judged various properties of two speakers from recordings of their voices.
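
All four files are tab-delimited text. Below is a minimal Python sketch for reading one with the standard library. It assumes (this is not stated in this README) that the first line of each file is a header row holding the variable names listed under Data-Specific Information, and that the files are UTF-8 encoded. The function names are hypothetical, and every value is read as a string.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_tab_delimited(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a tab-delimited table into one dict per row.

    Assumes the first line is a header row with the variable names.
    Every value is returned as a string; convert numeric columns
    (e.g. participantID, response, PD_advantage) explicitly.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def load_data_file(path: str) -> List[Dict[str, str]]:
    """Read one of the data files, e.g. "SP1.txt" (UTF-8 encoding assumed)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        return read_tab_delimited(handle)


if __name__ == "__main__":
    # Made-up two-row table using a few SP1 column names, for illustration only.
    sample = (
        "participantID\tblockID\tturn\tresponse\n"
        "1\t1\tparticipant\t1\n"
        "1\t1\trecording\t0\n"
    )
    rows = read_tab_delimited(io.StringIO(sample))
    # Keep only the participant's own descriptions.
    participant_rows = [r for r in rows if r["turn"] == "participant"]
    print(len(participant_rows), participant_rows[0]["response"])
```

Filtering on turn == "participant" keeps the rows in which the participant described an illustration, as opposed to the rows for the recording or confederate turns.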
        

2. Relationship between files: 

The files contain data from a series of behavioural experiments involving dialogue between native British English speaker participants and a confederate whose turns were either pre-recorded or produced live. Experiment 1 (pre-recorded confederate) generated SP1.txt. Experiment 2 (live confederate) generated SP2.txt and SP2_posttest.txt. Experiment 3 (live confederate) generated SP3.txt.


3. Additional related data collected that was not included in the current data package: 

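Norming data were collected for the verbs used in Experiments 1-3 to determine how strongly each verb was biased toward the double object or the prepositional dative form. The norming data themselves are not included, but the per-verb bias measure derived from them is included in the data files as PD_advantage, and the verbType classification is also based on them.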


4. Are there multiple versions of the dataset? 

No


--------------------------
METHODOLOGICAL INFORMATION
--------------------------


1. Description of methods used for collection/generation of data: 
The methods are described in the publication based on this data: https://doi.org/10.1080/23273798.2021.1916544


2. Methods for processing the data: 

Raw data (audio recordings of test sessions) were transcribed by research assistants and coded by research assistants or experimenters (the coding of each variable is described below under Data-Specific Information). Data were anonymised; individual participants are identified only by participant ID.


3. Instrument- or software-specific information needed to interpret the data: 

n/a


4. Standards and calibration information, if appropriate:

n/a


5. Environmental/experimental conditions:

Sessions were run in behavioural testing rooms in the Linguistics Lab at the University of Kent (Cornwallis Central), on Dell PCs running PsychoPy software. For Experiment 1, participants listened to pre-recorded stimuli over headphones and were recorded on a Zoom H4n recorder. For Experiments 2 and 3, participants communicated over headsets with a confederate in another testing room and were recorded using the PC's voice recording software.


6. Describe any quality-assurance procedures performed on the data:

Participants who were found, after taking part, not to be native British English speakers were excluded from the data.


7. People involved with sample collection, processing, analysis and/or submission:

A total of six undergraduate research assistants helped collect, transcribe, and/or code the data across the three experiments.

CSK and GC created the visual stimuli, oversaw processing and coding of the data, interpreted results, and drafted the manuscript for submission.

CSK carried out data analysis.


--------------------------------------
DATA-SPECIFIC INFORMATION FOR: SP1.txt
--------------------------------------

1. Number of variables:

15


2. Number of cases/rows: 

11213


3. Variable List

    A. Name: participantID
       Description: Integer from 1-40 identifying each participant.

    B. Name: listID
       Description: Integer from 1-4 identifying the list assigned to each participant. Lists had different pseudorandom orders for the stimuli but were otherwise identical to each other.

    C. Name: blockID
       Description: Integer from 1-2 identifying whether the data point is from block 1 or 2 of the experiment.

    D. Name: group
       Description: The name of the group the participant was assigned to. Possible values: Native_NoChange (block 1 speaker: native British English speaker, block 2 speaker: same as block 1), L2_NoChange (block 1 speaker: non-native speaker of English, block 2 speaker: same as block 1), L2_Change_SameLang (block 1 speaker: non-native speaker of English, block 2 speaker: a different non-native speaker of English), L2_Change_DiffLang (block 1 speaker: non-native speaker of English, block 2 speaker: native British English speaker)

    E. Name: speakerTypeL2
       Description: Codes whether the speaker in the pre-recorded stimulus was a non-native speaker (-1) or a native speaker (1).

    F. Name: speakerChange
       Description: Codes whether there is a speaker change from block 1 to block 2 (1), or no change (-1).

    G. Name: languageChange
       Description: Codes whether there is a change in the native language status of the speaker from block 1 to block 2 (1), or no change (-1).

    H. Name: verbOld
       Description: Codes whether the verb in that trial appeared in block 1 (1), or was new to block 2 (-1). All block 1 trials have value -1.

    I. Name: cycle
       Description: The cycle number in each block. An integer between 1-64 for block 1, and between 1-80 for block 2. A cycle consists of one participant turn and one recording turn.

    J. Name: turn
       Description: Indicates whether the participant described an illustration ("participant") or listened to a pre-recorded stimulus describing an illustration ("recording"). Participant turns alternated with recording turns.

    K. Name: itemID
       Description: Integer between 1-50 identifying the item (illustration, verb pair). Items 1-30 are test items involving ditransitive verbs. Items 31-50 are filler items involving intransitive or transitive verbs.

    L. Name: verbType
       Description: Indicates whether the verb was strongly biased toward the prepositional dative form in the norming study ("nonalternating") or weakly biased toward the prepositional dative form or biased toward the double object form ("alternating"), or whether the trial was a filler item ("filler").

    M. Name: verb
       Description: The verb used in that trial.

    N. Name: response
       Description: Codes whether the response produced by the participant was a double object sentence (1) or another form (0).

    O. Name: PD_advantage
       Description: A value between 0 and 1 indicating a verb's bias in favour of the prepositional dative form over the double object form; calculated for each verb by subtracting the proportion of double object descriptions in the norming study from the proportion of prepositional dative descriptions.
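
In SP1.txt, speakerChange and languageChange are fixed by a participant's group, and the speakerTypeL2 value of each block's speaker follows from the group descriptions above. The Python sketch below writes those relationships out as lookup tables, which could be used to cross-check the coded columns. It assumes (not stated above) that speakerTypeL2 on a row describes the speaker heard in that row's block, and that speakerChange and languageChange take the same value on every row for a given participant. The names SP1_BLOCK_SPEAKER_TYPES, SP1_GROUP_CONTRASTS and expected_sp1_codes are hypothetical.

```python
from typing import Dict, Tuple

# speakerTypeL2 of the (block 1, block 2) speaker in each SP1 group,
# read off the group descriptions: -1 = non-native, 1 = native.
SP1_BLOCK_SPEAKER_TYPES: Dict[str, Tuple[int, int]] = {
    "Native_NoChange": (1, 1),
    "L2_NoChange": (-1, -1),
    "L2_Change_SameLang": (-1, -1),
    "L2_Change_DiffLang": (-1, 1),
}

# (speakerChange, languageChange) per group: 1 = change, -1 = no change.
SP1_GROUP_CONTRASTS: Dict[str, Tuple[int, int]] = {
    "Native_NoChange": (-1, -1),
    "L2_NoChange": (-1, -1),
    "L2_Change_SameLang": (1, -1),
    "L2_Change_DiffLang": (1, 1),
}


def expected_sp1_codes(group: str, block_id: int) -> Tuple[int, int, int]:
    """Return the (speakerTypeL2, speakerChange, languageChange) expected for a row.

    Assumes speakerTypeL2 describes the speaker heard in the row's block.
    """
    if block_id not in (1, 2):
        raise ValueError("blockID must be 1 or 2")
    speaker_type = SP1_BLOCK_SPEAKER_TYPES[group][block_id - 1]
    speaker_change, language_change = SP1_GROUP_CONTRASTS[group]
    return speaker_type, speaker_change, language_change


# Internal consistency of the two tables: the native-language status changes
# exactly when the two blocks' speaker types differ, and such a change
# always involves a change of speaker.
for name, (block1, block2) in SP1_BLOCK_SPEAKER_TYPES.items():
    spk_change, lang_change = SP1_GROUP_CONTRASTS[name]
    assert (lang_change == 1) == (block1 != block2), name
    assert lang_change == -1 or spk_change == 1, name
```

For a row loaded from SP1.txt, comparing expected_sp1_codes(row["group"], int(row["blockID"])) with the row's own speakerTypeL2, speakerChange and languageChange values would flag any coding mismatch.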

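PD_advantage, which appears in all three experiment files, is a difference of two proportions from the norming study. A minimal Python sketch of that calculation for a single verb follows. The input format (one code per norming description: "PD", "DO", or anything else for other forms) and the denominator (all norming descriptions of the verb, including other forms) are assumptions, since the norming data are not part of this package; the function name is hypothetical.

```python
from typing import Sequence


def pd_advantage(norming_codes: Sequence[str]) -> float:
    """Proportion of PD descriptions minus proportion of DO descriptions for one verb.

    norming_codes: hypothetical per-description codes from a norming study,
    "PD", "DO", or any other value for other forms. Both proportions use
    all descriptions as the denominator (an assumption).
    """
    n = len(norming_codes)
    if n == 0:
        raise ValueError("need at least one norming description")
    prop_pd = sum(code == "PD" for code in norming_codes) / n
    prop_do = sum(code == "DO" for code in norming_codes) / n
    # The result is negative if DO descriptions outnumber PD descriptions.
    return prop_pd - prop_do


# Made-up example: 10 descriptions, 7 PD, 2 DO, 1 other form -> 0.7 - 0.2 = 0.5
example = ["PD"] * 7 + ["DO"] * 2 + ["other"]
print(round(pd_advantage(example), 3))
```

If the norming proportions were instead computed over PD and DO descriptions only, the denominator would change, so the sketch's choice should be checked against the publication before reuse.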

4. Missing data codes:

Trials with missing or incomplete data were excluded from the dataset.


5. Specialized formats or other abbreviations used

n/a


--------------------------------------
DATA-SPECIFIC INFORMATION FOR: SP2.txt
--------------------------------------

1. Number of variables:

15


2. Number of cases/rows: 

9984


3. Variable List

    A. Name: participantID
       Description: Integer from 1-48 identifying each participant.

    B. Name: listID
       Description: Integer from 1-3 identifying the list assigned to each participant. Lists had different pseudorandom orders for the stimuli but were otherwise identical to each other.

    C. Name: blockID
       Description: Integer from 1-2 identifying whether the data point is from block 1 or 2 of the experiment.

    D. Name: speakerType
       Description: Indicates the type of confederate assigned to the participant -- native British English speaker ("BrE"), native North American English speaker ("NorthAm"), native Spanish speaker ("Sp").

    E. Name: sameDialect
       Description: Codes whether the confederate spoke the same dialect of English as the participant (speakerType BrE = 1), or not (speakerType NorthAm or Sp = -1). 

    F. Name: sameL1
       Description: Codes whether the confederate had the same native language as the participant (speakerType BrE or NorthAm = 1), or not (speakerType Sp = -1). 

    G. Name: verbOld
       Description: Codes whether the verb in that trial appeared in block 1 (1), or was new to block 2 (-1). All block 1 trials have value -1.

    H. Name: cycle
       Description: The cycle number in each block. An integer between 1-64 for block 1, and between 1-80 for block 2. A cycle consists of one participant turn and one confederate turn.

    I. Name: turn
       Description: Indicates whether the participant described an illustration ("participant") or listened to the confederate describe an illustration ("confederate"). Participant turns alternated with confederate turns.

    J. Name: itemID
       Description: Integer between 1-50 identifying the item (illustration, verb pair). Items 1-30 are test items involving ditransitive verbs. Items 31-50 are filler items involving intransitive or transitive verbs.

    K. Name: verbType
       Description: Indicates whether the verb was strongly biased toward the prepositional dative form in the norming study ("nonalternating") or weakly biased toward the prepositional dative form or biased toward the double object form ("alternating"). Filler trials are excluded from the dataset.

    L. Name: verb
       Description: The verb used in that trial.

    M. Name: code
       Description: The response type produced by the participant -- "DO" indicates a double object form, "PD" indicates a prepositional dative form, and all other responses have value "x".

    N. Name: response
       Description: Codes whether the response produced by the participant was a double object sentence (1) or another form (0).

    O. Name: PD_advantage
       Description: A value between 0 and 1 indicating a verb's bias in favour of the prepositional dative form over the double object form; calculated for each verb by subtracting the proportion of double object descriptions in the norming study from the proportion of prepositional dative descriptions.
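
In SP2.txt, sameDialect and sameL1 are fully determined by speakerType: sameDialect separates BrE from the other two confederate types, and sameL1 separates Sp from the other two. A short Python sketch of that mapping follows, e.g. for checking the coded columns against speakerType; the table and function names are hypothetical.

```python
from typing import Dict, Tuple

# (sameDialect, sameL1) for each SP2 speakerType, as defined above:
# 1 = same as the (British English) participant, -1 = different.
SP2_SPEAKER_CONTRASTS: Dict[str, Tuple[int, int]] = {
    "BrE": (1, 1),
    "NorthAm": (-1, 1),
    "Sp": (-1, -1),
}


def sp2_contrasts(speaker_type: str) -> Tuple[int, int]:
    """Return (sameDialect, sameL1) for a speakerType value."""
    try:
        return SP2_SPEAKER_CONTRASTS[speaker_type]
    except KeyError:
        raise ValueError(f"unexpected speakerType: {speaker_type!r}") from None


print(sp2_contrasts("NorthAm"))
```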

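In SP2.txt (and in SP3.txt, which uses the same two columns), response is a binary recoding of code: double object responses versus everything else, so prepositional datives and other forms both become 0. A minimal Python sketch of that recoding; the function name is hypothetical, and treating any value other than "DO", "PD" or "x" as an error is a choice made here, not something the data require.

```python
VALID_CODES = {"DO", "PD", "x"}


def response_from_code(code: str) -> int:
    """Map the participant's response type (code) to the binary response column.

    "DO" (double object) -> 1; "PD" (prepositional dative) and "x" (other forms) -> 0.
    """
    if code not in VALID_CODES:
        raise ValueError(f"unexpected code: {code!r}")
    return 1 if code == "DO" else 0


print([response_from_code(c) for c in ("DO", "PD", "x")])
```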

4. Missing data codes:

Trials with missing or incomplete data were excluded from the dataset.


5. Specialized formats or other abbreviations used

n/a


-----------------------------------------------
DATA-SPECIFIC INFORMATION FOR: SP2_posttest.txt
-----------------------------------------------

1. Number of variables:

5


2. Number of cases/rows: 

94


3. Variable List
    
    A. Name: participantID
       Description: Integer from 1-48 identifying each participant.

    B. Name: speakerType
       Description: Indicates the type of confederate assigned to the participant -- native British English speaker ("BrE"), native North American English speaker ("NorthAm"), native Spanish speaker ("Sp").

    C. Name: Q1_wherefrom
       Description: Response to post-test question 1 ("Where do you think the speaker was from?") -- UK, US, Europe, Not sure.

    D. Name: Q2_comprehensibility
       Description: Response to post-test question 2 ("Did you find the speaker easy to understand?") -- "Very easy" (VeryEasy), "Somewhat easy" (Easy), "Somewhat hard" (Hard), "Very hard" (VeryHard).

    E. Name: Q3_usage
       Description: Response to post-test question 3 ("How would you describe how the speaker used language?") -- "Didn't know English well" (A), "Knew English well but said some things differently than I would" (B), "Spoke pretty similarly to me" (C).
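
The post-test answers are categorical. The Python sketch below shows two common ways to summarise them: counting answers separately for each confederate type, and scoring Q2 on an ordinal scale. The 1-4 scoring (1 = VeryEasy, 4 = VeryHard) is an assumption for illustration, not part of the data; the function names are hypothetical and the rows in the example are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping

# Answer options of post-test question 2, from easiest to hardest.
Q2_ORDER = ["VeryEasy", "Easy", "Hard", "VeryHard"]


def q2_rank(value: str) -> int:
    """Hypothetical ordinal score: 1 = VeryEasy ... 4 = VeryHard (ValueError if unknown)."""
    return Q2_ORDER.index(value) + 1


def crosstab(rows: Iterable[Mapping[str, str]], column: str) -> Dict[str, Counter]:
    """Count the answers in `column` separately for each speakerType."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        table[row["speakerType"]][row[column]] += 1
    return dict(table)


# Made-up rows in the SP2_posttest layout (subset of columns), for illustration only.
sample = [
    {"speakerType": "Sp", "Q1_wherefrom": "Europe", "Q2_comprehensibility": "Easy"},
    {"speakerType": "Sp", "Q1_wherefrom": "Not sure", "Q2_comprehensibility": "Hard"},
    {"speakerType": "BrE", "Q1_wherefrom": "UK", "Q2_comprehensibility": "VeryEasy"},
]
print(crosstab(sample, "Q1_wherefrom")["Sp"]["Europe"])
print(sorted(q2_rank(r["Q2_comprehensibility"]) for r in sample))
```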


4. Missing data codes:

n/a


5. Specialized formats or other abbreviations used

n/a


--------------------------------------
DATA-SPECIFIC INFORMATION FOR: SP3.txt
--------------------------------------

1. Number of variables:

26


2. Number of cases/rows: 

6030


3. Variable List

    A. Name: participantID
       Description: Integer from 1-29 identifying each participant.

    B. Name: listID
       Description: Integer from 1-6 identifying the list assigned to each participant. Lists 1, 3 and 5 had verbDistribution = 1; lists 2, 4 and 6 had verbDistribution = -1. Within each set, lists had different pseudorandom orders for the stimuli but were otherwise identical to each other.

    C. Name: blockID
       Description: Integer from 1-2 identifying whether the data point is from block 1 or 2 of the experiment.

    D. Name: group
       Description: Indicates the group the participant was assigned to -- SE-FULL (South-Eastern confederate, speakerType = 1; verbDistribution = 1), SE-ALT (South-Eastern confederate, speakerType = 1; verbDistribution = -1), L-FULL (Lancashire confederate, speakerType = -1; verbDistribution = 1), L-ALT (Lancashire confederate, speakerType = -1; verbDistribution = -1).

    E. Name: speakerType
       Description: Codes the type of confederate assigned to the participant -- native British English speaker from the South-East (1), native British English speaker from Lancashire (-1).

    F. Name: verbDistribution
       Description: Codes whether the full set of verbs used in Experiments 1-2 was used (1), or only alternating verbs were used (-1).

    G. Name: verbOld
       Description: Codes whether the verb in that trial appeared in block 1 (1), or was new to block 2 (-1). All block 1 trials have value -1.

    H. Name: cycle
       Description: The cycle number in each block. An integer between 1-64 for block 1, and between 1-80 for block 2. A cycle consists of one participant turn and one confederate turn.

    I. Name: turn
       Description: Indicates whether the participant described an illustration ("participant") or listened to the confederate describe an illustration ("confederate"). Participant turns alternated with confederate turns.

    J. Name: itemID
       Description: Integer between 1-50 identifying the item (illustration, verb pair). Items 1-30 are test items involving ditransitive verbs. Items 31-50 are filler items involving intransitive or transitive verbs.

    K. Name: verbType
       Description: Indicates whether the verb was strongly biased toward the prepositional dative form in the norming study ("nonalternating") or weakly biased toward the prepositional dative form or biased toward the double object form ("alternating"). Filler trials are excluded from the dataset.

    L. Name: verb
       Description: The verb used in that trial.

    M. Name: code
       Description: The response type produced by the participant -- "DO" indicates a double object form, "PD" indicates a prepositional dative form, and all other responses have value "x".

    N. Name: response
       Description: Codes whether the response produced by the participant was a double object sentence (1) or another form (0).

    O. Name: PD_advantage
       Description: A value between 0 and 1 indicating a verb's bias in favour of the prepositional dative form over the double object form; calculated for each verb by subtracting the proportion of double object descriptions in the norming study from the proportion of prepositional dative descriptions.

    P. Name: mapDistance_SE
       Description: The participant's estimated distance between where they were from and where the South-Eastern speaker was from on a map of the UK (cm). From the pre-test survey.

    Q. Name: mapDistance_L
       Description: The participant's estimated distance between where they were from and where the Lancashire speaker was from on a map of the UK (cm). From the pre-test survey.

    R. Name: PC1
       Description: Principal component 1 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    S. Name: PC2
       Description: Principal component 2 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    T. Name: PC3
       Description: Principal component 3 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    U. Name: PC4
       Description: Principal component 4 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    V. Name: PC5
       Description: Principal component 5 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    W. Name: PC6
       Description: Principal component 6 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    X. Name: PC7
       Description: Principal component 7 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    Y. Name: PC8
       Description: Principal component 8 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.

    Z. Name: PC9
       Description: Principal component 9 from a principal components analysis of personal-attribute judgments for both speakers. From the pre-test survey.
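
In SP3.txt, group, listID, speakerType and verbDistribution are redundant with one another in the ways described above. The Python sketch below encodes those relationships and checks a single row for consistency; the table and function names are hypothetical, and the values passed to check_row are assumed to have already been converted to integers.

```python
from typing import Dict, Tuple

# (speakerType, verbDistribution) for each SP3 group:
# speakerType 1 = South-Eastern confederate, -1 = Lancashire confederate;
# verbDistribution 1 = full verb set, -1 = alternating verbs only.
SP3_GROUPS: Dict[str, Tuple[int, int]] = {
    "SE-FULL": (1, 1),
    "SE-ALT": (1, -1),
    "L-FULL": (-1, 1),
    "L-ALT": (-1, -1),
}


def verb_distribution_for_list(list_id: int) -> int:
    """Lists 1, 3 and 5 used the full verb set (1); lists 2, 4 and 6 alternating verbs only (-1)."""
    if not 1 <= list_id <= 6:
        raise ValueError("listID must be between 1 and 6")
    return 1 if list_id % 2 == 1 else -1


def check_row(group: str, list_id: int, speaker_type: int, verb_distribution: int) -> bool:
    """True if a row's group, listID, speakerType and verbDistribution agree."""
    expected_speaker, expected_dist = SP3_GROUPS[group]
    return (
        speaker_type == expected_speaker
        and verb_distribution == expected_dist
        and verb_distribution_for_list(list_id) == expected_dist
    )


print(check_row("L-ALT", 4, -1, -1))
```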


4. Missing data codes:

Trials with missing or incomplete data were excluded from the dataset.


5. Specialized formats or other abbreviations used

n/a