Transcription Conventions for L2 Spanish

Here we describe some of the general decisions we have taken in transcribing spoken L2 Spanish using the CHAT system developed by the CHILDES project. We also describe some of the adaptations we have made to the CHAT system for L2 data.

A detailed guide to the transcription of L2 Spanish using CHAT conventions has been produced and the most recent version is available at SPLLOC Transcription Guidelines.

General Decisions

Anonymisation

All transcripts and sound files have been anonymised to eliminate personal details so that individual learners are not identifiable.

Orthographic Transcription

The data is being transcribed orthographically, which is necessary in order to run the Spanish morphosyntactic parser on the completed transcripts. In the interests of automatic part-of-speech (POS) tagging, the transcription at times departs from the actual phonological shape of the words produced by learners. However, other researchers interested in, for example, L2 Spanish phonology can refer to the sound files and add their own level of coding to the transcripts provided.

Limited use of the Error Tier (%err)

We have not used an error tier consistently. It was not necessary for our research agenda, because the syntactic and morphological errors made by our L2 learners can be retrieved more systematically from the POS-tagged output.

However, in some instances the word produced deviated considerably from the target but could nonetheless be easily recognised from the context. In these cases we use the error tier, as exemplified below:

@Begin
@Languages: es
@Participants: S02 Subject, MJA Investigator
@ID: es|splloc|S02||female|Year9||Subject||
@ID: es|splloc|MJA||female|||Investigator||
@Date: 27-MAR-2007
@Location: K
@Situation: Picture Sequence
@Coder: CSP
@Time Duration: <0:06:56>
*MJA: [^ eng: student number two picture sequence task] .
*MJA: qué hace el estudiante con el bolígrafo ?
*S02: estudiante usar [*] el bolígrafo estudiante guardar el bolígrafo .
%err: iusar = usar
*MJA: qué hace el chico con las sillas ?
*S02: el chico tirar las sillas el chico recoger [*] las sillas .
%err: recoguer = recoger
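
For researchers who want to pull these entries out of a transcript automatically, the following is a minimal Python sketch; it is not part of the SPLLOC or CLAN tools. It assumes that each %err line directly follows the main-tier line it annotates and has the form "produced = target", which is how we read the example above. The function name collect_errors is hypothetical.

```python
def collect_errors(lines):
    """Pair each %err tier with the main-tier utterance it follows.

    Assumption: a %err line reads "produced = target" and annotates
    the most recent line that starts with "*SPEAKER:".
    """
    results = []
    current = None  # (speaker, utterance) of the most recent main tier
    for raw in lines:
        line = raw.strip()
        if line.startswith("*") and ":" in line:
            speaker, utterance = line[1:].split(":", 1)
            current = (speaker.strip(), utterance.strip())
        elif line.startswith("%err:") and current is not None:
            body = line[len("%err:"):].strip()
            produced, _, target = body.partition("=")
            results.append({
                "speaker": current[0],
                "utterance": current[1],
                "produced": produced.strip(),
                "target": target.strip(),
            })
    return results


sample = [
    "*MJA: qué hace el estudiante con el bolígrafo ?",
    "*S02: estudiante usar [*] el bolígrafo estudiante guardar el bolígrafo .",
    "%err: iusar = usar",
    "*MJA: qué hace el chico con las sillas ?",
    "*S02: el chico tirar las sillas el chico recoger [*] las sillas .",
    "%err: recoguer = recoger",
]
errors = collect_errors(sample)
```

Each entry in errors keeps the speaker code and the full utterance alongside the produced and target forms, so the context of the error is not lost.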

Pauses

All pauses are indicated with # and have not been timed.

Overlapping

Overlapping of speech turns in the written transcripts is indicated using standard CHAT conventions.

Mean Length of Utterance

The speech turns for the L2 learner(s) in every file have been separated into distinct utterances following CHILDES conventions, so that MLU calculations can be carried out. However, this has not been done for the researchers' speech turns, so MLU calculations for the researchers will not be accurate.
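
As a rough illustration of what this segmentation makes possible, the sketch below computes a mean length of utterance in words for one speaker. It is an approximation under stated assumptions and does not reproduce any CLAN procedure: each main-tier line is treated as one utterance, continuation lines are ignored, and bracketed codes, the pause marker # and utterance terminators are not counted as words. The function name mlu_words is hypothetical.

```python
import re

# Tokens that are not counted as words (terminators and the pause marker).
NON_WORDS = {".", "?", "!", "#"}


def mlu_words(lines, speaker):
    """Mean number of word tokens per main-tier line for one speaker.

    Assumption: one main-tier line equals one utterance.
    """
    prefix = f"*{speaker}:"
    counts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith(prefix):
            continue
        text = line[len(prefix):]
        # Drop bracketed codes such as [*] or [/] before counting.
        text = re.sub(r"\[[^\]]*\]", " ", text)
        tokens = [t for t in text.split() if t not in NON_WORDS]
        if tokens:
            counts.append(len(tokens))
    return sum(counts) / len(counts) if counts else 0.0


sample = [
    "*MJA: qué hace el estudiante con el bolígrafo ?",
    "*S02: estudiante usar [*] el bolígrafo estudiante guardar el bolígrafo .",
    "*MJA: qué hace el chico con las sillas ?",
    "*S02: el chico tirar las sillas el chico recoger [*] las sillas .",
]
learner_mlu = mlu_words(sample, "S02")
```

Because only the learners' turns have been segmented into utterances, a figure obtained this way is meaningful for learner codes such as S02 but not for the investigator.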

L2 adaptations

A number of codes are being added to the CHAT system for the specific purposes of second language research. These codes cover the following issues:

Use of L1 English (complete utterances and/or codeswitching at word or phrase level)
--Codeswitching at word or phrase level. Marked by adding "@s:" to the end of the word, followed by a code for the word's category (e.g. noun (d), verb (v), etc.; see §3.6 of the SPLLOC Transcription Guidelines for details):
*P63: y cómo se dice scuba@s:d diving@s:v ?
--Complete utterances. Enclosed in square brackets that open with the code "^eng:" (written "[^ eng: ...]" in the transcripts):
*P04: [^ eng: I don't know what that means ].
Direct learner imitations of investigator utterances in Spanish. Marked with "@g" at the end of each imitated word.
*P51: no están en el sol están en shade@s:d.
*MJA: la sombra.
*P51: la@g sombra@g.
Use by learners of indeterminate forms and idiosyncratic neologisms. Marked with "@n" at the end of the word.
*P54: um ehm detrás de lo eh pictura@n eh hay [/] hay un número de turistas .
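
The word-level codes above can be located with a simple pattern match. The following Python sketch is illustrative only and is not part of the SPLLOC or CLAN tools; the names MARKER, ENGLISH, LABELS, find_markers and english_utterances are hypothetical, and the patterns assume the markers are written exactly as in the examples above.

```python
import re

# A word followed by @s:<category>, @g or @n, as in the examples above.
MARKER = re.compile(r"(?P<word>[^\s@]+)@(?P<code>s:[a-z]+|g|n)\b")

# A complete English utterance written as [^ eng: ... ].
ENGLISH = re.compile(r"\[\^\s*eng:\s*(.*?)\s*\]")

# Hypothetical labels for the three word-level codes.
LABELS = {"s": "codeswitch", "g": "imitation", "n": "neologism"}


def find_markers(utterance):
    """Return (word, kind, category) for every marked word in a line."""
    found = []
    for match in MARKER.finditer(utterance):
        code = match.group("code")
        kind = LABELS[code[0]]
        # Only codeswitches carry a word-category code after "s:".
        category = code[2:] if code.startswith("s:") else None
        found.append((match.group("word"), kind, category))
    return found


def english_utterances(utterance):
    """Return the text of any complete English utterances in a line."""
    return ENGLISH.findall(utterance)


switches = find_markers("*P63: y cómo se dice scuba@s:d diving@s:v ?")
imitations = find_markers("*P51: la@g sombra@g.")
neologisms = find_markers("*P54: um ehm detrás de lo eh pictura@n eh hay [/] hay un número de turistas .")
english = english_utterances("*P04: [^ eng: I don't know what that means ].")
```

Keeping the word-level pattern separate from the whole-utterance pattern mirrors the distinction drawn above between codeswitching within a Spanish utterance and utterances produced entirely in English.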