An Introduction to the emuR package

The Main Package of the EMU Speech Database Management System

2019-01-29

DEPRECATION WARNING

This vignette is considered deprecated! Its content has been moved to the EMU-SDMS manual (where it has been expanded and updated). Specifically, see the 'An overview of the EMU-SDMS' and 'A tutorial on how to use the EMU-SDMS' chapters.

Introduction

This document is an introduction to the emuR package and provides an overview of what the package is capable of and how it interacts with the other components of the EMU Speech Database Management System (EMU-SDMS). It is by no means a complete guide to the EMU-SDMS but rather tries to give an outline of what it is like working with and analyzing speech databases in the EMU-SDMS by walking you through a few typical use cases.

The emuR package can be viewed as the main component of the EMU-SDMS, as it acts as the central instance that is able to interact with every component of the system. It takes care of database management duties by interacting with speech databases stored in the emuDB format (see the emuDB vignette for further details). Further, it provides an easy-to-learn yet expressive and powerful querying mechanism that allows the user to query the annotation structures of a database (see the EQL vignette for further details). Finally, it provides data extraction capabilities that extract data (e.g. formant values) corresponding to the result of a query.

If a database in the emuDB format is present, the typical work-flow in emuR is usually something like this:

  1. Load database into current R session - load_emuDB
  2. Database annotation / visual inspection - serve and connect the EMU-webApp to the local server
  3. Query database - query (sometimes followed by requery_hier or requery_seq)
  4. Get trackdata (e.g. formant values) for the result of a query - get_trackdata
  5. Data preparation
  6. Visual data inspection
  7. Further analysis and statistical processing

Converting existing databases

As most people who are starting to use the EMU-SDMS will probably already have some form of annotated data, we will initially show how to convert such existing data to the emuDB format (for a guide on how to create an emuDB from scratch and for information about this format, see the emuDB vignette).

legacy EMU databases

For people transitioning to emuR from the legacy EMU system, emuR provides a function for converting existing legacyEmuDBs to the new emuDB format. Here is an example of how to use this function:
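The following is a sketch assuming the demo data created by emuR's create_emuRdemoData() function, which ships a legacy version of the 'ae' database (the exact paths are illustrative):

# load the package and create the demo data in a temporary directory
library(emuR)
create_emuRdemoData(dir = tempdir())

# get the path to the .tpl file of the legacyEmuDB that is part of the demo data
tplPath = file.path(tempdir(), "emuR_demoData", "legacy_ae", "ae.tpl")

# convert this legacyEmuDB to the emuDB format
convert_legacyEmuDB(emuTplPath = tplPath, targetDir = tempdir())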

This will create a new emuDB in a temporary folder provided by the R function tempdir(), containing all the information specified in the .tpl file. The name of the new emuDB is the same as the basename of the .tpl file from which it was generated. In other words, if the template file of your legacyEmuDB has the path A and the directory to which the converted database is to be written has the path B, then convert_legacyEmuDB("A", targetDir = "B") will perform the conversion.

TextGrid collections

A further function provided is the convert_TextGridCollection() function. This function converts an existing .TextGrid & .wav file collection to the emuDB format. In order to pair the correct files together, the .TextGrid and .wav files must share the same base name (i.e. the file name without the extension). A further restriction is that the tiers contained in all the .TextGrid files have to be equal in name and type (equal subsets can be chosen using the tierNames argument of the function). For example, if all .TextGrid files contain the tiers Syl: IntervalTier, Phonetic: IntervalTier and Tone: TextTier, the conversion will work. However, if a single .TextGrid of the collection has the additional tier Word: IntervalTier, the conversion will fail, although it can be made to work by specifying the equal subset equalSubset = c('Syl', 'Phonetic', 'Tone') and passing it to the function: convert_TextGridCollection(..., tierNames = equalSubset, ...).
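A sketch of such a conversion, assuming the TextGrid_collection folder of the emuR demo data (the database name 'myTGcolDB' matches the description below):

# get the path to a directory containing .wav & .TextGrid files
path2directory = file.path(tempdir(), "emuR_demoData", "TextGrid_collection")

# convert this TextGrid collection to the emuDB format
convert_TextGridCollection(path2directory, dbName = "myTGcolDB",
                           targetDir = tempdir(), verbose = FALSE)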

This will create a new emuDB in the folder tempdir() called ‘myTGcolDB’. The emuDB will contain all the tier information from the .TextGrid files but will not contain hierarchical information as .TextGrid files do not contain any linking information. If you are interested in how to semi-automatically generate links between the generated SEGMENTs and EVENTs see the Autobuilding section of the emuDB vignette.

BPF collections

Similar to the convert_TextGridCollection() function, the emuR package also provides a function for converting file collections consisting of BAS Partitur Format (BPF) and .wav files to the emuDB format: convert_BPFCollection().

As the BPF format also permits annotational units to be linked to one another, this conversion function can optionally preserve this hierarchical information via its refLevel argument.
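A sketch, assuming the BPF_collection folder of the emuR demo data and an illustrative database name:

# get the path to a directory containing .wav & .par files
path2directory = file.path(tempdir(), "emuR_demoData", "BPF_collection")

# convert this BPF collection to the emuDB format
convert_BPFCollection(path2directory, dbName = "myBPF-DB",
                      targetDir = tempdir(), verbose = FALSE)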

Loading and inspecting the database

As was mentioned in the introduction, the initial step to working with an emuDB is to load it into your current R session:

# get the path to emuDB called 'ae' that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "ae_emuDB")

# load emuDB into current R session
ae = load_emuDB(path2folder, verbose = FALSE)

Overview

Now that we have loaded the ‘ae’ emuDB into our R session, let’s get a first impression of what the ‘ae’ emuDB looks like by calling:
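That is, by calling summary() on the emuDB handle:

# show a summary of the loaded emuDB
summary(ae)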

## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785 
## 
## Database configuration:
## 
## SSFF track definitions:
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms
## 
## Level definitions:
##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;
## 
## Link definitions:
##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

As you can see this displays a lot of information. Most of the information is about the various level and link definitions of the emuDB. The summary information about the level definitions shows for instance that the ‘ae’ database has a ‘Word’ level, which is of type ‘ITEM’ and therefore does not contain any time information. It also shows that a ‘Phonetic’ level (that most likely contains phonetic symbols) of type ‘SEGMENT’ is present, which means that each phonetic annotational unit carries start time and segment duration information.

The summary information about the Link definitions shows, among others, these three ‘Link definitions’:

...
4  ONE_TO_MANY           Word     Syllable
5  ONE_TO_MANY       Syllable      Phoneme
6 MANY_TO_MANY        Phoneme     Phonetic
...

This implies that annotational units from the ‘Word’ level can somehow be connected to units from the ‘Phonetic’ level via two other levels called ‘Syllable’ and ‘Phoneme’. This is indeed the case and also the reason emuR is able to deduce the time information for annotational units without time information (type: 'ITEM') if they are connected, even over multiple other levels, to annotational units with time information (type: 'SEGMENT', type: 'EVENT').

The easiest way to think of levels and links is as a graph for each recording, where the levels are different linguistic representations and the links are the relations between them. Hence, for our 'ae' emuDB we could say: each recording has words, syllables and phones; words consist of syllables, syllables in turn consist of abstract phonemes, and these phonemes are produced as concrete phones. A schematic excerpt of such an annotation can be seen below:

[Figure: schematic excerpt of a hierarchical 'ae' annotation]

Database annotation / visual inspection

The EMU-SDMS has a fairly unique approach to annotating and visually inspecting databases, as it utilizes a web application called the EMU-webApp to act as its graphical user interface. To be able to transfer the necessary data to the web application let’s now serve the emuDB to it by using the serve() function:
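For the 'ae' emuDB loaded above this is simply:

# serve the emuDB to the EMU-webApp
serve(ae)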

Executing this command will block your R console and show you the following message:

Navigate your browser to the EMU-webApp URL: http://ips-lmu.github.io/EMU-webApp/
Server connection URL: ws://localhost:17890
To stop the server press EMU-webApp 'clear' button or reload the page in your browser.

By navigating to the above URL and clicking connect in the top menu bar and connect on the subsequent popup window, the EMU-webApp and your current R session are able to connect to each other. You can now use the EMU-webApp to visually inspect your emuDB, annotate your data and more. Once you are finished using the EMU-webApp simply click the clear button in the top menu bar and your R console will free up again.

INFO: For more information about how to use the EMU-webApp click the EMU-webApp icon in the top right hand corner in the webApp’s top menu bar. For more information about how to configure the EMU-webApp see the ‘Configuring the EMU-webApp’ section of the emuDB vignette.

Use cases

As we have already completed the first two steps described in the typical work-flow example in the introduction, we will now describe the rest of the work-flow by walking through a few use cases. Every use case starts off by asking a question about the 'ae' database and continues by walking you through the process of answering it using the mechanics the emuR package provides.

1.) What is the average length of all ‘n’ phonetic segments in the ‘ae’ emuDB?

The first thing that we will need to do to answer this fairly simple question, is query the database for all ‘n’ segments. This can easily be achieved using the query() function:
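A sketch of this call (the object name sl is ours):

# query emuDB for all "n" segments on the "Phonetic" level
sl = query(ae, query = "Phonetic == n")

# show the first rows of the resulting segment list
head(sl)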

## segment  list from database:  ae 
## query was:  Phonetic==n 
##   labels    start      end session   bundle    level    type
## 1      n 1031.925 1195.925    0000 msajc003 Phonetic SEGMENT
## 2      n 1741.425 1791.425    0000 msajc003 Phonetic SEGMENT
## 3      n 1515.475 1554.475    0000 msajc010 Phonetic SEGMENT
## 4      n 2430.975 2528.475    0000 msajc010 Phonetic SEGMENT
## 5      n  894.975 1022.975    0000 msajc012 Phonetic SEGMENT
## 6      n 2402.275 2474.875    0000 msajc012 Phonetic SEGMENT

The second argument of query() contains a string representing an EMU Query Language Version 2 (EQL2) statement. This fairly simple EQL2 statement consists of the level name 'Phonetic' on the left, the operator '==' (the equality operator of the EQL) in the middle and, on the right-hand side of the operator, the label 'n' that we are looking for. For multiple examples and an overview of the types of queries the EQL2 allows you to formulate, please see the EQL vignette.

The query() function returns an object of the class emuRsegs, which extends the well-known data.frame. The various columns of this object should be fairly self-explanatory: labels displays the extracted labels, start / end are the start and end times of each segment in milliseconds, and so on. We can now use the information in this object to calculate the mean duration of these segments:
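Base R suffices here, assuming the sl object from the query above:

# calculate the mean duration of the "n" segments in milliseconds
mean(sl$end - sl$start)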

## [1] 67.05833

2.) What does the F1/F2 distribution of all phonetic segments that contain the labels I, o:, u:, V or @ look like?

Once again we will start by querying our annotation structure for the segments we are interested in:
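A sketch of the query (the object name vowels is ours):

# query emuDB for the vowel segments of interest
vowels = query(ae, "Phonetic == I|o:|u:|V|@")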

INFO: the EQL2 introduces a new operator, the regular expression operator =~. So we could alternatively formulate the above query as: "Phonetic =~ 'I|o:|u:|V|@'"

Now that we have extracted the necessary segment information we can simply call:
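A sketch of the call, requesting on-the-fly formant estimation and the flat emuRtrackdata result type discussed below:

# get formant trackdata calculated on-the-fly by wrassp's forest function
td_vowels = get_trackdata(ae, seglist = vowels,
                          onTheFlyFunctionName = "forest",
                          resultType = "emuRtrackdata",
                          verbose = FALSE)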

In this example the get_trackdata() function uses a formant estimation function called forest to calculate the formant values on the fly. This signal processing function is part of the wrassp package, which the emuR package uses to perform its signal processing duties, as in the above get_trackdata() call.

INFO: For more information about the wrassp package and its available signal processing functions see the wrassp_intro vignette that is part of the wrassp package.

If the resultType parameter is set to "emuRtrackdata" the get_trackdata function returns an object with the following classes (see ?emuRtrackdata for more details):

## [1] "emuRtrackdata" "data.frame"

As we are dealing with a data.frame we can now simply use a package like ggplot2 to visualize our F1/F2 distribution:
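A minimal sketch, assuming the T1 and T2 columns of td_vowels hold the first two formants produced by forest:

library(ggplot2)

# plot the F1/F2 distribution colored by vowel label
# (axes reversed to follow phonetic plotting conventions)
ggplot(td_vowels, aes(x = T2, y = T1, label = labels, colour = labels)) +
  geom_text() +
  scale_x_reverse() +
  scale_y_reverse() +
  labs(x = "F2 (Hz)", y = "F1 (Hz)")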

3.) What words do the phonetic segments that carry the labels s, z, S or Z in the ‘ae’ emuDB occur in and what is their preceding phonetic context?

As we have done before, let’s query the emuDB for the segments we are interested in:
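The query (the object name sibil is reused throughout the rest of this use case):

# query emuDB for the sibilant segments
sibil = query(ae, "Phonetic == s|z|S|Z")

# show the first rows of the resulting segment list
head(sibil)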

## segment  list from database:  ae 
## query was:  Phonetic==s|z|S|Z 
##   labels    start      end session   bundle    level    type
## 1      s  483.425  566.925    0000 msajc003 Phonetic SEGMENT
## 2      z 1195.925 1289.425    0000 msajc003 Phonetic SEGMENT
## 3      S 1289.425 1419.925    0000 msajc003 Phonetic SEGMENT
## 4      z 1548.425 1634.425    0000 msajc003 Phonetic SEGMENT
## 5      s 1791.425 1893.175    0000 msajc003 Phonetic SEGMENT
## 6      z  476.475  571.925    0000 msajc010 Phonetic SEGMENT

We can now use the requery_hier() function to perform an hierarchical requery using the result set of our initial query. This requery follows the hierarchical links described earlier to find the linked annotational units of a different level.
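A sketch of the requery up to the 'Word' level:

# requery the hierarchy from the sibilants up to the "Word" level
words = requery_hier(ae, sibil, level = "Word")

# show the first rows
head(words)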

## segment  list from database:  ae 
## query was:  FROM REQUERY 
##   labels    start      end session   bundle level type
## 1      C  187.425  674.175    0000 msajc003  Word ITEM
## 2      C  739.925 1289.425    0000 msajc003  Word ITEM
## 3      F 1289.425 1463.175    0000 msajc003  Word ITEM
## 4      F 1463.175 1634.425    0000 msajc003  Word ITEM
## 5      C 1634.425 2150.175    0000 msajc003  Word ITEM
## 6      F  411.675  571.925    0000 msajc010  Word ITEM

As we can see, the result is not quite what we expected, as it does not contain the orthographic word transcriptions but a classification of the words into content words ('C') and function words ('F'). Looking back at the output of summary() we can see that the 'Word' level has multiple attribute definitions, which indicates that each annotational unit in the 'Word' level has multiple parallel labels defined for it. So let's instead try the attribute definition called 'Text':
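As attribute definitions can be addressed like levels, this is a sketch of the same requery targeting 'Text':

# requery the hierarchy using the "Text" attribute definition of the "Word" level
words = requery_hier(ae, sibil, level = "Text")

# show the first rows
head(words)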

## segment  list from database:  ae 
## query was:  FROM REQUERY 
##       labels    start      end session   bundle level type
## 1    amongst  187.425  674.175    0000 msajc003  Text ITEM
## 2    friends  739.925 1289.425    0000 msajc003  Text ITEM
## 3        she 1289.425 1463.175    0000 msajc003  Text ITEM
## 4        was 1463.175 1634.425    0000 msajc003  Text ITEM
## 5 considered 1634.425 2150.175    0000 msajc003  Text ITEM
## 6         is  411.675  571.925    0000 msajc010  Text ITEM

We can now see that, for example, the first segment in sibil occurred in the word 'amongst', which starts at 187.425 ms and ends at 674.175 ms.

INFO: this two-step process can also be completed in a single hierarchical query using the dominance operator ^. See the EQL vignette for more details.

Now that we have answered the first part of the question let’s look at the left and right context of the extracted sibilants by using the requery_seq() function.
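Starting with the left context, a sketch of the sequential requery:

# get the left context by requerying one unit to the left on the same level
leftContext = requery_seq(ae, sibil, offset = -1)

# show the first rows
head(leftContext)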

## segment  list from database:  ae 
## query was:  FROM REQUERY 
##   labels    start      end session   bundle    level    type
## 1      N  426.675  483.425    0000 msajc003 Phonetic SEGMENT
## 2      n 1031.925 1195.925    0000 msajc003 Phonetic SEGMENT
## 3      z 1195.925 1289.425    0000 msajc003 Phonetic SEGMENT
## 4      @ 1506.175 1548.425    0000 msajc003 Phonetic SEGMENT
## 5      n 1741.425 1791.425    0000 msajc003 Phonetic SEGMENT
## 6      I  411.675  476.475    0000 msajc010 Phonetic SEGMENT

And the right context:
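The analogous call would be:

# attempt to get the right context by requerying one unit to the right
rightContext = requery_seq(ae, sibil, offset = 1)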

This will throw an error as four of the sibilants occur at the very end of the recording and therefore have no phonetic post-context. We can get the remaining post-context by setting the ignoreOutOfBounds argument to TRUE:
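A sketch of the corrected call:

# retry, replacing out-of-bounds items with NA rows
rightContext = requery_seq(ae, sibil, offset = 1, ignoreOutOfBounds = TRUE)

# show the first rows
head(rightContext)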

## Warning in requery_seq(ae, sibil, offset = 1, ignoreOutOfBounds = TRUE):
## Found missing items in resulting segment list! Replacing missing rows with
## NA values.
## segment  list from database:  ae 
## query was:  FROM REQUERY 
##   labels    start      end session   bundle    level    type
## 1      t  566.925  596.675    0000 msajc003 Phonetic SEGMENT
## 2      S 1289.425 1419.925    0000 msajc003 Phonetic SEGMENT
## 3     i: 1419.925 1463.175    0000 msajc003 Phonetic SEGMENT
## 4      k 1634.425 1675.925    0000 msajc003 Phonetic SEGMENT
## 5      I 1893.175 1945.425    0000 msajc003 Phonetic SEGMENT
## 6      f  571.925  674.475    0000 msajc010 Phonetic SEGMENT

NOTE: The resulting rightContext and the original sibil objects are no longer "in sync"! It is therefore dangerous to use this option by default, as one often relies on the rows of multiple emuRsegs objects that were created from each other using either requery_hier() or requery_seq() being "in sync" with each other (i.e. that the same row index implicitly indicates a relationship).

4.) Do the sibilant Phonetic segments that carry the labels s, z, S or Z in the 'ae' emuDB differ with respect to their first spectral moment?

Once again let's query the emuDB for the segments we are interested in (this time using the new RegEx operator =~):
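# query emuDB for the sibilant segments using a regular expression
sibil = query(ae, "Phonetic =~ '[szSZ]'")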

Let's now use get_trackdata(), this time to extract the discrete Fourier transform values for our segments:
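A sketch, using wrassp's dftSpectrum function on-the-fly and keeping the default result type:

# get DFT spectra calculated on-the-fly by wrassp's dftSpectrum function
dftTd = get_trackdata(ae, seglist = sibil,
                      onTheFlyFunctionName = "dftSpectrum",
                      verbose = FALSE)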

As we have not set the resultType parameter to "emuRtrackdata", an object of the class trackdata is returned. This object, just like an object of the class emuRtrackdata, contains the extracted trackdata information. Compared to the emuRtrackdata class, however, the object is not "flat" (i.e. in the form of a data.frame) but has a more nested structure (see ?trackdata for more details).

Since we want to analyse sibilant spectral data we will now reduce the spectral range of the data to 1000 - 10000 Hz. This is due to the fact that there is a lot of unwanted noise in the lower bands that is irrelevant for the problem at hand and can even skew the end results. To achieve this we can use a property of a trackdata object that also carries the class spectral, which is that it is indexed using frequencies. We will now use this trait to extract the relevant spectral frequencies of the trackdata object:
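Because the column index of a spectral trackdata object is interpreted as frequency, the reduction is a simple subscript (object names are ours):

# extract the spectral range between 1000 and 10000 Hz
dftTdRelFreq = dftTd[, 1000:10000]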

Now we are ready to calculate the spectral moments from the reduced spectra:
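A sketch using emuR's fapply() to apply the moments() function to every spectrum:

# calculate the first four spectral moments of each spectrum
# (minval = TRUE shifts the values to be non-negative beforehand)
dftTdRelFreqMom = fapply(dftTdRelFreq, moments, minval = TRUE)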

The resulting dftTdRelFreqMom object is once again a trackdata object of the same length. Contained in it are the first four spectral moments:
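Printing its first element, for example, shows the four moments (columns) over time (rows):

# show the moments of the first segment
dftTdRelFreqMom[1]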

## trackdata from track:  
## index:
##  left right
##     1    16
## ftime:
##      start   end
## [1,] 487.5 562.5
## data:
##           [,1]    [,2]         [,3]       [,4]
## 487.5 5335.375 6469573  0.060974902 -1.1033084
## 492.5 5559.115 6208197 -0.002145643 -1.0653681
## 497.5 5666.755 6245230 -0.046790835 -1.1018919
## 502.5 5739.233 6119412 -0.065039694 -1.0752794
## 507.5 5652.698 6168305 -0.037766218 -1.0638400
## 512.5 5691.959 5680021 -0.047454770 -0.9768125
## 517.5 5731.836 5798009 -0.056078859 -0.9949244
## 522.5 5768.820 5477618 -0.087251329 -0.9168237
## 527.5 5642.518 5737447 -0.038230053 -0.9859143
## 532.5 5663.638 5605144 -0.026298118 -0.9751338
## 537.5 5673.086 5936908 -0.082604419 -1.0329421
## 542.5 5786.514 5568992 -0.124024954 -0.9408169
## 547.5 5836.446 5654546 -0.126428302 -0.9779500
## 552.5 5981.293 5157447 -0.160357471 -0.9052777
## 557.5 5872.480 5392805 -0.127150234 -0.9490674
## 562.5 5926.184 5017286 -0.122898324 -0.9120789

We can now use the information stored in the dftTdRelFreqMom and sibil objects to produce a time-normalized, by-phonetic-category ensemble plot of the first spectral moments using emuR's dplot() function:
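A sketch of the plotting call:

# time-normalized ensemble plot of the first spectral moment, grouped by label
dplot(x = dftTdRelFreqMom[, 1],
      labs = sibil$labels,
      normalise = TRUE,
      xlab = "Normalized Time [%]",
      ylab = "1st spectral moment [Hz]")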

As one might expect, the first spectral moment (= the center of gravity) is significantly lower for ‘S’ and ‘Z’ (green & blue lines) than for ‘s’ and ‘z’ (black & red lines).

Alternatively we can average the ensembles into single trajectories by setting the average parameter of dplot() to TRUE:
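The same sketch with averaging enabled:

# averaged trajectories of the first spectral moment, grouped by label
dplot(x = dftTdRelFreqMom[, 1],
      labs = sibil$labels,
      normalise = TRUE,
      average = TRUE,
      xlab = "Normalized Time [%]",
      ylab = "1st spectral moment [Hz]")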

As can be seen from the previous two plots, transitions to and from a sort of "steady state" around the temporal midpoint of the sibilants are clearly visible. To focus on this "steady state" part of the sibilants we will now cut out the middle 60% portion of the previously calculated moments using the dcut() function:
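A sketch using proportional cut times; printing the first element before and after the cut produces the comparison below:

# cut out the middle 60% portion (20% to 80%) of each trajectory
dftTdRelFreqMomMid = dcut(dftTdRelFreqMom, left.time = 0.2,
                          right.time = 0.8, prop = TRUE)

# compare the first element before and after cutting
dftTdRelFreqMom[1]
dftTdRelFreqMomMid[1]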

## trackdata from track:  
## index:
##  left right
##     1    16
## ftime:
##      start   end
## [1,] 487.5 562.5
## data:
##           [,1]    [,2]         [,3]       [,4]
## 487.5 5335.375 6469573  0.060974902 -1.1033084
## 492.5 5559.115 6208197 -0.002145643 -1.0653681
## 497.5 5666.755 6245230 -0.046790835 -1.1018919
## 502.5 5739.233 6119412 -0.065039694 -1.0752794
## 507.5 5652.698 6168305 -0.037766218 -1.0638400
## 512.5 5691.959 5680021 -0.047454770 -0.9768125
## 517.5 5731.836 5798009 -0.056078859 -0.9949244
## 522.5 5768.820 5477618 -0.087251329 -0.9168237
## 527.5 5642.518 5737447 -0.038230053 -0.9859143
## 532.5 5663.638 5605144 -0.026298118 -0.9751338
## 537.5 5673.086 5936908 -0.082604419 -1.0329421
## 542.5 5786.514 5568992 -0.124024954 -0.9408169
## 547.5 5836.446 5654546 -0.126428302 -0.9779500
## 552.5 5981.293 5157447 -0.160357471 -0.9052777
## 557.5 5872.480 5392805 -0.127150234 -0.9490674
## 562.5 5926.184 5017286 -0.122898324 -0.9120789
## trackdata from unknown track.
## index:
##  left right
##     1    10
## ftime:
##      start   end
## [1,] 502.5 547.5
## data:
##           [,1]    [,2]        [,3]       [,4]
## 502.5 5739.233 6119412 -0.06503969 -1.0752794
## 507.5 5652.698 6168305 -0.03776622 -1.0638400
## 512.5 5691.959 5680021 -0.04745477 -0.9768125
## 517.5 5731.836 5798009 -0.05607886 -0.9949244
## 522.5 5768.820 5477618 -0.08725133 -0.9168237
## 527.5 5642.518 5737447 -0.03823005 -0.9859143
## 532.5 5663.638 5605144 -0.02629812 -0.9751338
## 537.5 5673.086 5936908 -0.08260442 -1.0329421
## 542.5 5786.514 5568992 -0.12402495 -0.9408169
## 547.5 5836.446 5654546 -0.12642830 -0.9779500

To wrap up, let's calculate the averages of these middle trajectories using the trapply() function:
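A sketch of the call (simplify = TRUE returns a plain vector):

# calculate the mean first moment of each middle trajectory
meanFirstMoments = trapply(dftTdRelFreqMomMid[, 1], fun = mean,
                           simplify = TRUE)

meanFirstMoments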

##  [1] 5718.675 5891.755 5345.288 5967.853 6011.453 5971.805 5947.452
##  [8] 5986.816 5935.862 5347.980 5801.145 5285.305 6002.954 5867.859
## [15] 6014.950 6041.754 5671.781 5942.811 5920.483 5551.712 5895.388
## [22] 5900.361 5201.412 5380.083 6055.406 5984.184 5743.045 6080.457
## [29] 6000.298 6008.666 5843.371 5901.677

As the resulting meanFirstMoments vector has the same length as the initial sibil segment list, we can now easily visualize these values in the form of a boxplot:
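A sketch using base R's boxplot():

# boxplot of the mean first spectral moments, grouped by phonetic label
boxplot(meanFirstMoments ~ sibil$labels,
        xlab = "Phonetic label",
        ylab = "mean 1st spectral moment [Hz]")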

INFO: Using the "emuRtrackdata" resultType of the get_trackdata() function, we could have performed a comparable analysis by utilizing packages such as dplyr for data.frame manipulation and lattice or ggplot2 for data visualisation.

Further reading

In this vignette we tried to give you a quick practical overview of what it is like to work with the emuR package, which is part of the EMU-SDMS. If you are new to the system, we definitely recommend that you also read the emuDB and EQL vignettes that are part of the emuR package. These will give more insight into the structure of emuDBs, how you can interact with them, and what the EMU Query Language offers. As the new EMU system has kept most of the concepts of the legacy EMU system in place, it is definitely also worth looking at Jonathan Harrington's book Phonetic Analysis of Speech Corpora (Harrington 2010).

References

Harrington, Jonathan. 2010. Phonetic Analysis of Speech Corpora. John Wiley & Sons.