This vignette is deprecated! Its content has been moved to the EMU-SDMS manual (where it has been expanded and updated). Specifically, see the ‘An overview of the EMU-SDMS’ and ‘A tutorial on how to use the EMU-SDMS’ chapters.
This document is an introduction to the emuR
package and provides an overview of what the package is capable of and how it interacts with the other components of the EMU Speech Database Management System (EMU-SDMS). It is by no means a complete guide to the EMU-SDMS but rather tries to give an outline of what it is like working with and analyzing speech databases in the EMU-SDMS by walking you through a few typical use cases.
The emuR package can be viewed as the main component of the EMU-SDMS, as it acts as the central instance that is able to interact with every component of the system. It takes care of database management duties by interacting with speech databases stored in the emuDB format (see the emuDB vignette for further details). Further, it provides a querying mechanism that is easy to learn yet expressive and powerful, allowing the user to query the annotation structures of the database (see the EQL vignette for further details). Finally, it provides data extraction capabilities that extract data (e.g. formant values) corresponding to the results of a query.
If a database in the emuDB
format is present, the typical work-flow in emuR
is usually something like this:
1. load_emuDB
2. serve and connect the EMU-webApp to the local server
3. query (sometimes followed by requery_hier or requery_seq)
4. get_trackdata
As most people starting to use the EMU-SDMS will probably already have some form of annotated data, we will initially show how to convert this existing data to the emuDB
format (for a guide on how to create an emuDB
from scratch and for information about this format see the emuDB
vignette).
For people transitioning to emuR
from the legacy EMU system, emuR
provides a function for converting existing legacyEmuDBs to the new emuDB
format. Here is an example of how to use this function:
# load the package
library(emuR)
# create demo data in folder provided by the tempdir() function
create_emuRdemoData(dir = tempdir())
# get the path to a .tpl file of a legacyEmuDB that is part of the demo data
tplPath = file.path(tempdir(), "emuR_demoData", "legacy_ae", "ae.tpl")
# convert this legacyEmuDB to the emuDB format
convert_legacyEmuDB(emuTplPath = tplPath, targetDir = tempdir())
This will create a new emuDB in a temporary folder provided by the R function tempdir(), containing all the information specified in the .tpl file. The name of the new emuDB is the same as the basename of the .tpl
file from which it was generated. In other words, if the template file of your legacyEmuDB has the path A
and the directory to which the converted database is to be written has the path B
, then convert_legacyEmuDB("A", targetDir = "B")
will perform the conversion.
A further function provided is the convert_TextGridCollection()
function. This function converts an existing .TextGrid
& .wav
file collection to the emuDB
format. In order to pair the correct files together, the .TextGrid and .wav files must share the same base name (i.e. the file name without the extension). A further restriction is that the tiers contained within all the .TextGrid files have to be equal in name and type (equal subsets can be chosen using the tierNames
argument of the function). For example, if all .TextGrid
files contain the tiers Syl: IntervalTier
, Phonetic: IntervalTier
and Tone: TextTier
the conversion will work. However, if a single .TextGrid
of the collection has the additional tier Word: IntervalTier
the conversion will fail, although it can be made to work by specifying the equal subset equalSubset = c('Syl', 'Phonetic', 'Tone')
and passing it into the function argument convert_TextGridCollection(..., tierNames = equalSubset, ...)
.
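Assuming the demo data used below, such a restricted conversion could look like this (a sketch; the database name is illustrative):
# choose the equal subset of tiers shared by all .TextGrid files
equalSubset = c('Syl', 'Phonetic', 'Tone')
convert_TextGridCollection(file.path(tempdir(), "emuR_demoData",
                                     "TextGrid_collection"),
                           dbName = "myTGcolDB_subset",
                           targetDir = tempdir(),
                           tierNames = equalSubset)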
# get the path to a folder containing .wav & .TextGrid files that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "TextGrid_collection")
# convert this TextGridCollection to the emuDB format
convert_TextGridCollection(path2folder, dbName = "myTGcolDB",
targetDir = tempdir())
This will create a new emuDB
in the folder tempdir()
called ‘myTGcolDB’. The emuDB
will contain all the tier information from the .TextGrid
files but will not contain hierarchical information as .TextGrid
files do not contain any linking information. If you are interested in how to semi-automatically generate links between the generated SEGMENT
s and EVENT
s see the Autobuilding section of the emuDB
vignette.
Similar to the convert_TextGridCollection()
function the emuR
package also provides a function for converting file collections consisting of BAS Partitur Format (BPF) and .wav
files to the emuDB
format.
# get the path to a folder containing .wav & .par files that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "BPF_collection")
# convert this BPFCollection to the emuDB format
convert_BPFCollection(path2folder, dbName = 'myBPF-DB',
                      targetDir = tempdir(), verbose = FALSE)
As the BPF format also permits annotational units to be linked to one another, this conversion function can optionally preserve this hierarchical information by specifying the refLevel
argument.
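For example, assuming the .par files contain an ‘ORT’ (orthography) tier that the other tiers reference, the links could be preserved like this (a sketch; the appropriate reference tier depends on your BPF files):
# preserve the hierarchical links by declaring the assumed 'ORT' tier
# as the reference level
convert_BPFCollection(path2folder, dbName = 'myBPF-DB_hier',
                      targetDir = tempdir(),
                      refLevel = 'ORT',
                      verbose = FALSE)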
As was mentioned in the introduction, the initial step to working with an emuDB
is to load it into your current R session:
# get the path to emuDB called 'ae' that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# load emuDB into current R session
ae = load_emuDB(path2folder, verbose = FALSE)
Now that we have loaded the ‘ae’ emuDB
into our R session, let’s get a first impression of what the ‘ae’ emuDB
looks like by calling:
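summary(ae)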
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## Database configuration:
##
## SSFF track definitions:
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
##
## Level definitions:
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
##
## Link definitions:
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
As you can see this displays a lot of information. Most of the information is about the various level and link definitions of the emuDB
. The summary information about the level definitions shows for instance that the ‘ae’ database has a ‘Word’ level, which is of type ‘ITEM’ and therefore does not contain any time information. It also shows that a ‘Phonetic’ level (that most likely contains phonetic symbols) of type ‘SEGMENT’ is present, which means that each phonetic annotational unit carries start time and segment duration information.
The summary information about the Link definitions shows, among others, these three ‘Link definitions’:
...
4 ONE_TO_MANY Word Syllable
5 ONE_TO_MANY Syllable Phoneme
6 MANY_TO_MANY Phoneme Phonetic
...
This implies that annotational units from the ‘Word’ level can somehow be connected to units from the ‘Phonetic’ level via two other levels called ‘Syllable’ and ‘Phoneme’. This is indeed the case and also the reason emuR
is able to deduce the time information for annotational units without time information (type: 'ITEM'
) if they are connected, even over multiple other levels, to annotational units with time information (type: 'SEGMENT'
, type: 'EVENT'
).
The easiest way to think of levels and links is as a graph for each recording, where the levels are different linguistic representations and the links are the relations between them. Hence, for our ‘ae’ emuDB we could say: each recording has words, syllables and phones, and the relations are: words consist of syllables, syllables in turn consist of abstract phonemes, and these phonemes are produced as concrete phones. A schematic excerpt of such an annotation can be seen below:
[Figure: schematic excerpt of a hierarchical annotation in the ‘ae’ emuDB]
The EMU-SDMS has a fairly unique approach to annotating and visually inspecting databases, as it utilizes a web application called the EMU-webApp to act as its graphical user interface. To be able to transfer the necessary data to the web application let’s now serve the emuDB
to it by using the serve()
function:
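serve(ae)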
Executing this command will block your R console and show you the following message:
Navigate your browser to the EMU-webApp URL: http://ips-lmu.github.io/EMU-webApp/
Server connection URL: ws://localhost:17890
To stop the server press EMU-webApp 'clear' button or reload the page in your browser.
By navigating to the above URL, clicking connect in the top menu bar and then clicking connect in the subsequent popup window, the EMU-webApp and your current R session will connect to each other. You can now use the EMU-webApp to visually inspect your emuDB
, annotate your data and more. Once you are finished using the EMU-webApp simply click the clear button in the top menu bar and your R console will free up again.
INFO: For more information about how to use the EMU-webApp click the EMU-webApp icon in the top right hand corner in the webApp’s top menu bar. For more information about how to configure the EMU-webApp see the ‘Configuring the EMU-webApp’ section of the emuDB
vignette.
As we have already completed the first two steps described in the typical work-flow example in the introduction, we will now describe the rest of the work-flow by walking through a few use cases. Every use case will start off by asking a question about the ‘ae’ database and will continue by walking you through the process of answering this question using the mechanics the emuR
package provides.
The first question we will address is fairly simple: what is the mean duration of all ‘n’ segments in the ‘ae’ emuDB? The first thing we need to do to answer it is to query the database for all ‘n’ segments. This can easily be achieved using the query() function:
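# query the database for all 'n' segments on the 'Phonetic' level
# (the object name sl_n is illustrative)
sl_n = query(ae, "Phonetic == n")
# display the first few rows of the resulting segment list
head(sl_n)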
## segment list from database: ae
## query was: Phonetic==n
## labels start end session bundle level type
## 1 n 1031.925 1195.925 0000 msajc003 Phonetic SEGMENT
## 2 n 1741.425 1791.425 0000 msajc003 Phonetic SEGMENT
## 3 n 1515.475 1554.475 0000 msajc010 Phonetic SEGMENT
## 4 n 2430.975 2528.475 0000 msajc010 Phonetic SEGMENT
## 5 n 894.975 1022.975 0000 msajc012 Phonetic SEGMENT
## 6 n 2402.275 2474.875 0000 msajc012 Phonetic SEGMENT
The second argument of the query()
function call contains a string that represents an EMU Query Language Version 2 (EQL2) statement. This fairly simple EQL2 statement consists of the level name ‘Phonetic’ on the left, the operator ‘==’, which is the equality operator of the EQL2, and, on the right hand side of the operator, the label ‘n’ that we are looking for. For multiple examples and an overview of the types of queries you can formulate with the EQL2, please see the EQL
vignette.
The query() function returns an object of the class emuRsegs, which extends the well-known data.frame class. The various columns of this object should be fairly self-explanatory: labels contains the extracted labels, start / end are the start and end times of each segment in milliseconds, and so on. We can now use the information in this object to calculate the mean duration of these segments:
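# mean duration in ms: average the differences between end and start times
mean(sl_n$end - sl_n$start)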
## [1] 67.05833
Once again we will start by querying our annotation structure for the segments we are interested in:
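# query all sibilant segments
sl = query(ae, "Phonetic == s|z|S|Z")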
INFO: The EQL2 introduces a new operand, the regular expression operand =~. So alternatively we could also formulate the above query as follows: “Phonetic =~ '[szSZ]'”
Now that we have extracted the necessary segment information we can simply call:
# get formant values for those segments
td = get_trackdata(ae, sl,
onTheFlyFunctionName = "forest",
resultType = "emuRtrackdata")
In this example the get_trackdata
function uses a formant estimation function called forest
to calculate the formant values on-the-fly. This signal processing function is part of the wrassp
package, which the emuR package uses to perform its signal processing duties, as is the case with the above get_trackdata command.
INFO: For more information about the wrassp package and its available signal processing functions see the wrassp_intro
vignette that is part of the wrassp
package.
If the resultType
parameter is set to "emuRtrackdata"
the get_trackdata
function returns an object with the following classes (see ?emuRtrackdata
for more details):
## [1] "emuRtrackdata" "data.frame"
As we are dealing with a data.frame
we can now simply use a package like ggplot2
to visualize our F1/F2 distribution:
# check if ggplot2 package is available (install separately with
# install.packages("ggplot2") if not available on your system)
if (requireNamespace("ggplot2", quietly = TRUE)) {
# load package
library(ggplot2)
# scatter plot of F1 and F2 values using ggplot
  ggplot(td, aes(x = T2, y = T1, label = labels)) +
geom_text(aes(colour=factor(labels))) +
scale_y_reverse() + scale_x_reverse() +
labs(x = "F2(Hz)", y = "F1(Hz)") +
guides(colour=FALSE)
}
As we have done before, let’s query the emuDB
for the segments we are interested in:
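sibil = query(ae, "Phonetic == s|z|S|Z")
# display the first few segments
head(sibil)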
## segment list from database: ae
## query was: Phonetic==s|z|S|Z
## labels start end session bundle level type
## 1 s 483.425 566.925 0000 msajc003 Phonetic SEGMENT
## 2 z 1195.925 1289.425 0000 msajc003 Phonetic SEGMENT
## 3 S 1289.425 1419.925 0000 msajc003 Phonetic SEGMENT
## 4 z 1548.425 1634.425 0000 msajc003 Phonetic SEGMENT
## 5 s 1791.425 1893.175 0000 msajc003 Phonetic SEGMENT
## 6 z 476.475 571.925 0000 msajc010 Phonetic SEGMENT
We can now use the requery_hier()
function to perform a hierarchical requery using the result set of our initial query. This requery follows the hierarchical links described earlier to find the linked annotational units on a different level:
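# requery the hierarchically linked items on the 'Word' level
# (the object name words is illustrative)
words = requery_hier(ae, sibil, level = "Word")
head(words)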
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 C 187.425 674.175 0000 msajc003 Word ITEM
## 2 C 739.925 1289.425 0000 msajc003 Word ITEM
## 3 F 1289.425 1463.175 0000 msajc003 Word ITEM
## 4 F 1463.175 1634.425 0000 msajc003 Word ITEM
## 5 C 1634.425 2150.175 0000 msajc003 Word ITEM
## 6 F 411.675 571.925 0000 msajc010 Word ITEM
As we can see, the result is not quite what we might have expected, as it does not contain the orthographic word transcriptions but rather a classification of the words into content words (‘C’) and function words (‘F’). Looking back at the output of summary() we can see that the ‘Word’ level has multiple attributeDefinitions, which indicates that each annotational unit in the ‘Word’ level has multiple parallel labels defined for it. So let's instead try the attributeDefinition called ‘Text’:
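# requery using the 'Text' attribute definition of the 'Word' level
words = requery_hier(ae, sibil, level = "Text")
head(words)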
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 friends 739.925 1289.425 0000 msajc003 Text ITEM
## 3 she 1289.425 1463.175 0000 msajc003 Text ITEM
## 4 was 1463.175 1634.425 0000 msajc003 Text ITEM
## 5 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 6 is 411.675 571.925 0000 msajc010 Text ITEM
We can now see that, for example, the first segment in sibil occurred in the word ‘amongst’, which starts at 187.425 ms and ends at 674.175 ms.
INFO: This two-step process can also be completed in a single hierarchical query using the dominance operator ^. See the EQL
vignette for more details.
Now that we have answered the first part of the question let’s look at the left and right context of the extracted sibilants by using the requery_seq()
function.
# get left context by off-setting the annotational units in sibil one unit to the left
leftContext = requery_seq(ae, sibil, offset = -1)
head(leftContext)
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 N 426.675 483.425 0000 msajc003 Phonetic SEGMENT
## 2 n 1031.925 1195.925 0000 msajc003 Phonetic SEGMENT
## 3 z 1195.925 1289.425 0000 msajc003 Phonetic SEGMENT
## 4 @ 1506.175 1548.425 0000 msajc003 Phonetic SEGMENT
## 5 n 1741.425 1791.425 0000 msajc003 Phonetic SEGMENT
## 6 I 411.675 476.475 0000 msajc010 Phonetic SEGMENT
And the right context:
# get right context by off-setting the annotational units in sibil one unit to the right
rightContext = requery_seq(ae, sibil, offset = 1)
This will throw an error as four of the sibilants occur at the very end of the recording and therefore have no phonetic post-context. We can get the remaining post-context by setting the ignoreOutOfBounds
argument to TRUE
:
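rightContext = requery_seq(ae, sibil, offset = 1,
                           ignoreOutOfBounds = TRUE)
head(rightContext)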
## Warning in requery_seq(ae, sibil, offset = 1, ignoreOutOfBounds = TRUE):
## Found missing items in resulting segment list! Replacing missing rows with
## NA values.
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 t 566.925 596.675 0000 msajc003 Phonetic SEGMENT
## 2 S 1289.425 1419.925 0000 msajc003 Phonetic SEGMENT
## 3 i: 1419.925 1463.175 0000 msajc003 Phonetic SEGMENT
## 4 k 1634.425 1675.925 0000 msajc003 Phonetic SEGMENT
## 5 I 1893.175 1945.425 0000 msajc003 Phonetic SEGMENT
## 6 f 571.925 674.475 0000 msajc010 Phonetic SEGMENT
NOTE: The resulting rightContext and the original sibil objects are no longer “in sync”! It is therefore dangerous to use this option by default, as one often relies on the rows of multiple emuRsegs objects that were created from each other using either requery_hier() or requery_seq() being “in sync” with each other (i.e. that the same row index implicitly indicates a relationship).
Once again let’s query the emuDB for the segments we are interested in (this time using the new RegEx operand =~):
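# query all sibilants, this time using the regular expression operand
sibil = query(ae, "Phonetic =~ '[szSZ]'")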
Let’s now use get_trackdata(), this time to extract the discrete Fourier transform values for our segments (the call below is a sketch using wrassp's dftSpectrum function):
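# calculate the DFT values for the sibilants on-the-fly
# (dftSpectrum is wrassp's DFT function; alternatively the 'dft' SSFF
#  track defined in the 'ae' emuDB could be referenced via ssffTrackName)
dftTd = get_trackdata(ae, sibil,
                      onTheFlyFunctionName = "dftSpectrum")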
As we have not set the resultType
parameter to "emuRtrackdata"
an object of the class trackdata
is returned. This object, just like an object of the class emuRtrackdata
, contains the extracted trackdata information. Compared to the emuRtrackdata
class the object is however not “flat” and in the form of a data.frame
but has a more nested structure (see ?trackdata
for more details).
# execute this to show 16 spectra calculated from the first segment in sibil (an 's')
# (console output will not be shown here as it is very lengthy)
dftTd[1]
Since we want to analyse sibilant spectral data we will now reduce the spectral range of the data to 1000 - 10000 Hz. This is due to the fact that there is a lot of unwanted noise in the lower bands that is irrelevant for the problem at hand and can even skew the end results. To achieve this we can use a property of a trackdata
object that also carries the class spectral
, which is that it is indexed using frequencies. We will now use this trait to extract the relevant spectral frequencies of the trackdata
object:
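# spectral trackdata objects can be indexed by frequency:
# keep only the spectral values between 1000 Hz and 10000 Hz
dftTdRelFreq = dftTd[, 1000:10000]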
Now we are ready to calculate the spectral moments from the reduced spectra:
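# apply emuR's moments() function to every spectrum of the trackdata
# object (a sketch of the usual recipe; minval = TRUE shifts each
# spectrum so that its minimum value is zero before the calculation)
dftTdRelFreqMom = fapply(dftTdRelFreq, moments, minval = TRUE)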
The resulting dftTdRelFreqMom
object is once again a trackdata object of the same length. Contained in it are the first four spectral moments:
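# display the moments of the first segment
dftTdRelFreqMom[1]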
## trackdata from track:
## index:
## left right
## 1 16
## ftime:
## start end
## [1,] 487.5 562.5
## data:
## [,1] [,2] [,3] [,4]
## 487.5 5335.375 6469573 0.060974902 -1.1033084
## 492.5 5559.115 6208197 -0.002145643 -1.0653681
## 497.5 5666.755 6245230 -0.046790835 -1.1018919
## 502.5 5739.233 6119412 -0.065039694 -1.0752794
## 507.5 5652.698 6168305 -0.037766218 -1.0638400
## 512.5 5691.959 5680021 -0.047454770 -0.9768125
## 517.5 5731.836 5798009 -0.056078859 -0.9949244
## 522.5 5768.820 5477618 -0.087251329 -0.9168237
## 527.5 5642.518 5737447 -0.038230053 -0.9859143
## 532.5 5663.638 5605144 -0.026298118 -0.9751338
## 537.5 5673.086 5936908 -0.082604419 -1.0329421
## 542.5 5786.514 5568992 -0.124024954 -0.9408169
## 547.5 5836.446 5654546 -0.126428302 -0.9779500
## 552.5 5981.293 5157447 -0.160357471 -0.9052777
## 557.5 5872.480 5392805 -0.127150234 -0.9490674
## 562.5 5926.184 5017286 -0.122898324 -0.9120789
We can now use the information stored in the dftTdRelFreqMom and sibil objects to plot time-normalized, by-phonetic-category ensemble plots of the first spectral moments using emuR's dplot() function:
dplot(dftTdRelFreqMom[, 1],
sibil$labels,
normalise = TRUE,
xlab = "Normalized Time [%]",
ylab = "1st spectral moment [Hz]")
As one might expect, the first spectral moment (= the center of gravity) is significantly lower for ‘S’ and ‘Z’ (green & blue lines) than for ‘s’ and ‘z’ (black & red lines).
Alternatively we can average the ensembles into single trajectories by setting the average
parameter of dplot()
to TRUE
:
dplot(dftTdRelFreqMom[,1],
sibil$labels,
normalise = TRUE,
average = TRUE,
xlab = "Normalized Time [%]",
ylab = "1st spectral moment [Hz]")
As can be seen from the previous two plots, transitions to and from a sort of “steady state” around the temporal midpoint of the sibilants are clearly visible. To focus on this “steady state” part of the sibilants we will now cut out the middle 60% portion of the previously calculated moments using the dcut()
function:
# cut out the middle 60% portion
dftTdRelFreqMomMid = dcut(dftTdRelFreqMom,
left.time = 0.2,
right.time = 0.8,
                          prop = TRUE)
# display original moments of the first segment
dftTdRelFreqMom[1]
## trackdata from track:
## index:
## left right
## 1 16
## ftime:
## start end
## [1,] 487.5 562.5
## data:
## [,1] [,2] [,3] [,4]
## 487.5 5335.375 6469573 0.060974902 -1.1033084
## 492.5 5559.115 6208197 -0.002145643 -1.0653681
## 497.5 5666.755 6245230 -0.046790835 -1.1018919
## 502.5 5739.233 6119412 -0.065039694 -1.0752794
## 507.5 5652.698 6168305 -0.037766218 -1.0638400
## 512.5 5691.959 5680021 -0.047454770 -0.9768125
## 517.5 5731.836 5798009 -0.056078859 -0.9949244
## 522.5 5768.820 5477618 -0.087251329 -0.9168237
## 527.5 5642.518 5737447 -0.038230053 -0.9859143
## 532.5 5663.638 5605144 -0.026298118 -0.9751338
## 537.5 5673.086 5936908 -0.082604419 -1.0329421
## 542.5 5786.514 5568992 -0.124024954 -0.9408169
## 547.5 5836.446 5654546 -0.126428302 -0.9779500
## 552.5 5981.293 5157447 -0.160357471 -0.9052777
## 557.5 5872.480 5392805 -0.127150234 -0.9490674
## 562.5 5926.184 5017286 -0.122898324 -0.9120789
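# display the cut-out middle portion of the moments of the first segment
dftTdRelFreqMomMid[1]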
## trackdata from unknown track.
## index:
## left right
## 1 10
## ftime:
## start end
## [1,] 502.5 547.5
## data:
## [,1] [,2] [,3] [,4]
## 502.5 5739.233 6119412 -0.06503969 -1.0752794
## 507.5 5652.698 6168305 -0.03776622 -1.0638400
## 512.5 5691.959 5680021 -0.04745477 -0.9768125
## 517.5 5731.836 5798009 -0.05607886 -0.9949244
## 522.5 5768.820 5477618 -0.08725133 -0.9168237
## 527.5 5642.518 5737447 -0.03823005 -0.9859143
## 532.5 5663.638 5605144 -0.02629812 -0.9751338
## 537.5 5673.086 5936908 -0.08260442 -1.0329421
## 542.5 5786.514 5568992 -0.12402495 -0.9408169
## 547.5 5836.446 5654546 -0.12642830 -0.9779500
To wrap up, let’s calculate the averages of these middle trajectories using the trapply
function:
meanFirstMoments = trapply(dftTdRelFreqMomMid[,1],
fun = mean,
                           simplify = TRUE)
# display resulting vector
meanFirstMoments
## [1] 5718.675 5891.755 5345.288 5967.853 6011.453 5971.805 5947.452
## [8] 5986.816 5935.862 5347.980 5801.145 5285.305 6002.954 5867.859
## [15] 6014.950 6041.754 5671.781 5942.811 5920.483 5551.712 5895.388
## [22] 5900.361 5201.412 5380.083 6055.406 5984.184 5743.045 6080.457
## [29] 6000.298 6008.666 5843.371 5901.677
As the resulting meanFirstMoments
vector has the same length as the initial sibil
segment list, we can now easily visualize these values in the form of a boxplot:
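# boxplot of the mean first spectral moments grouped by sibilant label
boxplot(meanFirstMoments ~ sibil$labels,
        ylab = "mean 1st spectral moment [Hz]")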
INFO: Using the "emuRtrackdata"
resultType
of the get_trackdata function, we could have performed a comparable analysis by utilizing packages such as dplyr
for data.frame
manipulation and lattice
or ggplot2
for data visualisation.
In this vignette we tried to give you a quick practical overview of what it is like to work with the emuR package, which is part of the EMU-SDMS. If you are new to the system, we definitely also recommend reading the emuDB and EQL vignettes that are part of the emuR package. These will give more insight into the structure of emuDBs, how you can interact with them, and what the EMU Query Language offers. As the new EMU system has kept most of the concepts of the legacy EMU system in place, it is definitely also worth looking at Jonathan Harrington's book Phonetic Analysis of Speech Corpora (Harrington 2010).
Harrington, Jonathan. 2010. Phonetic Analysis of Speech Corpora. John Wiley & Sons.