This vignette is considered deprecated! Its content has been moved to the EMU-SDMS manual (and expanded and updated there). Specifically, see The emuDB Format chapter.


This document describes the emuDB format that is used by the emuR package and shows how to create and interact with this format. The emuDB format is meant as a simple, general purpose way of storing speech databases that may contain complex, rich, hierarchical annotations as well as derived and complementary speech data. These different components will be described throughout this document, and examples are given of how to generate and manipulate them. This document is meant as a practical guide / reference document to the emuDB format. The examples given below can be executed in any R session with the emuR package installed and may of course be adapted to your personal needs. First let us have a look at the general structure of an emuDB. Whenever we use a name like _XXX in the following we imply a varying prefix name (or base name) before the _, while the XXX is an obligatory string, e.g. _bndl implies folder names such as rec1_bndl or rec2_bndl of the type bundle folder. The extension .json denotes a text file in JSON format.

Database design

The database structure is basically a set of files and folders that adhere to a certain structure and naming convention (see Figure below).

emuDB file & folder structure

emuDB file & folder structure

The database root directory must contain a single _DBconfig.json file which, as the name implies, contains the configuration options of the database such as its level definitions, how these levels are linked in the database hierarchy and what is displayed in the EMU-webApp. The database root folder also contains arbitrarily named session folders ending with _ses, e.g. 0000_ses. These session folders can be used to logically group the recordings of a database. All files belonging to a single recording are contained in a so called bundle folder described below. A possible grouping into sessions could for instance be that all recordings of a speaker AAA are contained in one session called AAA_ses.

Each session folder can contain any number of _bndl folders, e.g. rec1_bndl rec2_bndl ... rec9_bndl. All the files belonging to a recording, i.e. all files describing the same time line of events, are stored in the corresponding bundle folder. This must include the actual recording (.wav) and can contain optional derived / complementary signal files in the SSFF format, such as formants (.fms) or the fundamental frequency (.f0), both of which can be generated using the wrassp package. Each bundle folder must also contain the annotation file (_annot.json) of that bundle. This file contains the actual annotations including the hierarchical linking information. JSON schema files are provided to ensure the syntactic integrity of the database (see the dist/schemaFiles/ directory of the EMU-webApp GitHub repository). The following restrictions apply:

Files that do not follow this naming convention will simply be ignored by the database interaction functions of the emuR package (for instance, additional audio channels stored in separate audio files).

Optional files that may also be included in the database root directory are the _bundleList.json files. These files specify which annotator is assigned to which bundles. These files are used by EMU-websocket-protocol servers that implement user management to assign the correct bundles to the annotators. The serve() function implemented in the emuR package DOES NOT support user management which means that these files will simply be ignored by this function.

For more detailed information about the file formats used see the File descriptions section of this document. Let us now have a look at creating a new emuDB.

Creating an emuDB

There are multiple ways of creating an emuDB. The two main strategies are to either convert existing databases or file collections to the new format or to create a new database from scratch. Refer to the emuR_intro vignette (command: vignette("emuR_intro", package="emuR")) for how existing databases can be converted; in the following, the latter strategy is described.

Creating an emuDB from scratch

To create an emuDB from scratch simply call:
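A minimal call along these lines does the job (the name 'fromScratchDB' and the temporary target directory are the choices used throughout this walkthrough):

```r
library(emuR)

# create a new, empty emuDB called 'fromScratchDB' in a temporary directory
create_emuDB(name = "fromScratchDB",
             targetDir = tempdir(),
             verbose = FALSE)
```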

This will create an empty emuDB that has no ssffTrackDefinitions or levelDefinitions and does not yet contain any sessions or bundles. Adding these to the emuDB is described in the next section.

Editing a database

The initial step in manipulating or generally interacting with a database is to load the according database into your current R session.

# generate path to the empty fromScratchDB created above
dbPath = file.path(tempdir(), 'fromScratchDB_emuDB')
# load database
dbHandle = load_emuDB(dbPath, verbose = F)
## [1] "<emuDBhandle> (dbName = 'fromScratchDB', basePath = '/private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/fromScratchDB_emuDB')"

This will load the database into its cached form for quick access to the data. Note that if a large emuDB has never been loaded and no cache has previously been generated, this can take a while to complete. Once a cache is present, only altered annotation files have to be updated, which reduces load times dramatically. As you can see, the load_emuDB() function returns a database handle. This emuDBhandle is used to reference the loaded database in most database interaction functions of the emuR package.

Next, let us look at some actual database manipulation functions. The database manipulation functions for loaded databases follow a common prefix naming convention: add_XXX() to add, list_XXX() to list and remove_XXX() to remove the according structural element (e.g. add_levelDefinition(), list_levelDefinitions(), remove_levelDefinition()).

Level definitions

Unlike other systems, the EMU Speech Database Management System requires the user to formally define the structure of the database. An essential structural element of any emuDB are its levels. A level is a more general term for what is often referred to as a “tier”. It is more general in the sense that tiers are usually expected to contain time information. Levels of the types “EVENT” and “SEGMENT” contain time information, whereas levels of the type “ITEM” are timeless. Generally speaking, every unit of annotation is referred to as an “ITEM” in the context of an emuDB, and “EVENT”s and “SEGMENT”s are special instances of these that contain time information in the form of sample values.

The EMU system generally distinguishes between the actual representations of a structural element which are contained within the database and their formal definitions. An example of an actual representation would be a level contained in an annotation file that contains “SEGMENT”s that annotate a recording. The corresponding formal definition would be this level’s level definition, which specifies and validates the level’s existence within the database.

NOTE: if instances are mentioned in the course of this document, the actual representations are meant. Formal definitions are referred to as such.

As the already loaded ‘fromScratchDB’ does not contain any formal definitions of structural elements including levels we will begin by adding such a formal definition in the form of a new level definition:
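Assuming the dbHandle loaded above, a call along these lines adds such a definition:

```r
# add a level definition named 'Phonetic' of type SEGMENT
add_levelDefinition(dbHandle,
                    name = "Phonetic",
                    type = "SEGMENT")
```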

To check if this action was successful we can simply list the current level definitions by calling:
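Using the dbHandle from above, a call along these lines produces the listing that follows:

```r
# list all level definitions of the emuDB
list_levelDefinitions(dbHandle)
```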

##       name    type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT            1    Phonetic;

Alternatively, a summary of the emuDB gives us this as well as additional information:
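Such a summary can be produced with the handle loaded above:

```r
# print a summary of the emuDB referenced by dbHandle
summary(dbHandle)
```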

## Name:     fromScratchDB 
## UUID:     7e4e80ec-092e-4785-97ce-3226cfc8a361 
## Directory:    /private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/RtmpUNaIbs/fromScratchDB_emuDB 
## Session count: 0 
## Bundle count: 0 
## Annotation item count:  0 
## Label count:  0 
## Link count:  0 
## Database configuration:
## SSFF track definitions:
## Level definitions:
##       name    type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT            1    Phonetic;
## Link definitions:

Let us add a further level definition that will contain the orthographic word transcriptions for the words uttered in our recordings. This level will be of the type “ITEM” meaning that elements contained within the level are sequentially ordered but do not contain any time information:
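A sketch of the two calls, using the dbHandle from above:

```r
# add a timeless level definition named 'Word' of type ITEM
add_levelDefinition(dbHandle,
                    name = "Word",
                    type = "ITEM")
# list the level definitions again
list_levelDefinitions(dbHandle)
```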

##       name    type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT            1    Phonetic;
## 2     Word    ITEM            1        Word;

Finally we could remove one of the level definitions with the function remove_levelDefinition(), which we will once again not invoke here as we still wish to use these level definitions.

NOTE: If there are actual instances of annotation items (“SEGMENT”s, “EVENT”s or “ITEM”s) present in the emuDB it will not be possible to remove the level definition. These items would have to be removed first.

Attribute definitions

Each level definition can contain multiple attribute definitions, the most common and currently only supported attribute type being a label ("type": "STRING"). It is thus possible to have multiple parallel labels in a single level, which means that a single annotation item instance can carry multiple labels while sharing other properties such as its start and duration information. This can be quite useful when modeling certain types of data. An illustrative example is the ‘Phonetic’ level created above: databases often contain the phonetic transcription both in IPA (UTF-8) symbols and in the Speech Assessment Methods Phonetic Alphabet (SAMPA). This is a perfect use case for multiple attribute definitions within a single level:
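The current attribute definitions of the 'Phonetic' level can be listed with a call along these lines:

```r
# list the attribute definitions of the 'Phonetic' level
list_attributeDefinitions(dbHandle, levelName = "Phonetic")
```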

##       name    level   type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING          FALSE          FALSE

Even though we have not added a single attribute definition to the ‘Phonetic’ level definition, it already contains the obligatory attribute definition that has the same name as its level. This indicates that it is the primary attribute of that level. To follow the above example, let us now add a further attribute definition to the level definition that will contain the SAMPA versions of our annotations.
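A sketch of this step, again using the dbHandle from above:

```r
# add an additional attribute definition called 'SAMPA' to the 'Phonetic' level
add_attributeDefinition(dbHandle,
                        levelName = "Phonetic",
                        name = "SAMPA")
# list the attribute definitions again
list_attributeDefinitions(dbHandle, levelName = "Phonetic")
```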

##       name    level   type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING          FALSE          FALSE
## 2    SAMPA Phonetic STRING          FALSE          FALSE

Label groups

A further optional field is the labelGroups field. It contains specifications of groups of labels that can be referenced by a name given to the group while querying the emuDB. Say we wish to reference all the long vowels in our Phonetic attribute definition with the name ‘long’ and all our short vowels with the name ‘short’. Let us now update our emuDB to contain these label groups:
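The two label groups shown in the listing below can be added with calls along these lines (the vowel sets are the example values used here):

```r
# add the 'long' label group to the 'Phonetic' attribute definition
add_attrDefLabelGroup(dbHandle,
                      levelName = "Phonetic",
                      attributeDefinitionName = "Phonetic",
                      labelGroupName = "long",
                      labelGroupValues = c("iː", "uː"))
# add the 'short' label group
add_attrDefLabelGroup(dbHandle,
                      levelName = "Phonetic",
                      attributeDefinitionName = "Phonetic",
                      labelGroupName = "short",
                      labelGroupValues = c("i", "u", "ə"))
# list the label groups of the attribute definition
list_attrDefLabelGroups(dbHandle,
                        levelName = "Phonetic",
                        attributeDefinitionName = "Phonetic")
```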

##    name  values
## 1  long  iː; uː
## 2 short i; u; ə

NOTE: It is also possible to define label groups for the entire DB. For more information on this see the R documentation for the add/list/remove_labelGroups functions.

INFO: For users who are familiar with or transitioning from the legacy EMU system, the label groups correspond to the unfavorably named ‘Legal Labels’ entries of the GTemplate Editor (i.e. legal entries in the .tpl file) of the legacy system. In the new system the legalLabels entries specify the legal / allowed label values of an attribute definition, while the label groups specify groups of labels that can be referenced by the names given to the groups while performing queries.

File handling

Up until now we have defined the structure of our database. An essential part that is missing is of course the recordings that we wish to analyze. To import audio files, referred to as media files in the context of an emuDB, into the database one simply has to do the following:
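A sketch of such an import, assuming the demo audio files shipped with the wrassp package are used (as the session name 'filesFromWrassp' in the output below suggests):

```r
# directory containing the audio files distributed with the wrassp package
# (assumption: wrassp's extdata folder holds the lbo*.wav demo recordings)
wavDir = system.file("extdata", package = "wrassp")
# import the media files into a new session called 'filesFromWrassp'
import_mediaFiles(dbHandle,
                  dir = wavDir,
                  targetSessionName = "filesFromWrassp",
                  verbose = FALSE)
# list the sessions and bundles of the emuDB
list_sessions(dbHandle)
list_bundles(dbHandle)
```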

##              name
## 1 filesFromWrassp
##           session   name
## 1 filesFromWrassp lbo001
## 2 filesFromWrassp lbo002
## 3 filesFromWrassp lbo003
## 4 filesFromWrassp lbo004
## 5 filesFromWrassp lbo005
## 6 filesFromWrassp lbo006
## 7 filesFromWrassp lbo007
## 8 filesFromWrassp lbo008
## 9 filesFromWrassp lbo009

We have now added a new session called ‘filesFromWrassp’ to the ‘fromScratchDB’ containing a new bundle for each of our imported media files. These bundles adhere to the structure we have specified above. Note however that the levels in the annotation files (_annot.json) that were created during the import are still empty. These will have to be created manually at a later stage using the EMU-webApp. To list the files that are part of the emuDB call:
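A call along these lines produces the listing below (head() is used here to keep the output short):

```r
# list the first few files of the emuDB
head(list_files(dbHandle))
```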

## # A tibble: 6 x 4
##   session     bundle file        absolute_file_path                        
##   <chr>       <chr>  <chr>       <chr>                                     
## 1 filesFromW… lbo001 lbo001.wav  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 2 filesFromW… lbo001 lbo001_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 3 filesFromW… lbo002 lbo002.wav  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 4 filesFromW… lbo002 lbo002_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 5 filesFromW… lbo003 lbo003.wav  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 6 filesFromW… lbo003 lbo003_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…

The emuR package also provides a mechanism for adding files to preexisting bundle folders as this can be quite tedious to perform manually due to the nested folder structure of an emuDB. Let us create a set of files that contain the zero-crossing-rate values of the wav files we added above and for the sake of demonstration save them to a different location to then re-add them to the database.
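A sketch of these steps, assuming the wrassp function zcrana() is used to calculate the zero-crossing-rate files (the temporary directory name 'zcrFiles' is an arbitrary choice for this example):

```r
library(wrassp)

# get the paths to the wav files of the emuDB
wavPaths = list_files(dbHandle, fileExtension = "wav")$absolute_file_path
# directory to temporarily store the zero-crossing-rate files in
zcrDir = file.path(tempdir(), "zcrFiles")
dir.create(zcrDir)
# calculate the zero-crossing-rate files (default extension .zcr)
zcrana(wavPaths, outputDirectory = zcrDir)
# re-add the .zcr files to the matching bundles of the emuDB
add_files(dbHandle,
          dir = zcrDir,
          fileExtension = "zcr",
          targetSessionName = "filesFromWrassp")
# list the first few files again
head(list_files(dbHandle))
```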

## # A tibble: 6 x 4
##   session     bundle file        absolute_file_path                        
##   <chr>       <chr>  <chr>       <chr>                                     
## 1 filesFromW… lbo001 lbo001.wav  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 2 filesFromW… lbo001 lbo001.zcr  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 3 filesFromW… lbo001 lbo001_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 4 filesFromW… lbo002 lbo002.wav  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 5 filesFromW… lbo002 lbo002.zcr  /private/var/folders/yk/8z9tn7kx6hbcg_9n4…
## 6 filesFromW… lbo002 lbo002_ann… /private/var/folders/yk/8z9tn7kx6hbcg_9n4…

SSFF track definitions

A further important structural element of any emuDB are the so called ssffTracks (often simply referred to as tracks). These ssffTracks reference data that is stored in the Simple Signal File Format (SSFF) in the according _bndl folders. The two main types of data are: data derived from the media file, such as formants or the fundamental frequency, and data complementary to the media file, such as EPG or EMA recordings.

Let us now add an ssffTrackDefinition to our database and calculate the SSFF files at the same time:
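A sketch of such a call, using the wrassp formant tracker forest() to calculate the files on the fly (columnName and fileExtension are taken from the wrassp defaults when omitted):

```r
# add the 'formantValues' track definition and calculate the
# according SSFF files using the wrassp function forest()
add_ssffTrackDefinition(dbHandle,
                        name = "formantValues",
                        onTheFlyFunctionName = "forest",
                        verbose = FALSE)
```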

INFO: to see the fileExtension and columnName defaults produced by the various signal processing functions of the wrassp package see ?wrasspOutputInfos. For a list of all the available signal processing functions that the wrassp package provides see ?wrassp.

As you might have noticed the .zcr files we added in the previous section are listed as being part of the bundles but have no ssffTrackDefinition associated with them. Let’s fix that by adding another ssffTrackDefinition to the database:
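A sketch of this fix, referencing the .zcr files added earlier:

```r
# add a track definition for the already existing .zcr files
add_ssffTrackDefinition(dbHandle,
                        name = "zeroCrossing",
                        columnName = "zcr",
                        fileExtension = "zcr")
# list the track definitions
list_ssffTrackDefinitions(dbHandle)
```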

##            name columnName fileExtension
## 1 formantValues         fm           fms
## 2  zeroCrossing        zcr           zcr

INFO: as the get_trackdata() function can apply signal processing functions and calculate all necessary values in real time, it is seldom necessary to define ssffTracks for tracks produced by the wrassp package. For complementary data, as well as data that has to be manipulated manually (e.g. manual formant corrections), this is still a feasible and necessary option. Also, if you wish to display SSFF data in the EMU-webApp, it is necessary to pre-calculate the ssffTracks, as the web application cannot perform real-time calculations.

Note also that there are currently two special ssffTrackDefinitions. They are special in the sense that if they have either the name “FORMANTS” or the name “EPG” the EMU-webApp will expect the according SSFF files to be formatted in a specific way and will also display them differently from the other tracks. If the track is named “FORMANTS” and this track is assigned to be overlaid on the spectrogram, the EMU-webApp will frequency align the formant contours to the spectrogram and will permit these contours to be manually corrected. If the track is called “EPG” and the EMU-webApp is configured to display this track in the twoDimCanvases, it will display an EPG plot of the data (see the File descriptions section of this document for more information on twoDimCanvases).

Configuring the EMU-webApp

Before we can start manually annotating our speech database we have to configure our ‘fromScratchDB’ to contain information about how the database is to be displayed by the EMU-webApp. The EMU-webApp subdivides different ways to look at an emuDB into so called perspectives. These perspectives, between which you can switch in the web application, contain information on what levels are displayed, which ssffTracks are drawn, and so on. Let us list the current perspectives of our database:
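A call along these lines produces the listing that follows:

```r
# list the perspectives defined for the EMU-webApp
list_perspectives(dbHandle)
```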

##      name signalCanvasesOrder levelCanvasesOrder
## 1 default          OSCI; SPEC

As you can see there is already a perspective available called ‘default’. This perspective was automatically added to the emuDB during the import of our mediaFiles. It currently only displays the oscillogram (“OSCI”) followed by the spectrogram (“SPEC”). “OSCI” and “SPEC” can be viewed as predefined tracks that are always available to the EMU-webApp. Using the add/remove_perspective() functions we could now add and remove as many additional perspectives to the database as we like. For now we will maintain the ‘default’ perspective and add the order in which we would like to display our levels.
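A sketch of this step, using the dbHandle from above:

```r
# set the level canvases order of the 'default' perspective
set_levelCanvasesOrder(dbHandle,
                       perspectiveName = "default",
                       order = c("Phonetic"))
# query the current order
get_levelCanvasesOrder(dbHandle, perspectiveName = "default")
```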

## [1] "Phonetic"

As you can see we only added the “Phonetic” and not the “Word” level to be displayed in the “default” perspective as only levels of the type “SEGMENT” or “EVENT” are allowed to be displayed. All “ITEM” levels can be viewed by clicking the “showHierarchy” button in the top menu bar of the EMU-webApp and choosing an appropriate path through the hierarchy.

As the final configuration step let us also add the ssffTracks we defined and calculated above to the “default” perspective:
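A sketch of these calls; the two listings below show the signal canvases order before and after the change:

```r
# query the current signal canvases order
get_signalCanvasesOrder(dbHandle, perspectiveName = "default")
# append the two ssffTracks to the signal canvases order
set_signalCanvasesOrder(dbHandle,
                        perspectiveName = "default",
                        order = c("OSCI", "SPEC",
                                  "formantValues", "zeroCrossing"))
# query the new order
get_signalCanvasesOrder(dbHandle, perspectiveName = "default")
```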

## [1] "OSCI" "SPEC"
## [1] "OSCI"          "SPEC"          "formantValues" "zeroCrossing"

We have now completed the configuration of the ‘fromScratchDB’ emuDB. By calling the function serve(dbHandle) we can now start a server in our R session and connect the EMU-webApp to our database to visualize and annotate the emuDB.

INFO: the EMU-webApp is highly configurable and only a small subset of the configuration options are available through the emuR package. More complex visualization configurations can be achieved by manually editing the _DBconfig.json file and reloading the database. For a comprehensive list of all the available fields in the _DBconfig.json and their meanings see the File descriptions section of this document.


Autobuilding

Autobuilding is a process that lets the emuDB maintainer semi-automatically build hierarchical structures from preexisting annotations by linking annotation items together. To have some preexisting annotations to play with, let us convert a TextGridCollection and load the newly created emuDB into the current R session.

# create demo data in folder provided by the tempdir() function
create_emuRdemoData(dir = tempdir())
# get the path to a folder containing .wav & .TextGrid files that is part of the demo data
path2folder = file.path(tempdir(), "emuR_demoData", "TextGrid_collection")
# convert this TextGridCollection to the emuDB format
convert_TextGridCollection(path2folder, dbName = "myTGcolDB", 
                           targetDir = tempdir(), verbose = F)
# load database
dbHandle = load_emuDB(file.path(tempdir(), "myTGcolDB_emuDB"), verbose = F)

By inspecting the emuDB we can see that it has eleven levelDefinitions but no linkDefinitions. This means that it will not be possible to perform hierarchical queries on this emuDB, as there is no explicit hierarchical information in the database.

# list levels
list_levelDefinitions(dbHandle)
##            name    type nrOfAttrDefs  attrDefNames
## 1     Utterance SEGMENT            1    Utterance;
## 2  Intonational SEGMENT            1 Intonational;
## 3  Intermediate SEGMENT            1 Intermediate;
## 4          Word SEGMENT            1         Word;
## 5        Accent SEGMENT            1       Accent;
## 6          Text SEGMENT            1         Text;
## 7      Syllable SEGMENT            1     Syllable;
## 8       Phoneme SEGMENT            1      Phoneme;
## 9      Phonetic SEGMENT            1     Phonetic;
## 10         Tone   EVENT            1         Tone;
## 11         Foot SEGMENT            1         Foot;
# list ssffTracks (none are defined, so nothing is printed)
list_ssffTrackDefinitions(dbHandle)


As it is a very laborious task to manually link ITEMs together using the EMU-webApp, and as the hierarchical information is already implicitly contained in the time information of the SEGMENTs and EVENTs of each level (see figure below), the emuR package provides a function to build these hierarchical structures from this information.

Example annotation structure after convert_TextGridCollection()

Example annotation structure after convert_TextGridCollection()

For the sake of brevity let’s focus on just three of the eleven levels. We will use the autobuild_linkFromTimes() function to build the following hierarchical structure:

Hierarchical structure to be produced by autobuild_linkFromTimes()

Hierarchical structure to be produced by autobuild_linkFromTimes()

The convertSuperlevel argument of the autobuild_linkFromTimes() function, which we will set to TRUE in the example below, tells the function to convert the superlevel to a level of type ITEM. As this is a risky procedure in which all the time information is deleted from the “Syllable” level, the function automatically creates a backup of the level called “Syllable-autobuildBackup”. Before we can invoke the autobuild function, however, we must first add a linkDefinition to our emuDB that specifies the type of relationship that our levels have:
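A sketch of these two steps, using the dbHandle loaded above:

```r
# add a ONE_TO_MANY link definition between 'Syllable' and 'Phoneme'
add_linkDefinition(dbHandle,
                   type = "ONE_TO_MANY",
                   superlevelName = "Syllable",
                   sublevelName = "Phoneme")
# autobuild links from the time information and convert the
# 'Syllable' level to type ITEM (a backup level is created)
autobuild_linkFromTimes(dbHandle,
                        superlevelName = "Syllable",
                        sublevelName = "Phoneme",
                        convertSuperlevel = TRUE)
# list the link and level definitions
list_linkDefinitions(dbHandle)
list_levelDefinitions(dbHandle)
```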

##          type superlevelName sublevelName
## 1 ONE_TO_MANY       Syllable      Phoneme
##                        name    type nrOfAttrDefs              attrDefNames
## 1                 Utterance SEGMENT            1                Utterance;
## 2              Intonational SEGMENT            1             Intonational;
## 3              Intermediate SEGMENT            1             Intermediate;
## 4                      Word SEGMENT            1                     Word;
## 5                    Accent SEGMENT            1                   Accent;
## 6                      Text SEGMENT            1                     Text;
## 7                  Syllable    ITEM            1                 Syllable;
## 8                   Phoneme SEGMENT            1                  Phoneme;
## 9                  Phonetic SEGMENT            1                 Phonetic;
## 10                     Tone   EVENT            1                     Tone;
## 11                     Foot SEGMENT            1                     Foot;
## 12 Syllable-autobuildBackup SEGMENT            1 Syllable-autobuildBackup;

As we can see we have now converted the original “Syllable” level to the type ITEM and the backup level was added to the emuDB. Let us now perform the same procedure for the “Phoneme” and “Phonetic” levels:
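A sketch of the analogous calls for this level pair:

```r
# add a MANY_TO_MANY link definition between 'Phoneme' and 'Phonetic'
add_linkDefinition(dbHandle,
                   type = "MANY_TO_MANY",
                   superlevelName = "Phoneme",
                   sublevelName = "Phonetic")
# autobuild the links and convert the 'Phoneme' level to type ITEM
autobuild_linkFromTimes(dbHandle,
                        superlevelName = "Phoneme",
                        sublevelName = "Phonetic",
                        convertSuperlevel = TRUE)
# list the link and level definitions again
list_linkDefinitions(dbHandle)
list_levelDefinitions(dbHandle)
```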

##           type superlevelName sublevelName
## 1  ONE_TO_MANY       Syllable      Phoneme
## 2 MANY_TO_MANY        Phoneme     Phonetic
##                        name    type nrOfAttrDefs              attrDefNames
## 1                 Utterance SEGMENT            1                Utterance;
## 2              Intonational SEGMENT            1             Intonational;
## 3              Intermediate SEGMENT            1             Intermediate;
## 4                      Word SEGMENT            1                     Word;
## 5                    Accent SEGMENT            1                   Accent;
## 6                      Text SEGMENT            1                     Text;
## 7                  Syllable    ITEM            1                 Syllable;
## 8                   Phoneme    ITEM            1                  Phoneme;
## 9                  Phonetic SEGMENT            1                 Phonetic;
## 10                     Tone   EVENT            1                     Tone;
## 11                     Foot SEGMENT            1                     Foot;
## 12 Syllable-autobuildBackup SEGMENT            1 Syllable-autobuildBackup;
## 13  Phoneme-autobuildBackup SEGMENT            1  Phoneme-autobuildBackup;

This time we chose to add a linkDefinition of the type MANY_TO_MANY between the two levels. This is because reduction processes can cause multiple phonemes to be produced as a single phone, while insertion processes can cause a single phoneme to be produced as multiple phones. We have now created the hierarchical structure that we were aiming for.

File descriptions


_DBconfig.json

The DBconfig file, as mentioned above, contains the configuration options of the database. Users familiar with the legacy EMU system will recognize this as the replacement for the legacy template (.tpl) file. By convention, variables / strings written entirely in capital letters indicate a constant that usually has a special meaning. This is also the case with such strings found in the DBconfig ("STRING", "ITEM", "SEGMENT", "EVENT", "OSCI", …).

The _DBconfig.json file contains the following fields:


_annot.json

The _annot.json files contain the actual annotation information as well as the hierarchical linking information. Legacy EMU users should note that all the information that used to be split across several ESPS/waves+ label files and a .hlb file is now contained in this single file.

The _annot.json file contains the following fields: