Demonstration for creating continuous norms with cNORM

Wolfgang Lenhard, Alexandra Lenhard


cNORM is a package for the R environment for statistical computing that aims at generating continuous test norms in psychometrics and biometrics and to analyze the model fit. It is based on the approach of A. Lenhard, Lenhard, Suggate and Segerer (2016).

The method stems from psychometric test construction and was developed to create continuous norms for age or grade in performance assessment (e. g. vocabulary development, A. Lenhard, Lenhard, Segerer & Suggate, 2015; reading and writing development, W. Lenhard, Lenhard & Schneider, 2017). It can however be applied wherever test data like psychological (e. g. intelligence), physiological (e. g. weight) or other measures are dependent on continuous (e.g., age) or discrete (e.g., sex or test mode) explanatory variables.

The package estimates percentile curves in dependence of the explanatory variable (e. g. schooling duration, age …) via Taylor polynomials, thus offering several advantages:

In the following sections, we demonstrate the necessary steps for the application of the R package with real human performance data, namely, with the standardization sample of the sentence comprehension subtest of ELFE 1-6, a reading comprehension test in German language (W. Lenhard & Schneider, 2006).

1. Data Preparation

If a sufficiently large and representative sample has been established (missings should be excluded), then the data must first be imported. It is advisable to start with a simply structured data object of type data.frame, which only contains numeric variables without value labels. It is as well favorable to label the measured raw scores with the variable name ‘raw’, as this is the default specification in cNORM. However, all variable names can also be defined individually, but must then be specified as function parameters. The explanatory variable in psychometric performance tests is usually age. We therefore refer to this variable as ‘age’. In fact, however, the explanatory variable is not necessarily age. A training or schooling duration or other explanatory variables can also be included in the modeling. However, it must be an interval-scaled (or, as the case may be, dichotomous) variable. Finally, a grouping variable is required to divide the explanatory variable into smaller standardization groups (e.g. grades or age groups). The method is relatively robust against changes in the granularity of the group subdivision. For example, the result of the standardization only marginally depends on whether one chooses half-year or full-year gradations (see A. Lenhard, Lenhard, Suggate & Segerer, 2016). The more the variable to be measured covaries with the explanatory variable (e. g. a fast development over age in an intelligence test), the more groups should be formed beforehand to capture the trajectories adequately. By standard, we assign the variable name “group” to the grouping variable.

Recoding Continuous Variables

If, when using cNORM, you initially only have the continuous age variable available, you must recode it into a discrete grouping variable. The following code could be helpful (another possibility is the’rankBySlidingWindow’ function described below):

 # Creates a grouping variable for a fictitious age variable
# for children age 2 to 18. That way, the age variable is recoded
# into a discrete group variable, each group comprising a year.

data$group <- c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
   [findInterval(data$age, c(-Inf, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
   13, 14, 15, 16, 17))]

Of course, it is also possible to use a data set for which standard scores already exist for individual age groups.

Sample Data

A continuously distributed age variable is not necessary in this case. For demostration purposes, cNORM includes a cleaned data set from a German test standardization (ELFE 1-6, W. Lenhard & Schneider, 2006, subtest sentence comprehension) that will be used for demonstrating the method. Another large (but unrepresentative) data set for demonstration purposes stems from the adaption of a vocabulary test to the German language (PPVT-4, A. Lenhard, Lenhard, Segerer & Suggate, 2015). For biometric modeling, it includes a large CDC dataset (N > 45,000) for growth curves from age 2 to 25 (weight, height, BMI; CDC, 2012) and for macro economical and sociological data the data on mortality and life expectancy at birth from 1960 to 2017 from the World Bank. You can retrieve information on the data by typing ?elfe, ?ppvt, ?CDC, ?life or ?mortality on the R console. To load the data sets, please use the following code:

# Loads the package cNorm

# Copies the data set "elfe" from the environment into the object 'normData'
normData <- elfe

# Or similarly for the "ppvt"
normData <- ppvt

# And finally the data for the Body Mass Index
# Please specify 'bmi' as the 'raw' variable in this case in later analyses
normData <- CDC

# Displays the first lines of the data

Data Exploration

As you can see, there is no age variable in the data set ‘elfe’, only a person ID, a raw score and a grouping variable. In this case, the grouping variable also serves as a continuous explanatory variable, since children were only examined at the very beginning and in the exact middle of the school year during the test standardization. For example, the value 2.0 means that the children were at the beginning of the second school year, the value 2.5 means that the children were examined in the middle of the second school year. Another possibility would have been to examine children throughout the entire school year. In this case, the duration of schooling would have to be entered as a continuous explanatory variable. To build the grouping variable, the first and second half of each school year could, for exampe, be aggregated into one group respectively. In the ‘elfe’ data set there are seven groups with 200 cases each, i.e. a total of 1400 cases. With the help of the psych package, descriptive data can be displayed in groups if desired (optional):

# Install psych package
install.packages("psych", dependencies = TRUE)

# Display descriptive results
describeBy(normData, group="group")


The next step is to rank each person in each group using the rankByGroup function. The function returns percentiles and also performs a normal-rank transformation in which T-Scores (M = 50, SD = 10) are returned by default. In principle, our mathematical method also works without normal rank transformation, i.e., the method could theoretically also be carried out with the percentiles. This is useful, for example, if you want to enter a variable that deviates extremely from the normal distribution or follows a completely different distribution. For most psychological or physical scales, however, the distributions are still sufficiently similar to the normal distribution even with strong bottom and ceiling effects. In these cases, the normal-rank transformation usually increases the model fit and facilitates the further processing of the data. In addition to T-Scores, the standard scores can also be expressed as z- or IQ-Scores. You can also choose between different ranking methods (RankIt, Blom, van der Warden, Tukey, Levenbach, Filliben, Yu & Huang). However, we want to stick to T-Scores and RankIt, which are preset by default:

# Start up cNORM
#> Good morning star-shine, cNORM says 'Hello!'
# Determine percentiles by group
normData <- rankByGroup(elfe, group = "group") 

To change the ranking method, please specify a method index with method = x (x = method index; see function help). The standard score can be specified as T-Score, IQ-Score, z-Score or by means of a double vector of M and SD, e.g. scale = c(10, 3) for Wechsler subtest scaled scores. The grouping variable can be deactivated by setting group = FALSE. The normal-rank transformation is then applied to the entire sample.

Please note that there is another function for determining the rank, which works without discrete grouping variables. The rank of each individual subject is then estimated based on the continuous explanatory variables using a sliding window. The width of this window can be specified individually. In the case of a continuous age variable, the specification width = 0.5 means, for example, that the width of the window is half a year. As a consequence, the rank of a test persons is based on all participants who are no more than 3 months younger or older than the test person in question, i. e., the group comprises a total of 6 months.

# Percentile generation by a sliding window of the size 'width'
normData2 <- rankBySlidingWindow(data = elfe, age = "group", raw = "raw", width = 0.5)

Please note that the ‘rankBySlidingWindow’ function only makes sense if the age variable is actually continuous. In the ‘elfe’ data set the variable ‘group’ serves as continuous explanatory variable as well as diskrete grouping variable. Therefore, with the function ‘rankBySlidingWindow’ we obtain the same standard scores as with the function ‘rankByGroup’ in this specific case.

Both ranking functions (‘rankBySlidingWindow’ and’rankByGroup’) add two additional columns, namely ‘percentile’ and ‘normValue’. In addition, descriptive information about each group is added, namely n, m, md and sd. Descriptive results are only necessary under certain circumstances. The creation of these variables can be deactivated via the parameter ‘descriptives’.

Computing powers and interactions

At this point, where many test developers already stop standardization, the actual modeling process begins. A function is determined which expresses the raw score as a function of the latent person parameter l and the explanatory variable. In the following, we will refer to the latter variable as ‘a’. In the ‘elfe’ example, we use the discrete variable ‘group’ for a. If there is an additional continuous age variable, it should be used instead as ‘a’ because of its higher precision.

To retrieve the mathematical model, all powers of the variables ‘l’ and ‘a’ up to a certain exponent k must be computed. Subsequently, all interactions between these powers must also be calculated by simple multiplication. As a rule of thumb, k > 5 leads to over-adjustment. In general, k = 4 or even k = 3 will already be sufficient to model human performance data with adequate precision. Please use the following function for the calculation:

# Calculation of powers and interactions up to k = 4
normData <- computePowers(normData, k = 4, norm = "normValue", age = "group") 

The data set now has 24 new variables ( \(2*k + k^{2}\) ), namely L1, L2, L3, L4 (powers of the norm value), A1, A2, A3, A4 (powers of the grouping variable) and the linear combinations L1A1, L2A1L4A3, L4A4.

2. Modeling

We now want to find a regression model that models the original data as closely as possible with as few predictors as possible. We however want to smooth out noise from the original norm data, which can be due to the random sampling process or violations of representativeness. This is done through the ‘bestModel’ function. You can use this function in two different ways: If you specify \(R_{adjusted}^{2}\), then the regression function will be selected that meets this requirement with the smallest number of predictors. You can however also specify a fixed number of predictors. Then the model is selected that achieves the highest \(R_{adjusted}^{2}\) with this specification. To select the best model, cNORM uses the ‘regsubset’ function from the ‘leaps’ package. As we do not know beforehand, how well the data can be modeled, we start with the default values (k = 4 and \(R_{adjusted}^{2}\) = .99):

model <- bestModel(normData)
#> Final solution: 3 terms
#> R-Square Adj. amounts to 0.990838
#> Final regression model: raw ~ L3 + L1A1 + L3A3
#> Regression function: raw ~ -11.39665566 + (2.08886395e-05*L3) + (0.1649386974*L1A1) + (-5.892663055e-07*L3A3)
#> Use 'printSubset(model)' to get detailed information on the different solutions, 'plotSubset(model)' to inspect model fit and 'summary(model)' for statistics on the regression model.

Fine! The determined model already exceeds the predefined threshold of \(R_{adjusted}^{2}\) = .99 with only three predictors (plus intercept). The ‘bestModel’ function as well returns the coefficients and the complete regression formula, which - as was specified - captures more than 99% of the variance in the data set.

If you want to have a look at the selection procedure, all the information is available in ‘model$subsets’. The variable selection process per step is listed in ‘outmat’ and ‘which’. There, you can find the \(R^{2}\), \(R_{adjusted}^{2}\), \(C_p\) and \(BIC\):

#>           R2       RSS     R2adj          Cp       BIC
#> 1  0.9198099 5736.1349 0.9197525 22283.51634 -3518.209
#> 2  0.9800861 1424.4767 0.9800576  4486.42639 -5461.138
#> 3  0.9908572  653.9973 0.9908376  1307.78653 -6543.733
#> 4  0.9914362  612.5836 0.9914116  1138.82531 -6628.074
#> 5  0.9918502  582.9677 0.9918210  1018.56720 -6690.205
#> 6  0.9922628  553.4550 0.9922295   898.73484 -6755.692
#> 7  0.9930104  499.9771 0.9929753   679.97079 -6890.714
#> 8  0.9933816  473.4251 0.9933435   572.36068 -6959.866
#> 9  0.9937628  446.1598 0.9937224   461.80602 -7035.664
#> 10 0.9940088  428.5575 0.9939657   391.14150 -7084.773
#> 11 0.9940329  426.8367 0.9939856   386.03790 -7083.162
#> 12 0.9941718  416.8980 0.9941214   347.00947 -7108.902
#> 13 0.9944014  400.4753 0.9943489   281.21434 -7157.923
#> 14 0.9944769  395.0776 0.9944211   260.93194 -7169.676
#> 15 0.9948578  367.8304 0.9948021   150.45225 -7262.476
#> 16 0.9950613  353.2764 0.9950041    92.37112 -7311.752
#> 17 0.9952330  340.9906 0.9951744    43.65367 -7354.062
#> 18 0.9952645  338.7359 0.9952028    36.34605 -7356.105
#> 19 0.9952819  337.4957 0.9952169    33.22636 -7353.996
#> 20 0.9953180  334.9107 0.9952501    24.55519 -7357.516
#> 21 0.9953242  334.4658 0.9952530    24.71882 -7352.133
#> 22 0.9953292  334.1116 0.9952546    25.25632 -7346.372
#> 23 0.9953371  333.5465 0.9952591    24.92361 -7341.498
#> 24 0.9953436  333.0805 0.9952623    25.00000 -7336.211

Furthermore, information about the change of Radjusted and other information criteria (Mallow’s Cp or BIC) depending on the number of predictors (with fixed k) can also be graphically inspected. Please use the following command to do this:

plotSubset(model, type = 0) 
#> Hint: Select the model with the highest BIC or Cp score while simultaneously optimizing R2.

The figure displays Radjusted2 as a function of the number of predictors by default. Alternatively, you can also plot log-transformed Mallow’s \(C_p\) (type = 1) or \(BIC\) (type = 2) as a function of \(R_{adjusted}^{2}\).

plotSubset(model, type = 1) 
#> Hint: Select the model with the highest BIC or Cp score while simultaneously optimizing R2.

The figure shows that the default value of \(R_{adjusted}^{2}\) = .99 is already achieved with only three predictors. The inclusion of further predictors only leads to small increases of Radjusted or to small decreases of Mallow’s \(C_p\). Where the dots are close together, the inclusion of further predictors is of little use. To avoid over-fitting, a model with as few predictors as possible should therefore be selected from this area.

The model with three predictors seems to be suitable. Nevertheless, the model found in this way must still be tested for plausibility using the means described in Model Validation. Above all, it is necessary to determine the limits of model validity. If a model turns out to be suboptimal after this model check, \(R_{adjusted}^{2}\), the number of predictors or, if necessary, k should be chosen differently.

3. Model Validation

From a mathematical point of view, the regression function represents a so-called hyperplane in three-dimensional space. If \(R^2\) is sufficiently high (e.g. \(R^2 > .99\)), this plane usually models the manifest data over wide ranges of the standardization sample very well. However, a Taylor polynomial, as used here, usually has a finite radius of convergence. This means that there are age or performance ranges for which the regression function no longer provides plausible values. With high R2, these limits of model validity are only reached at the outer edges of the age or performance range of the standardization sample or even beyond. Please note that such model limits occur not only because the method is not omnipotent, but also because the underlying test scales have only a limited validity range within which they can reliably map a latent ability to a meaningful numerical test score. In other words, the limits of model validity often show up at those points where the test has too strong floor or ceiling effects or where the standardization sample is too diluted.

Of course, norm tables and normal scores should generally only be issued within the validity range of the model and the test. It is therefore essential to determine the limits of model validity when applying cNORM (or any other procedure used to model normal scores). For this purpose, cNORM mainly provides graphical methods, which we present to you on this page. At this point, however, we would like to point out to the mathematically experienced users that it is also possible to approach the topic analytically. Since the regression equation is a polynomial of the nth degree that is very easy to handle from a mathematical point of view, it can be subjected to a conventional curve sketching. This makes it very easy to determine, for example, where extremes, turning points, saddle points, etc. occur or where the gradient has implausible values.

Below you will find three functions for checking the model fit graphically and making the limits of the model visible:


The following figure shows how well the model generally fits the manifest data:

# Plots the fitted and the manifest percentiles

plot <- plotPercentiles(normData, model)

In the figure, the range of raw scores was automatically determined by the values in the original dataset, but it can be manually specified by ‘minRaw = 0’ and ‘maxRaw = 28’ (range of scores in this particular test) as well. As the figure shows, the predicted percentiles run smoothly across all levels of the explanatory variable and are in good agreement with the original data. Small fluctuations between the individual groups are eliminated. It is important to ensure that the percentile lines do not intersect, since this would mean that different values of the latent person variable are assigned to one single raw score. The mapping of latent person variables to raw scores would no longer be biunique (=bijective) at this point, e.g. it would not be possible to distinguish between these different values of the latent variable by means of the test score. As already described above, intersecting percentiles predominantly occur when the regression model is extended to age or performance ranges that do not or only rarely occur in the standardization sample, or when the test shows strong floor or ceiling effects.

If you are not sure yet, which model to choose, you can display a series of plots via the plotPercentileSeries function with an increasing number of predictors, for example:

# Displays a series of plots of the fitted and the manifest percentiles
plotPercentileSeries(normData, model)


In the next figure, the fitted and the manifest data are compared separately for each (age) group:

plotRaw(normData, model, group="group")

The adjustment is particularly good if all points are as close as possible to the bisecting line. However, it must be noted that deviations in the extremely upper, but particularly in the extremely lower performance range often occur because the manifest data in these areas are also associated with low reliability.


This function does the same as plotRaw, but instead uses norm values. Please note, that it might take some time depending on the sample size:

plotNorm(normData, model, group="group", minNorm = 25, maxNorm = 75)


To check whether the mapping between latent person variables and test scores is biunique, the regression function can be searched numerically within each group for bijectivity violations using the ‘checkConsistency’ function. In addition, it is also possible to plot the first partial derivative of the regression function to l and search for negative values. This can be done in the following way:

plotDerivative(model, minAge=1, maxAge=6, minNorm=20, maxNorm=80)
#> [1] "Horizontal and vertical extrapolation detected. Be careful using age groups and extreme norm scores outside the original sample.\nThe original data for the regression model spanned from age 2 to 5, with a norm score range from 21.93 to 78.07."

In this figure, we have extended both the age and the performance range beyond the limits of the standardization sample in order to better represent and check the limits of the model validity. (Please remember that the age variable in this norming sample comprises the values 2 to 5 and that 200 children per age group were examined.) As you can see, the first partial derivative of the regression function to l is only negative in the upper age and performance range. This does not mean that the modeling has failed, but that the test scale loses its ability to differentiate in this measuring range.

When, at the end of the modeling process, norm tables are generated, the identified limits of model validity must be respected. Or to put it in other words: Normal scores should only be issued for the valid ranges of the model.

4. Generating Norm Tables

In addition to the pure modeling functions, cNORM also contains functions for generating norm tables, retrieving the normal score for a specific raw score and vice versa or for the visualization of norm curves. These functions are described below.


The first function ‘getNormCurve’ returns the fitted raw scores for a certain normal score (e.g., T = 50) across different age groups. The parameter ‘step’ specifies the distance between two age groups. If no further specifications are made, the output is limited to actually occurring raw values and age groups.

getNormCurve(50, model, minAge = 2, maxAge = 5, step = 0.25, minRaw = 0, maxRaw = 28)


The ‘plotNormCurves’ function plots the fitted raw scores for pre-specified normal scores (e.g., T = 30, 40, 50, 60, 70) across age.

plotNormCurves(model, normList = c(30, 40, 50, 60, 70), minAge = 2, maxAge = 5, step = 0.1, minRaw = 0, maxRaw = 28)


The ‘predictNorm’ function returns the normal score for a specific raw score (e.g., raw = 15) and a specific age (e.g., a = 4.7). The normal scores can be limited to a minimum and maximum value in order to take into account the limits of model validity.

predictNorm(15, 4.7, model, minNorm = 25, maxNorm = 75)
#> [1] 36.59881


The ‘predictRaw’ function returns the predicted raw score for a specific normal score (e.g., T = 55) and a specific age (e.g., a = 4.5).

predictRaw(55, 4.5, model$coefficients, minRaw = 0, maxRaw = 28)
#> [1] 23.9672


The ‘normTable’ function returns the corresponding raw scores for a specific age (e.g., a = 3) and a pre-specified series of normal scores. The parameter ‘step’ specifies the distance between two normal scores.

# Generate norm table for grade 3
normTable(3, model, minRaw = 0, maxRaw = 28, minNorm=30.5, maxNorm=69.5, step = 1, descend = FALSE)
#>    norm       raw
#> 1  30.5  3.836487
#> 2  31.5  4.345657
#> 3  32.5  4.855768
#> 4  33.5  5.366850
#> 5  34.5  5.878933
#> 6  35.5  6.392046
#> 7  36.5  6.906219
#> 8  37.5  7.421483
#> 9  38.5  7.937867
#> 10 39.5  8.455401
#> 11 40.5  8.974115
#> 12 41.5  9.494039
#> 13 42.5 10.015202
#> 14 43.5 10.537635
#> 15 44.5 11.061367
#> 16 45.5 11.586428
#> 17 46.5 12.112849
#> 18 47.5 12.640658
#> 19 48.5 13.169887
#> 20 49.5 13.700564
#> 21 50.5 14.232720
#> 22 51.5 14.766384
#> 23 52.5 15.301586
#> 24 53.5 15.838357
#> 25 54.5 16.376726
#> 26 55.5 16.916723
#> 27 56.5 17.458377
#> 28 57.5 18.001720
#> 29 58.5 18.546779
#> 30 59.5 19.093587
#> 31 60.5 19.642171
#> 32 61.5 20.192563
#> 33 62.5 20.744792
#> 34 63.5 21.298888
#> 35 64.5 21.854880
#> 36 65.5 22.412799
#> 37 66.5 22.972675
#> 38 67.5 23.534537
#> 39 68.5 24.098415
#> 40 69.5 24.664340

Norm tables, in which the raw score or the range of raw scores is given for a certain normal score, are usually needed if one has several test scales, all of which are to be converted into the same type of normal scale. Please note that as a test designer you cannot use this function directly to generate norm tables. Instead you have to convert the table to a suitable form first. Remember that raw scores are usually integers. Therefore, the normal score series should not start at an integer value, but e.g. at 30.5. If a step size of 1 is selected, normal score intervals with integers as interval centers are generated. In the example shown, the T-score interval [30.5; 31.5] contains only one single integer raw score, namely 4, which would therefore be assigned to the T-score 31. A whole range of raw scores (or no raw score at all) can thus be assigned to a particular integer normal score.


The function ‘rawTable’ is similar to ‘normTable’, but reverses the assignment: The normal scores are assigned to a pre-specified series of raw scores at a certain age. This requires an inversion of the regression function, which is determined numerically.

# Generate raw table for grade 3.5
rawTable(3.5, model, minRaw = 0, maxRaw = 28, minNorm = 25, maxNorm = 75, step = 1, descend = FALSE)
#>    raw     norm
#> 1    0 25.00000
#> 2    1 25.00000
#> 3    2 25.00000
#> 4    3 25.05781
#> 5    4 26.81698
#> 6    5 28.58000
#> 7    6 30.34714
#> 8    7 32.11870
#> 9    8 33.89496
#> 10   9 35.67623
#> 11  10 37.46283
#> 12  11 39.25505
#> 13  12 41.05325
#> 14  13 42.85774
#> 15  14 44.66888
#> 16  15 46.48703
#> 17  16 48.31256
#> 18  17 50.14586
#> 19  18 51.98733
#> 20  19 53.83738
#> 21  20 55.69645
#> 22  21 57.56498
#> 23  22 59.44346
#> 24  23 61.33237
#> 25  24 63.23223
#> 26  25 65.14358
#> 27  26 67.06698
#> 28  27 69.00304
#> 29  28 70.95239

You need these kind of tables if you want to determine the exact percentile or the exact normal score for all occurring raw scores. With higher precision and smaller increments, this function becomes computationally intensive.


Please visit for further examples with the inbuilt data sets, especially with respect to continuous explanatory variables and non standard application cases.