This rather long vignette explains the development and testing of nlsr, an R package intended to bring the R function nls() up to date and to extend the symbolic and automatic derivative tools in R.
A particular goal in nlsr is to use analytic or automatic derivatives wherever possible; nls() uses a rather weak forward-difference derivative approximation. A second objective is to use a Marquardt stabilization of the Gauss-Newton equations to avoid the commonly encountered “singular gradient” failure of nls(), which signals a loss of rank of the Jacobian at the parameters being evaluated. The particular stabilization also incorporates a very simple trick to avoid very small diagonal elements of the Jacobian inner product, though in the present implementations this is accomplished indirectly. See the section Implementation of method below.
A sub-goal in preparing the nlsr package is to unify, or at least render compatible, the various R packages for estimating or analysing problems amenable to nonlinear least squares solution.
A large part of the work for this package – particularly the parts concerning derivatives and R language structures – was initially carried out by Duncan Murdoch. Without his input, much of the capability of the package would not exist.
The package and this vignette are a work in progress, and assistance and examples are welcome. Note that there is similar work in the package Deriv (Andrew Clausen and Serguei Sokol (2015) Symbolic Differentiation, version 2.0, https://cran.r-project.org/package=Deriv), and making the current work “play nicely” with that package is desirable.
The coef() method extracts and displays the coefficients for a model estimated by nlxb() or nlfb() and stored in an nlsr structured object.
library(nlsr)
ydat<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
75.995, 91.972)
tdat<-1:12
# try setting t and y here
t <- tdat
y <- ydat
sol1 <- nlxb(y~100*b1/(1+10*b2*exp(-0.1*b3*t)), start=c(b1=2, b2=5, b3=3))
## vn:[1] "y" "b1" "b2" "b3" "t"
## no weights
coef(sol1)
## b1 b2 b3
## 1.9619 4.9092 3.1357
## attr(,"pkgname")
## [1] "nlsr"
print(coef(sol1))
## b1 b2 b3
## 1.9619 4.9092 3.1357
## attr(,"pkgname")
## [1] "nlsr"
Functions nlsDeriv and fnDeriv are designed as replacements for the stats package functions D and deriv respectively, though the argument lists do not match exactly. newDeriv allows additional analytic derivative expressions to be added. The following is an expanded and commented version of the examples from the manual for these functions.
require(nlsr)
newDeriv() # a call with no arguments returns a list of available functions
## [1] "^" "~" "-" "/" "(" "["
## [7] "*" "+" "abs" "acos" "asin" "atan"
## [13] "cos" "cosh" "digamma" "dnorm" "exp" "gamma"
## [19] "lgamma" "log" "pnorm" "psigamma" "sign" "sin"
## [25] "sinh" "sqrt" "tan" "trigamma"
# for which derivatives are currently defined
newDeriv(sin(x)) # a call with a function that is in the list of available derivatives
## $expr
## sin(x)
##
## $argnames
## [1] "x"
##
## $required
## [1] 1
##
## $deriv
## cos(x) * D(x)
# returns the derivative expression for that function
nlsDeriv(~ sin(x+y), "x") # partial derivative of this function w.r.t. "x"
## cos(x + y)
## CAUTION !! ##
newDeriv(joe(x)) # but an undefined function returns NULL
## NULL
newDeriv(joe(x), deriv=log(x^2)) # We can define derivatives, even if joe(x) is meaningless
nlsDeriv(~ joe(x+z), "x")
## log((x + z)^2)
# Some examples of usage
f <- function(x) x^2
newDeriv(f(x), 2*x*D(x))
nlsDeriv(~ f(abs(x)), "x")
## 2 * abs(x) * sign(x)
nlsDeriv(~ x^2, "x")
## 2 * x
nlsDeriv(~ (abs(x)^2), "x")
## 2 * abs(x) * sign(x)
# derivatives of distribution functions
nlsDeriv(~ pnorm(x, sd=2, log = TRUE), "x")
## 0.5 * exp(dnorm(x/2, log = TRUE) - pnorm(x/2, log = TRUE))
# get evaluation code from a formula
codeDeriv(~ pnorm(x, sd = sd, log = TRUE), "x")
## {
## .expr1 <- x/sd
## .value <- pnorm(x, sd = sd, log = TRUE)
## .grad <- array(0, c(length(.value), 1L), list(NULL, "x"))
## .grad[, "x"] <- 1/sd * exp(dnorm(.expr1, log = TRUE) - pnorm(.expr1,
## log = TRUE))
## attr(.value, "gradient") <- .grad
## .value
## }
# wrap it in a function call
fnDeriv(~ pnorm(x, sd = sd, log = TRUE), "x")
## function (x, sd)
## {
## .expr1 <- x/sd
## .value <- pnorm(x, sd = sd, log = TRUE)
## .grad <- array(0, c(length(.value), 1L), list(NULL, "x"))
## .grad[, "x"] <- 1/sd * exp(dnorm(.expr1, log = TRUE) - pnorm(.expr1,
## log = TRUE))
## attr(.value, "gradient") <- .grad
## .value
## }
f <- fnDeriv(~ pnorm(x, sd = sd, log = TRUE), "x", args = alist(x =, sd = 2))
f
## function (x, sd = 2)
## {
## .expr1 <- x/sd
## .value <- pnorm(x, sd = sd, log = TRUE)
## .grad <- array(0, c(length(.value), 1L), list(NULL, "x"))
## .grad[, "x"] <- 1/sd * exp(dnorm(.expr1, log = TRUE) - pnorm(.expr1,
## log = TRUE))
## attr(.value, "gradient") <- .grad
## .value
## }
f(1)
## [1] -0.36895
## attr(,"gradient")
## x
## [1,] 0.25458
100*(f(1.01) - f(1)) # Should be close to the gradient
## [1] 0.25394
## attr(,"gradient")
## x
## [1,] 0.2533
# The attached gradient attribute (from f(1.01)) is
# meaningless after the subtraction.
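As a hedged aside (not in the original example), as.numeric() strips the attached attributes, so a cleaner forward-difference check against the reported gradient is:
(as.numeric(f(1.01)) - as.numeric(f(1))) / 0.01 # compare with attr(f(1), "gradient")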
The functions model2rjfun and model2ssgrfun create functions to evaluate, respectively, the residuals (with the Jacobian attached as the gradient attribute) or the sum of squares (optionally with its gradient) at particular parameter values.
weed <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
ii <- 1:12
wdf <- data.frame(weed, ii)
wmod <- weed ~ b1/(1+b2*exp(-b3*ii))
start1 <- c(b1=1, b2=1, b3=1)
wrj <- model2rjfun(wmod, start1, data=wdf)
print(wrj)
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x4d1a330>
weedux <- nlxb(wmod, start=c(b1=200, b2=50, b3=0.3))
## vn:[1] "weed" "b1" "b2" "b3" "ii"
## no weights
print(weedux)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 5 Jacobian and 6 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.167e-08 -1.334e-10 1011
## b2 49.0916 1.688 29.08 3.284e-10 -5.431e-11 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 3.865e-08 0.04714
wss <- model2ssgrfun(wmod, start1, data=wdf)
print(wss)
## function (prm)
## {
## resids <- rjfun(prm)
## ss <- as.numeric(crossprod(resids))
## if (gradient) {
## jacval <- attr(resids, "gradient")
## grval <- 2 * as.numeric(crossprod(jacval, resids))
## attr(ss, "gradient") <- grval
## }
## attr(ss, "resids") <- resids
## ss
## }
## <environment: 0x4770730>
# We can get expressions used to calculate these as follows:
wexpr.rj <- modelexpr(wrj)
print(wexpr.rj)
## expression({
## .expr3 <- exp(-b3 * ii)
## .expr5 <- 1 + b2 * .expr3
## .expr10 <- .expr5^2
## .value <- b1/.expr5 - weed
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1",
## "b2", "b3")))
## .grad[, "b1"] <- 1/.expr5
## .grad[, "b2"] <- -(b1 * .expr3/.expr10)
## .grad[, "b3"] <- b1 * (b2 * (.expr3 * ii))/.expr10
## attr(.value, "gradient") <- .grad
## .value
## })
wexpr.ss <- modelexpr(wss)
print(wexpr.ss)
## expression({
## .expr3 <- exp(-b3 * ii)
## .expr5 <- 1 + b2 * .expr3
## .expr10 <- .expr5^2
## .value <- b1/.expr5 - weed
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1",
## "b2", "b3")))
## .grad[, "b1"] <- 1/.expr5
## .grad[, "b2"] <- -(b1 * .expr3/.expr10)
## .grad[, "b3"] <- b1 * (b2 * (.expr3 * ii))/.expr10
## attr(.value, "gradient") <- .grad
## .value
## })
wrjdoc <- rjfundoc(wrj)
print(wrjdoc)
## FUNCTION wrj
## Formula: weed ~ b1/(1 + b2 * exp(-b3 * ii))
## Code: expression({
## .expr3 <- exp(-b3 * ii)
## .expr5 <- 1 + b2 * .expr3
## .expr10 <- .expr5^2
## .value <- b1/.expr5 - weed
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1",
## "b2", "b3")))
## .grad[, "b1"] <- 1/.expr5
## .grad[, "b2"] <- -(b1 * .expr3/.expr10)
## .grad[, "b3"] <- b1 * (b2 * (.expr3 * ii))/.expr10
## attr(.value, "gradient") <- .grad
## .value
## })
## Parameters: b1, b2, b3
## Data: weed, ii
##
## VALUES
## Observations: 12
## Parameters:
## b1 b2 b3
## 1 1 1
## Data (length 12):
## weed ii
## 1 5.308 1
## 2 7.240 2
## 3 9.638 3
## 4 12.866 4
## 5 17.069 5
## 6 23.192 6
## 7 31.443 7
## 8 38.558 8
## 9 50.156 9
## 10 62.948 10
## 11 75.995 11
## 12 91.972 12
# We do not have a similar function for ss functions
Given a nonlinear model expressed as a function that computes the residuals from the model, together with a start vector par, nlfb tries to minimize the nonlinear sum of squares of these residuals with respect to par. If model(par, mydata) computes a vector of numbers that are presumed to fit the data lhs, then the residual vector is (model(par, mydata) - lhs), though traditionally we write the negative of this vector. (Writing it this way allows the derivatives of the residuals w.r.t. the parameters par to be the same as those for model(par, mydata).)
The method takes a parameter jacfn which returns the Jacobian matrix of derivatives of the residuals w.r.t. the parameters in an attribute gradient. If this is NULL, then a numerical approximation to derivatives is used. (??which – and can we use the character method to define these?? What about for nlxb??) Putting the Jacobian in the attribute gradient allows us to combine the computation of the residual and Jacobian in the same code block if we wish, since there are generally common computations, though it does make the setup slightly more complicated. See the example below.
The start vector preferably uses named parameters (especially if there is an underlying formula). The attempted minimization of the sum of squares uses the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.
shobbs.res <- function(x){ # scaled Hobbs weeds problem -- residual
# This variant uses looping
if(length(x) != 3) stop("shobbs.res -- parameter vector n!=3")
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
tt <- 1:12
res <- 100.0*x[1]/(1+x[2]*10.*exp(-0.1*x[3]*tt)) - y
}
shobbs.jac <- function(x) { # scaled Hobbs weeds problem -- Jacobian
jj <- matrix(0.0, 12, 3)
tt <- 1:12
yy <- exp(-0.1*x[3]*tt)
zz <- 100.0/(1+10.*x[2]*yy)
jj[tt,1] <- zz
jj[tt,2] <- -0.1*x[1]*zz*zz*yy
jj[tt,3] <- 0.01*x[1]*zz*zz*yy*x[2]*tt
attr(jj, "gradient") <- jj
jj
}
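As an aside, here is a hedged sketch (not used in the runs that follow) of combining the residual and Jacobian computations for this problem in a single function, sharing the common intermediate quantities and attaching the Jacobian as the gradient attribute; such a function could then be supplied to nlfb() as both resfn and jacfn.
shobbs.rj <- function(x) { # scaled Hobbs problem -- residual and Jacobian together
  y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
         38.558, 50.156, 62.948, 75.995, 91.972)
  tt <- 1:12
  yy <- exp(-0.1*x[3]*tt)
  zz <- 100.0/(1+10.*x[2]*yy) # shared by the residual and all Jacobian columns
  res <- zz*x[1] - y
  JJ <- matrix(0.0, 12, 3)
  JJ[ ,1] <- zz
  JJ[ ,2] <- -0.1*x[1]*zz*zz*yy
  JJ[ ,3] <- 0.01*x[1]*zz*zz*yy*x[2]*tt
  attr(res, "gradient") <- JJ
  res
}
# e.g. ansrj <- nlfb(st, resfn=shobbs.rj, jacfn=shobbs.rj)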
cat("try nlfb\n")
## try nlfb
st <- c(b1=1, b2=1, b3=1)
low <- -Inf
up <- Inf
ans1 <- nlfb(st, shobbs.res, shobbs.jac, trace=TRUE)
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 10685 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 177800 at b1 = 7.1383 b2 = 9.1157 b3 = 3.7692 2 / 1
## lamda: 0.01 SS= 19087 at b1 = 1.4903 b2 = 3.072 b3 = 4.9249 3 / 1
## <<lamda: 0.004 SS= 1273.6 at b1 = 0.79487 b2 = 2.0743 b3 = 4.761 4 / 1
## lamda: 0.04 SS= 4265 at b1 = 1.0624 b2 = 1.9667 b3 = 2.3468 5 / 2
## <<lamda: 0.016 SS= 946.57 at b1 = 0.96631 b2 = 2.7886 b3 = 3.5088 6 / 2
## <<lamda: 0.0064 SS= 43.28 at b1 = 1.3414 b2 = 4.0037 b3 = 3.5588 7 / 3
## <<lamda: 0.00256 SS= 17.386 at b1 = 1.5631 b2 = 4.485 b3 = 3.3895 8 / 4
## <<lamda: 0.001024 SS= 6.6368 at b1 = 1.759 b2 = 4.6938 b3 = 3.2472 9 / 5
## <<lamda: 0.0004096 SS= 3.0719 at b1 = 1.8946 b2 = 4.8321 b3 = 3.1681 10 / 6
## <<lamda: 0.00016384 SS= 2.5977 at b1 = 1.9503 b2 = 4.8957 b3 = 3.1413 11 / 7
## <<lamda: 6.5536e-05 SS= 2.5873 at b1 = 1.961 b2 = 4.9082 b3 = 3.1362 12 / 8
## <<lamda: 2.6214e-05 SS= 2.5873 at b1 = 1.9618 b2 = 4.9091 b3 = 3.1357 13 / 9
## <<lamda: 1.0486e-05 SS= 2.5873 at b1 = 1.9619 b2 = 4.9092 b3 = 3.1357 14 / 10
summary(ans1)
## $residuals
## [1] 0.011899 -0.032756 0.092029 0.208781 0.392634 -0.057594 -1.105728
## [8] 0.715787 -0.107646 -0.348395 0.652593 -0.287570
##
## $sigma
## [1] 0.53617
##
## $df
## [1] 3 9
##
## $cov.unscaled
## b1 b2 b3
## b1 0.044469 0.047830 -0.025280
## b2 0.047830 0.099160 -0.017627
## b3 -0.025280 -0.017627 0.016386
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## b1 1.9619 0.113065 17.352 3.1656e-08
## b2 4.9092 0.168837 29.076 3.2825e-10
## b3 3.1357 0.068633 45.688 5.7677e-12
##
## $resname
## [1] "ans1"
##
## $ssquares
## [1] 2.5873
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " " " "
##
## $mt
## [1] " " " " " "
##
## $Sd
## [1] 130.1161 6.1653 2.7354
##
## $gr
## [,1]
## [1,] -7.3274e-06
## [2,] 1.4327e-07
## [3,] 1.7168e-06
##
## $jeval
## [1] 10
##
## $feval
## [1] 14
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
## No jacobian function -- use internal approximation
ans1n <- nlfb(st, shobbs.res, trace=TRUE, control=list(watch=FALSE)) # NO jacfn
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Using default jacobian approximation
## Start:lamda: 1e-04 SS= 10685 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 177800 at b1 = 7.1383 b2 = 9.1157 b3 = 3.7692 2 / 1
## lamda: 0.01 SS= 19087 at b1 = 1.4903 b2 = 3.072 b3 = 4.9249 3 / 1
## <<lamda: 0.004 SS= 1273.6 at b1 = 0.79487 b2 = 2.0743 b3 = 4.761 4 / 1
## lamda: 0.04 SS= 4265 at b1 = 1.0624 b2 = 1.9667 b3 = 2.3468 5 / 2
## <<lamda: 0.016 SS= 946.57 at b1 = 0.96631 b2 = 2.7886 b3 = 3.5088 6 / 2
## <<lamda: 0.0064 SS= 43.28 at b1 = 1.3414 b2 = 4.0037 b3 = 3.5588 7 / 3
## <<lamda: 0.00256 SS= 17.386 at b1 = 1.5631 b2 = 4.485 b3 = 3.3895 8 / 4
## <<lamda: 0.001024 SS= 6.6368 at b1 = 1.759 b2 = 4.6938 b3 = 3.2472 9 / 5
## <<lamda: 0.0004096 SS= 3.0719 at b1 = 1.8946 b2 = 4.8321 b3 = 3.1681 10 / 6
## <<lamda: 0.00016384 SS= 2.5977 at b1 = 1.9503 b2 = 4.8957 b3 = 3.1413 11 / 7
## <<lamda: 6.5536e-05 SS= 2.5873 at b1 = 1.961 b2 = 4.9082 b3 = 3.1362 12 / 8
## <<lamda: 2.6214e-05 SS= 2.5873 at b1 = 1.9618 b2 = 4.9091 b3 = 3.1357 13 / 9
## <<lamda: 1.0486e-05 SS= 2.5873 at b1 = 1.9619 b2 = 4.9092 b3 = 3.1357 14 / 10
summary(ans1n)
## $residuals
## [1] 0.011899 -0.032756 0.092029 0.208781 0.392634 -0.057594 -1.105728
## [8] 0.715787 -0.107646 -0.348395 0.652593 -0.287570
##
## $sigma
## [1] 0.53617
##
## $df
## [1] 3 9
##
## $cov.unscaled
## b1 b2 b3
## b1 0.044469 0.047830 -0.025280
## b2 0.047830 0.099160 -0.017627
## b3 -0.025280 -0.017627 0.016386
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## b1 1.9619 0.113065 17.352 3.1656e-08
## b2 4.9092 0.168837 29.076 3.2825e-10
## b3 3.1357 0.068633 45.688 5.7677e-12
##
## $resname
## [1] "ans1n"
##
## $ssquares
## [1] 2.5873
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " " " "
##
## $mt
## [1] " " " " " "
##
## $Sd
## [1] 130.1161 6.1653 2.7354
##
## $gr
## [,1]
## [1,] -7.3259e-06
## [2,] 1.4283e-07
## [3,] 1.7190e-06
##
## $jeval
## [1] 10
##
## $feval
## [1] 14
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
## difference
coef(ans1)-coef(ans1n)
## b1 b2 b3
## -1.6018e-09 -2.2231e-09 8.1156e-10
## attr(,"pkgname")
## [1] "nlsr"
Given a nonlinear model expressed as an expression of the form lhs ~ formula_for_rhs and a start vector where parameters used in the model formula are named, nlxb attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method. The process is to develop the residual and Jacobian functions using model2rjfun, then call nlfb.
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
weeddata1 <- data.frame(y=ydat, tt=tdat)
anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x45d5af8>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
summary(anlxb1)
## $residuals
## [1] 0.011898 -0.032757 0.092028 0.208780 0.392633 -0.057594 -1.105727
## [8] 0.715788 -0.107645 -0.348394 0.652593 -0.287572
## attr(,"gradient")
## b1 b2 b3
## [1,] 0.027117 -0.10543 5.1756
## [2,] 0.036737 -0.14142 13.8849
## [3,] 0.049596 -0.18837 27.7424
## [4,] 0.066645 -0.24858 48.8137
## [5,] 0.089005 -0.32404 79.5373
## [6,] 0.117921 -0.41568 122.4383
## [7,] 0.154635 -0.52241 179.5224
## [8,] 0.200186 -0.63986 251.2937
## [9,] 0.255107 -0.75941 335.5262
## [10,] 0.319083 -0.86828 426.2515
## [11,] 0.390688 -0.95133 513.7251
## [12,] 0.467334 -0.99482 586.0462
##
## $sigma
## [1] 0.53617
##
## $df
## [1] 3 9
##
## $cov.unscaled
## b1 b2 b3
## b1 444.64742 47.824220 -0.25278540
## b2 47.82422 9.915178 -0.01762507
## b3 -0.25279 -0.017625 0.00016386
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## b1 196.18612 11.3059779 17.352 3.1644e-08
## b2 49.09162 1.6883034 29.077 3.2813e-10
## b3 0.31357 0.0068633 45.688 5.7678e-12
##
## $resname
## [1] "anlxb1"
##
## $ssquares
## [1] 2.5873
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " " " "
##
## $mt
## [1] " " " " " "
##
## $Sd
## [1] 1.0108e+03 4.6047e-01 4.7148e-02
##
## $gr
## [,1]
## b1 -2.6627e-07
## b2 1.5901e-07
## b3 -5.5308e-05
##
## $jeval
## [1] 18
##
## $feval
## [1] 25
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
Print output (but involving some serious computations!) of an object of class nlsr from nlxb or nlfb from package nlsr.
### From examples above
print(weedux)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 5 Jacobian and 6 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.167e-08 -1.334e-10 1011
## b2 49.0916 1.688 29.08 3.284e-10 -5.431e-11 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 3.865e-08 0.04714
print(ans1)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 10 Jacobian and 14 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 1.96186 0.1131 17.35 3.166e-08 -7.327e-06 130.1
## b2 4.90916 0.1688 29.08 3.282e-10 1.433e-07 6.165
## b3 3.1357 0.06863 45.69 5.768e-12 1.717e-06 2.735
For a nonlinear model originally expressed as an expression of the form lhs ~ formula_for_rhs, assume we have a resfn and jacfn that compute the residuals and the Jacobian at a set of parameters. resgr computes the gradient, that is, t(Jacobian) %*% residuals, while resss computes the sum of squared residuals.
## Use shobbs example
RG <- resgr(st, shobbs.res, shobbs.jac)
RG
## [1] -10091.3 7835.3 -8234.2
SS <- resss(st, shobbs.res)
SS
## [1] 10685
Provide a summary output (but involving some serious computations!) of an object of class nlsr from nlxb or nlfb from package nlsr. Examples have been given above under nlxb and nlfb.
Given a nonlinear model expressed as an expression of the form
lhs ~ formula_for_rhs
and a start vector where parameters used in the model formula are named, wrapnlsr attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method, and then refines the solution with a call to nls().
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
weeddata1 <- data.frame(y=ydat, tt=tdat)
## The following attempt with nls() fails!
anls1 <- try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1))
## 23521 : 1 1 1
## But we succeed by calling nlxb first.
anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x4ce4288>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
st2 <- coef(anlxb1)
anls2 <- try(nls(eunsc, start=st2, trace=TRUE, data=weeddata1))
## 2.5873 : 196.18612 49.09162 0.31357
## Or we can simply call wrapnlsr
anls2a <- try(wrapnlsr(eunsc, start=start1, trace=TRUE, data=weeddata1))
## wrapnls call with lower=[1] -Inf -Inf -Inf
## and upper=[1] Inf Inf Inf
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x501da48>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.164e-08 -2.663e-07 1011
## b2 49.0916 1.688 29.08 3.281e-10 1.59e-07 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 -5.531e-05 0.04715
## newstart: b1 b2 b3
## 196.18612 49.09162 0.31357
## nls call with no bounds
## 2.5873 : 196.18612 49.09162 0.31357
Weighted regression alters the sum of squares, viewed as a function of the parameters prm, from
sum_{i} (residual_i(prm)^2)
to
sum_{i} (wts_i * residual_i(prm)^2)
where wts is the vector of weights. This is equivalent to multiplying each residual, or each row of the Jacobian, by the square root of the respective weight.
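As a small numerical check (not part of the original vignette) of this equivalence, using an arbitrary made-up residual vector and weight vector:
rr <- c(1.5, -0.7, 0.2) # hypothetical residuals
wts <- c(2, 1, 0.5) # hypothetical weights
all.equal(sum(wts * rr^2), sum((sqrt(wts) * rr)^2)) # TRUE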
While nls() has provision for (fixed) weights, the example given in the documentation of nls() for weighted regression actually uses a functional form. Moreover, the weights there involve the predicted or model values (each residual is divided by the square root of the prediction), so they are not fixed. Here we replace these weights with the fixed values of the quantity we are trying to predict, namely the rate variable in a kinetic model. In our example below we first use the documentation example with predicted-value weights; note that the name of the result has been changed from that in the nls() documentation. This result gives different estimates of the parameters from the fixed weights used afterwards.
Note that neither nlsr::nlxb nor nlsr::nlfb can use the weighted.MM function directly. This is one of the more awkward aspects of nonlinear least squares and nonlinear optimization.
Treated <- Puromycin[Puromycin$state == "treated", ]
weighted.MM <- function(resp, conc, Vm, K)
{
## Purpose: exactly as white book p. 451 -- RHS for nls()
## Weighted version of Michaelis-Menten model
## ----------------------------------------------------------
## Arguments: 'y', 'x' and the two parameters (see book)
## ----------------------------------------------------------
## Author: Martin Maechler, Date: 23 Mar 2001
pred <- (Vm * conc)/(K + conc)
(resp - pred) / sqrt(pred)
}
## Here is the estimation using predicted value weights
Pur.wtnlspred <- nls( ~ weighted.MM(rate, conc, Vm, K), data = Treated,
start = list(Vm = 200, K = 0.1))
summary(Pur.wtnlspred)
##
## Formula: 0 ~ weighted.MM(rate, conc, Vm, K)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Vm 2.07e+02 9.22e+00 22.42 7.0e-10 ***
## K 5.46e-02 7.98e-03 6.84 4.5e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.21 on 10 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 3.86e-06
## nlxb cannot use this form
Pur.wtnlxbpred <- try(nlxb( ~ weighted.MM(rate, conc, Vm, K), data = Treated,
start = list(Vm = 200, K = 0.1)))
## vn:[1] "rate" "conc" "Vm" "K"
## and the structure is wrong for nlfb. See wres.MM below.
Pur.wtnlfbpred <- try(nlfb(resfn=weighted.MM(rate, conc, Vm, K), data = Treated,
start = list(Vm = 200, K = 0.1)))
## no weights
## Now use fixed weights and the weights argument in nls()
Pur.wnls <- nls( rate ~ (Vm * conc)/(K + conc), data = Treated,
start = list(Vm = 200, K = 0.1), weights=1/rate)
summary(Pur.wnls)
##
## Formula: rate ~ (Vm * conc)/(K + conc)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Vm 2.10e+02 9.01e+00 23.27 4.9e-10 ***
## K 6.07e-02 8.39e-03 7.23 2.8e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.11 on 10 degrees of freedom
##
## Number of iterations to convergence: 7
## Achieved convergence tolerance: 9.48e-07
## nlsr::nlxb should give essentially the same result
library(nlsr)
## Note that we put in the dataframe name to explicitly specify weights
Pur.wnlxb <- nlxb( rate ~ (Vm * conc)/(K + conc), data = Treated,
start = list(Vm = 200, K = 0.1), weights=1/Treated$rate)
## vn:[1] "rate" "Vm" "conc" "K"
summary(Pur.wnlxb)
## $residuals
## [1] -2.755920 0.725597 0.734149 -0.267735 1.091190 -0.330634 0.420283
## [8] 0.997627 -0.136479 -0.838386 -0.580807 -0.095909
## attr(,"gradient")
## Vm K
## [1,] 0.24797 -644.41
## [2,] 0.24797 -644.41
## [3,] 0.49729 -863.88
## [4,] 0.49729 -863.88
## [5,] 0.64458 -791.67
## [6,] 0.64458 -791.67
## [7,] 0.78388 -585.42
## [8,] 0.78388 -585.42
## [9,] 0.90227 -304.70
## [10,] 0.90227 -304.70
## [11,] 0.94774 -171.15
## [12,] 0.94774 -171.15
##
## $sigma
## [1] 1.1078
##
## $df
## [1] 2 10
##
## $cov.unscaled
## Vm K
## Vm 66.088987 4.7937e-02
## K 0.047937 5.7385e-05
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## Vm 209.596813 9.0058754 23.2733 4.8539e-10
## K 0.060654 0.0083919 7.2276 2.8322e-05
##
## $resname
## [1] "Pur.wnlxb"
##
## $ssquares
## [1] 12.272
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " "
##
## $mt
## [1] " " " "
##
## $Sd
## [1] 210.28210 0.12301
##
## $gr
## [,1]
## Vm -3.5808e-12
## K -3.1367e-10
##
## $jeval
## [1] 8
##
## $feval
## [1] 9
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
## check difference
coef(Pur.wnls) - coef(Pur.wnlxb)
## Vm K
## -2.0921e-05 -2.5056e-08
## attr(,"pkgname")
## [1] "nlsr"
## Residuals Pur.wnls -- These are weighted
## resid(Pur.wnlxb) # ?? does not work -- why?
print(res(Pur.wnlxb))
## [1] -2.755920 0.725597 0.734149 -0.267735 1.091190 -0.330634 0.420283
## [8] 0.997627 -0.136479 -0.838386 -0.580807 -0.095909
## attr(,"gradient")
## Vm K
## [1,] 0.24797 -644.41
## [2,] 0.24797 -644.41
## [3,] 0.49729 -863.88
## [4,] 0.49729 -863.88
## [5,] 0.64458 -791.67
## [6,] 0.64458 -791.67
## [7,] 0.78388 -585.42
## [8,] 0.78388 -585.42
## [9,] 0.90227 -304.70
## [10,] 0.90227 -304.70
## [11,] 0.94774 -171.15
## [12,] 0.94774 -171.15
## Those from nls() are NOT weighted
resid(Pur.wnls)
## [1] 24.0255 -4.9745 -7.2305 2.7695 -12.1019 3.8981 -5.2996
## [8] -12.2996 1.8862 11.8862 8.3564 1.3564
## attr(,"label")
## [1] "Residuals"
## Try using a function, that is nlsr::nlfb
stw <- c(Vm = 200, K = 0.1)
wres.MM <- function(prm, rate, conc) {
Vm <- prm[1]
K <- prm[2]
pred <- (Vm * conc)/(K + conc)
(rate - pred) / sqrt(rate) # NOTE: NOT pred here
}
# test function first to see it works
print(wres.MM(stw, rate=Treated$rate, conc=Treated$conc))
## [1] 4.8942 1.9935 2.2338 3.0936 1.6445 2.9040 1.7051 1.1761 1.5414 2.2079
## [11] 1.6449 1.1785
Pur.wnlfb <- nlfb(start=stw, resfn=wres.MM, rate = Treated$rate, conc=Treated$conc,
trace=FALSE) # Note: weights are inside function already here
## no weights
summary(Pur.wnlfb)
## $residuals
## [1] 2.755920 -0.725597 -0.734149 0.267735 -1.091190 0.330634 -0.420283
## [8] -0.997627 0.136479 0.838386 0.580807 0.095909
##
## $sigma
## [1] 1.1078
##
## $df
## [1] 2 10
##
## $cov.unscaled
## Vm K
## Vm 66.088990 4.7937e-02
## K 0.047937 5.7385e-05
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## Vm 209.596813 9.0058756 23.2733 4.8539e-10
## K 0.060654 0.0083919 7.2276 2.8322e-05
##
## $resname
## [1] "Pur.wnlfb"
##
## $ssquares
## [1] 12.272
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " "
##
## $mt
## [1] " " " "
##
## $Sd
## [1] 210.28209 0.12301
##
## $gr
## [,1]
## [1,] -3.6038e-12
## [2,] -2.5381e-10
##
## $jeval
## [1] 8
##
## $feval
## [1] 9
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
print(Pur.wnlfb)
## nlsr object: x
## residual sumsquares = 12.272 on 12 observations
## after 8 Jacobian and 9 function evaluations
## name coeff SE tstat pval gradient JSingval
## Vm 209.597 9.006 23.27 4.854e-10 -3.604e-12 210.3
## K 0.0606538 0.008392 7.228 2.832e-05 -2.538e-10 0.123
## check estimates with nls() result
coef(Pur.wnls) - coef(Pur.wnlfb)
## Vm K
## -2.1161e-05 -2.5341e-08
## attr(,"pkgname")
## [1] "nlsr"
## and with nlxb result
coef(Pur.wnlxb) - coef(Pur.wnlfb)
## Vm K
## -2.4001e-07 -2.8479e-10
## attr(,"pkgname")
## [1] "nlsr"
wres0.MM <- function(prm, rate, conc) {
Vm <- prm[1]
K <- prm[2]
pred <- (Vm * conc)/(K + conc)
(rate - pred) # NOTE: NO weights
}
Pur.wnlfb0 <- nlfb(start=stw, resfn=wres0.MM, rate = Treated$rate, conc=Treated$conc,
weights=1/Treated$rate, trace=FALSE) # Note: explicit weights
summary(Pur.wnlfb0)
## $residuals
## [1] 4.8942 1.9935 2.2338 3.0936 1.6445 2.9040 1.7051 1.1761 1.5414 2.2079
## [11] 1.6449 1.1785
##
## $sigma
## [1] 2.6317
##
## $df
## [1] 2 10
##
## $cov.unscaled
## Vm K
## Vm 70.607853 -3.5304e-02
## K -0.035304 1.7652e-05
##
## $param
## Estimate Std. Error t value Pr(>|t|)
## Vm 200.0 22.114175 9.044 3.9605e-06
## K 0.1 0.011057 9.044 3.9606e-06
##
## $resname
## [1] "Pur.wnlfb0"
##
## $ssquares
## [1] 69.261
##
## $nobs
## [1] 12
##
## $ct
## [1] " " " "
##
## $mt
## [1] " " " "
##
## $Sd
## [1] 7.4995e+08 1.1901e-01
##
## $gr
## [,1]
## [1,] 3119894
## [2,] 6239784883
##
## $jeval
## [1] 1
##
## $feval
## [1] 14
##
## attr(,"pkgname")
## [1] "nlsr"
## attr(,"class")
## [1] "summary.nlsr"
print(Pur.wnlfb0)
## nlsr object: x
## residual sumsquares = 69.261 on 12 observations
## after 1 Jacobian and 14 function evaluations
## name coeff SE tstat pval gradient JSingval
## Vm 200 22.11 9.044 3.961e-06 3119894 749947494
## K 0.1 0.01106 9.044 3.961e-06 6.24e+09 0.119
## check estimates with nls() result
coef(Pur.wnls) - coef(Pur.wnlfb0)
## Vm K
## 9.596792 -0.039346
## attr(,"pkgname")
## [1] "nlsr"
## and with nlxb result
coef(Pur.wnlxb) - coef(Pur.wnlfb0)
## Vm K
## 9.596813 -0.039346
## attr(,"pkgname")
## [1] "nlsr"
wss.MM <- function(prm, rate, conc) {
ss <- as.numeric(crossprod(wres.MM(prm, rate, conc)))
}
library(optimr)
osol <- optimr(stw, fn=wss.MM, gr="grnd", method="Nelder-Mead", control=list(trace=0),
rate=Treated$rate, conc=Treated$conc)
print(osol)
## $par
## Vm K
## 209.596996 0.060653
##
## $value
## [1] 12.272
##
## $counts
## function gradient
## 69 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
## difference from nlxb
osol$par - coef(Pur.wnlxb)
## Vm K
## 1.8235e-04 -1.2139e-06
## attr(,"pkgname")
## [1] "nlsr"
Multi-line functions present a challenge. This is in part because the chain rule may have to be applied backwards (last line first), but also because there may be structures that are not always differentiable, such as if statements.
Note that the functional approach to nonlinear least squares – accessible via function nlfb – does let us prepare residuals and Jacobians of complexity limited only by the capabilities of R. This is the natural extension of multi-line expressions. The authors would welcome collaboration with someone who has expertise in this area.
We would like to be able to find the Jacobian or gradient of functions that take their parameters as a vector, e.g., prm. At the time of writing (January 2015) we cannot specify such a vector within nlsr. ?? is this true?
The following (rather long) script shows things that work or fail. In particular, it shows that the calls used to get derivatives need to be set up precisely. This is not unique to R – the same sort of attention to (syntax-specific) detail is present in other systems like Macsyma/Maxima.
First we set up the Hobbs weed problem. Initially it seemed a good idea to put the data WITHIN the residual function so that it would not have to be passed to the function. In retrospect, this is a bad idea because y and t are needed. To make the use of the data explicit, it is saved in the variables tdat for time and ydat for the weed data.
# tryhobbsderiv.R
ydat<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
75.995, 91.972)
tdat<-1:12
# try setting t and y here
t <- tdat
y <- ydat
# now define a function
hobbs.res<-function(x, t, y){ # Hobbs weeds problem -- residual
# This variant uses looping
# if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
# if(abs(12*x[3])>50) {
# res<-rep(Inf,12)
# } else {
res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y
# }
}
# test it
start1 <- c(200, 50, .3)
## residuals at start1
r1 <- hobbs.res(start1, t=tdat, y=ydat)
print(r1)
## [1] -0.050502 -0.207795 -0.260868 -0.412475 -0.616907 -1.605254 -3.364239
## [8] -2.430165 -4.287340 -5.630791 -5.675428 -7.447795
Now we try some of the derivative capabilities of nlsr.
## NOTE: some functions may be seemingly correct for R, but we do not
## get the result desired, despite no obvious error. Always test.
require(nlsr)
# Try directly to differentiate the residual vector. r1 is numeric, so this should
# return a vector of zeros in a mathematical sense. In fact it gives an error,
# since R does not want to differentiate a numeric vector.
Jr1a <- fnDeriv(r1, "x")
## Error in codeDeriv(expr, namevec, do_substitute = FALSE, verbose = verbose, : Only single expressions allowed
# Set up a function containing expression with subscripted parameters x[]
hobbs1 <- function(x, t, y){ res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y }
# test the residuals
print(hobbs1(start1, t=tdat, y=ydat))
## [1] -0.050502 -0.207795 -0.260868 -0.412475 -0.616907 -1.605254 -3.364239
## [8] -2.430165 -4.287340 -5.630791 -5.675428 -7.447795
# Alternatively, let us set up t and y
y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
75.995, 91.972)
t<-1:12
# try different calls. Note that we need to include t and y data somehow
print(hobbs1(start1))
## Error in hobbs1(start1): argument "t" is missing, with no default
print(hobbs1(start1, t, y))
## [1] -0.050502 -0.207795 -0.260868 -0.412475 -0.616907 -1.605254 -3.364239
## [8] -2.430165 -4.287340 -5.630791 -5.675428 -7.447795
print(hobbs1(start1, tdat, ydat))
## [1] -0.050502 -0.207795 -0.260868 -0.412475 -0.616907 -1.605254 -3.364239
## [8] -2.430165 -4.287340 -5.630791 -5.675428 -7.447795
# We remove t and y, to ensure we don't get results from their presence
rm(t)
rm(y)
# Set up a function containing an expression with named (with numbers) parameters
# Note that we need to link these to the values in x
hobbs1m <- function(x, t, y){
x001 <- x[1]
x002 <- x[2]
x003 <- x[3]
res<-x001/(1+x002*exp(-x003*t)) - y
}
print(hobbs1m(x=start1, t=tdat, y=ydat))
## [1] -0.050502 -0.207795 -0.260868 -0.412475 -0.616907 -1.605254 -3.364239
## [8] -2.430165 -4.287340 -5.630791 -5.675428 -7.447795
# Function with explicit use of the expression()
hobbs1me <- function(x, t, y){
x001 <- x[1]
x002 <- x[2]
x003 <- x[3]
expression(x001/(1+x002*exp(-x003*t)) - y)
}
print(hobbs1me(start1, t=tdat, y=ydat))
## expression(x001/(1 + x002 * exp(-x003 * t)) - y)
# note failure (because the expression is not evaluated?)
#
# Now try to take derivatives
Jr11m <- fnDeriv(hobbs1m, c("x001", "x002", "x003"))
## Error in as.vector(x, "expression"): cannot coerce type 'closure' to vector of type 'expression'
# fails because expression is INSIDE a function (i.e., closure)
#
# try directly differentiating the expression
Jr11ex <-fnDeriv(expression(x001/(1+x002*exp(-x003*t)) - y)
, c("x001", "x002", "x003"))
# this seems to "work". Let us display the result
Jr11ex
## function (x001, x002, x003, t, y)
## {
## .expr1 <- -x003
## .expr2 <- .expr1 * t
## .expr3 <- exp(.expr2)
## .expr4 <- x002 * .expr3
## .expr5 <- 1 + .expr4
## .expr6 <- .expr5^2
## .value <- x001/(.expr5) - y
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("x001",
## "x002", "x003")))
## .grad[, "x001"] <- 1/.expr5
## .grad[, "x002"] <- -(x001 * .expr3/.expr6)
## .grad[, "x003"] <- -(x001 * (x002 * (.expr3 * -t))/.expr6)
## attr(.value, "gradient") <- .grad
## .value
## }
# Set the values of the parameters by name
x001 <- start1[1]
x002 <- start1[2]
x003 <- start1[3]
# and try to evaluate
print(eval(Jr11ex))
## function (x001, x002, x003, t, y)
## {
## .expr1 <- -x003
## .expr2 <- .expr1 * t
## .expr3 <- exp(.expr2)
## .expr4 <- x002 * .expr3
## .expr5 <- 1 + .expr4
## .expr6 <- .expr5^2
## .value <- x001/(.expr5) - y
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("x001",
## "x002", "x003")))
## .grad[, "x001"] <- 1/.expr5
## .grad[, "x002"] <- -(x001 * .expr3/.expr6)
## .grad[, "x003"] <- -(x001 * (x002 * (.expr3 * -t))/.expr6)
## attr(.value, "gradient") <- .grad
## .value
## }
# we need t and y, so set them
t<-tdat
y<-ydat
print(eval(Jr11ex))
## function (x001, x002, x003, t, y)
## {
## .expr1 <- -x003
## .expr2 <- .expr1 * t
## .expr3 <- exp(.expr2)
## .expr4 <- x002 * .expr3
## .expr5 <- 1 + .expr4
## .expr6 <- .expr5^2
## .value <- x001/(.expr5) - y
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("x001",
## "x002", "x003")))
## .grad[, "x001"] <- 1/.expr5
## .grad[, "x002"] <- -(x001 * .expr3/.expr6)
## .grad[, "x003"] <- -(x001 * (x002 * (.expr3 * -t))/.expr6)
## attr(.value, "gradient") <- .grad
## .value
## }
# But there is still a problem. WHY???
#
# Let us try it piece by piece (column by column)
resx <- expression(x001/(1+x002*exp(-x003*t)) - y)
res1 <- Deriv(resx, "x001", do_substitute=FALSE)
## Error in Deriv(resx, "x001", do_substitute = FALSE): unused argument (do_substitute = FALSE)
res1
## Error in eval(expr, envir, enclos): object 'res1' not found
col1 <- eval(res1)
## Error in eval(res1): object 'res1' not found
res2 <- Deriv(resx, "x002", do_substitute=FALSE)
## Error in Deriv(resx, "x002", do_substitute = FALSE): unused argument (do_substitute = FALSE)
res2
## Error in eval(expr, envir, enclos): object 'res2' not found
col2 <- eval(res2)
## Error in eval(res2): object 'res2' not found
res3 <- Deriv(resx, "x003", do_substitute=FALSE)
## Error in Deriv(resx, "x003", do_substitute = FALSE): unused argument (do_substitute = FALSE)
res3
## Error in eval(expr, envir, enclos): object 'res3' not found
col3 <- eval(res3)
## Error in eval(res3): object 'res3' not found
hobJac <- cbind(col1, col2, col3)
## Error in cbind(col1, col2, col3): object 'col1' not found
print(hobJac)
## Error in print(hobJac): object 'hobJac' not found
Clearly there is still some work to do to make the mechanism for getting derivatives easier to use and less error-prone. Work still to do???
Jacobians, the matrices of partial derivatives of residuals with respect to the parameters, have a vector input (the parameters). nlsr attempts to generate the code for this computation. This is the first key improvement of nlsr over nls(). It will likely be a continuing effort to enlarge the set of functions and expressions that can be handled and to deal with them more efficiently. It would also be nice to automate the generation of numerical approximations for those components of the Jacobian for which analytic derivatives are not available.
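In the meantime, here is a hedged sketch (not part of nlsr) of how analytic and numerical Jacobian columns could be mixed for the scaled Hobbs problem defined earlier: suppose only the derivative with respect to the first parameter is available analytically, and the remaining columns are filled in numerically using the numDeriv package (assumed installed).
shobbs.mixjac <- function(x) { # hypothetical mixed analytic/numerical Jacobian
  JJ <- numDeriv::jacobian(shobbs.res, x) # numerical approximation for all columns
  tt <- 1:12
  yy <- exp(-0.1*x[3]*tt)
  zz <- 100.0/(1+10.*x[2]*yy)
  JJ[ ,1] <- zz # overwrite column 1 with the analytic derivative w.r.t. b1
  attr(JJ, "gradient") <- JJ
  JJ
}
# e.g. ansmix <- nlfb(st, shobbs.res, jacfn=shobbs.mixjac)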
Note that the Jacobian inner product (J^T J) differs from the Hessian of the sum of squares by a matrix formed as an inner product of the residuals with their second derivatives with respect to the parameters. This is a summation of m matrices, each being a residual times its (n by n) matrix of second partial derivatives with respect to the parameters. Generally we treat this matrix as “ignorable” on the basis that the residuals should be “small”, but clearly this is not always the case.
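As a hedged numerical illustration (not from the original text) of this decomposition, using the shobbs.res function defined earlier and the numDeriv package, we can compare the full Hessian of the sum of squares with the Gauss-Newton part plus the residual-times-second-derivative terms:
library(numDeriv)
p0 <- c(b1=1, b2=1, b3=1)
rr <- shobbs.res(p0)
JJ <- jacobian(shobbs.res, p0)
ZZ <- lapply(seq_along(rr), function(k) hessian(function(p) shobbs.res(p)[k], p0))
Hgn <- 2 * crossprod(JJ) # Gauss-Newton approximation to the Hessian
Hfull <- Hgn + 2 * Reduce(`+`, Map(`*`, rr, ZZ)) # add residual * second-derivative terms
Hnum <- hessian(function(p) sum(shobbs.res(p)^2), p0) # Hessian of the sum of squares
max(abs(Hnum - Hfull)) # should be small; max(abs(Hnum - Hgn)) is not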
Nonlinear least squares methods are mostly founded on some variant or other of the Gauss-Newton algorithm. The function we wish to minimize is the sum of squares of the (nonlinear) residuals r(x), where there are m observations (elements of r) and n parameters x. Hence the function is
f(x) = sum_{k} (r_k(x)^2)
Newton’s method starts with an original set of parameters x[0]. At a given iteration, which could be the first, we want to solve
x[k+1] = x[k] - H^(-1) g
where H is the Hessian and g is the gradient at x[k]. We can rewrite this as a solution, at each iteration, of
H delta = -g
with
x[k+1] = x[k] + delta
For the particular sum of squares, the gradient is
g(x) = 2 * J' r
and
H(x) = 2 * (J' J + sum_{k} (r_k * Z_k))
where J is the Jacobian (first derivatives of r w.r.t. x) and Z is the tensor of second derivatives of r w.r.t. x, so that Z_k is the (n by n) second-derivative matrix of residual r_k. Note that J' is the transpose of J.
The primary simplification of the Gauss-Newton method is to assume that the second term above is negligible. As there is a common factor of 2 on each side of the Newton iteration after the simplification of the Hessian, the Gauss-Newton iteration equation is
J' J delta = - J' r
This iteration frequently fails. The approximation of the Hessian by the Jacobian inner-product is one reason, but there is also the possibility that the sum of squares function is not “quadratic” enough that the unit step reduces it. Hartley (1961) introduced a line search along delta, while Marquardt (1963) suggested replacing J’ J with (J’ J + lambda * D) where D is a diagonal matrix intended to partially approximate the omitted portion of the Hessian.
Marquardt suggested D = I (a unit matrix) or D = (diagonal part of J' J). With the former choice, when lambda is large enough the iteration is essentially
delta = - g / lambda
which is a version of the steepest descents algorithm. Using the diagonal of J' J gives a scaled version of this (see https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm).
Nash (1977) found that on low-precision machines it was common for diagonal elements of J' J to underflow. A very small modification avoids this: solve
(J' J + lambda * (D + phi * I)) * delta = - g
where phi is a small number (phi = 1 seems to work quite well) ?? check??.
In Nash (1979), the iteration equation was solved as stated. However, this involves forming the sum of squares and cross products of J, a process that loses some numerical precision. A better way to solve the linear equations is to apply the QR decomposition to the matrix J itself. However, we still need to incorporate the lambda * I or lambda * D adjustments. This is done by adding rows to J that are the square roots of the “pieces”. We add 1 row for each diagonal element of I and each diagonal element of D.
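A minimal sketch of such a step (this is not the actual nlfb() code; JJ and res are assumed to be a current Jacobian and residual vector) appends the square-root rows to the Jacobian and solves the augmented least squares problem by qr:
marq_delta <- function(JJ, res, lamda, phi = 1) {
  n <- ncol(JJ)
  dd <- sqrt(lamda * (diag(crossprod(JJ)) + phi)) # sqrt of the lamda*(D + phi*I) diagonal
  JA <- rbind(JJ, diag(dd, nrow = n)) # Jacobian augmented with n extra rows
  rA <- c(res, rep(0, n)) # residuals padded with zeros
  qr.coef(qr(JA), -rA) # delta solving (J'J + lamda*(D + phi*I)) delta = -J'r
}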
In each iteration, we reduce the lambda parameter before solution. If the resulting sum of squares is not reduced, lambda is increased; otherwise we move to the next iteration. Various authors (including the present one) have suggested different strategies for this. My current opinion is that a “quick” increase, say a factor of 10, and a “slow” decrease, say a factor of 0.2, work quite well. However, it is important to check that lambda has not become too small or underflowed before applying the increase factor. On the other hand, it is useful to be able to set lambda = 0 in the code so that a pure Gauss-Newton method can be evaluated with the program(s). The current code nlfb() uses the line
if (lamda<1000*.Machine$double.eps) lamda<-1000*.Machine$double.eps
to ensure that the subsequent increase actually takes effect. To force a Gauss-Newton algorithm, the controls laminc and lamdec are set to 0.
The Levenberg-Marquardt adjustment to the Gauss-Newton approach is the second major improvement of nlsr (and also of its predecessor nlmrt and of the package minpack.lm) over nls().
The success of many optimization codes often depends on knowing when no more progress can be made. For the present codes, one simple procedure increases the damping parameter lamda whenever the sum of squares cannot be decreased, with termination when the parameters are not altered by the addition of delta. While this “works”, it does incur unnecessary computational effort.
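One way to code the “parameters not altered” test is the offset trick also used in the gnjn() function later in this vignette (the actual nlfb() implementation may differ); a hedged sketch:
offset <- 1e6
no_change <- function(par, delta) {
  all((par + delta + offset) == (par + offset)) # TRUE when the step is lost at the scale set by offset
}
no_change(c(1, 2, 3), c(1e-12, 0, 0)) # TRUE: step is negligible
no_change(c(1, 2, 3), c(1e-3, 0, 0)) # FALSE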
A better approach is that of Bates and Watts (1981). Their relative offset criterion considers the potential progress that could be made in reducing the current residuals to the initial sum of squares. This is a sensible and very effective strategy for most situations, but is dangerous where the problem has very small or zero residuals, as in the case of an interpolation problem or nonlinear equations.
To illustrate this for zero residual problems, let us set up a simple test.
## test small resid case with roffset
tt <- 1:25
ymod <- 10 * exp(-0.01*tt) + 5
n <- length(tt)
evec0 <- rep(0, n)
evec1 <- 1e-4*runif(n, -.5, .5)
evec2 <- 1e-1*runif(n, -.5, .5)
y0 <- ymod + evec0
y1 <- ymod + evec1
y2 <- ymod + evec2
mydata <- data.frame(tt, y0, y1, y2)
st <- c(aa=1, bb=1, cc=1)
We provide three sizes of residuals here, but will only illustrate the consequences of zero residuals (the problem with y0). Let us run the nonlinear least squares problem to get the interpolating function, first with nls(), then with nlxb(). In the second case we have turned off the trace, as the approach “works” because we have taken care in the code to provide for problems with small residuals.
nlsfit0 <- try(nls(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=TRUE))
## 4092.5 : 1 1 1
## 2651.2 : 0.15468 -0.11928 2.57602
## 1690.9 : -0.015565 -0.145348 5.764209
## 305.56 : -0.16284 0.18151 10.40450
## 32.728 : 1.7376 3.4630 12.8469
nlsfit0
## [1] "Error in numericDeriv(form[[3L]], names(ind), env) : \n Missing value or an infinity produced when evaluating the model\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in numericDeriv(form[[3L]], names(ind), env): Missing value or an infinity produced when evaluating the model>
library(nlsr)
nlsrfit0 <- try(nlxb(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=FALSE))
## vn:[1] "y0" "aa" "bb" "tt" "cc"
## no weights
nlsrfit0
## nlsr object: x
## residual sumsquares = 1.5561e-10 on 25 observations
## after 33 Jacobian and 44 function evaluations
## name coeff SE tstat pval gradient JSingval
## aa 9.99927 0.0002277 43918 7.149e-89 3.423e-05 614.6
## bb 0.0100008 2.61e-07 38321 1.435e-87 -0.005629 3.197
## cc 5.00073 0.000229 21834 3.399e-82 4.054e-05 0.008235
We can approximate what nls() is doing by running a simple Gauss-Newton method one step at a time. It is (or was at the time of writing) easier to do this in functional form with explicit derivatives. Note that we test these functions!
trf <- function(par, data) {
tt <- data[,"tt"]
res <- par["aa"]*exp(-par["bb"]*tt) + par["cc"] - y0
}
print(trf(st, data=mydata))
## [1] -13.533 -13.667 -13.655 -13.590 -13.506 -13.415 -13.323 -13.231
## [9] -13.139 -13.048 -12.958 -12.869 -12.781 -12.694 -12.607 -12.521
## [17] -12.437 -12.353 -12.270 -12.187 -12.106 -12.025 -11.945 -11.866
## [25] -11.788
trj <- function(par, data) {
tt <- data[,"tt"]
m <- dim(data)[1]
JJ <- matrix(NA, nrow=m, ncol=3)
JJ[,1] <- exp(-par["bb"]*tt)
JJ[,2] <- - tt * par["aa"] * exp(-par["bb"]*tt)
JJ[,3] <- 1
JJ
}
Ja <- trj(st, data=mydata)
print(Ja)
## [,1] [,2] [,3]
## [1,] 3.6788e-01 -3.6788e-01 1
## [2,] 1.3534e-01 -2.7067e-01 1
## [3,] 4.9787e-02 -1.4936e-01 1
## [4,] 1.8316e-02 -7.3263e-02 1
## [5,] 6.7379e-03 -3.3690e-02 1
## [6,] 2.4788e-03 -1.4873e-02 1
## [7,] 9.1188e-04 -6.3832e-03 1
## [8,] 3.3546e-04 -2.6837e-03 1
## [9,] 1.2341e-04 -1.1107e-03 1
## [10,] 4.5400e-05 -4.5400e-04 1
## [11,] 1.6702e-05 -1.8372e-04 1
## [12,] 6.1442e-06 -7.3731e-05 1
## [13,] 2.2603e-06 -2.9384e-05 1
## [14,] 8.3153e-07 -1.1641e-05 1
## [15,] 3.0590e-07 -4.5885e-06 1
## [16,] 1.1254e-07 -1.8006e-06 1
## [17,] 4.1399e-08 -7.0379e-07 1
## [18,] 1.5230e-08 -2.7414e-07 1
## [19,] 5.6028e-09 -1.0645e-07 1
## [20,] 2.0612e-09 -4.1223e-08 1
## [21,] 7.5826e-10 -1.5923e-08 1
## [22,] 2.7895e-10 -6.1368e-09 1
## [23,] 1.0262e-10 -2.3602e-09 1
## [24,] 3.7751e-11 -9.0603e-10 1
## [25,] 1.3888e-11 -3.4720e-10 1
library(numDeriv)
Jn <- jacobian(trf, st, data=mydata)
print(Jn)
## [,1] [,2] [,3]
## [1,] 3.6788e-01 -3.6788e-01 1
## [2,] 1.3534e-01 -2.7067e-01 1
## [3,] 4.9787e-02 -1.4936e-01 1
## [4,] 1.8316e-02 -7.3263e-02 1
## [5,] 6.7379e-03 -3.3690e-02 1
## [6,] 2.4788e-03 -1.4873e-02 1
## [7,] 9.1188e-04 -6.3832e-03 1
## [8,] 3.3546e-04 -2.6837e-03 1
## [9,] 1.2341e-04 -1.1107e-03 1
## [10,] 4.5400e-05 -4.5400e-04 1
## [11,] 1.6702e-05 -1.8372e-04 1
## [12,] 6.1442e-06 -7.3731e-05 1
## [13,] 2.2603e-06 -2.9384e-05 1
## [14,] 8.3156e-07 -1.1641e-05 1
## [15,] 3.0599e-07 -4.5885e-06 1
## [16,] 1.1246e-07 -1.8006e-06 1
## [17,] 4.1441e-08 -7.0390e-07 1
## [18,] 1.5292e-08 -2.7413e-07 1
## [19,] 5.5106e-09 -1.0642e-07 1
## [20,] 2.0606e-09 -4.1195e-08 1
## [21,] 6.7789e-10 -1.5917e-08 1
## [22,] 2.8422e-10 -6.0949e-09 1
## [23,] 5.5252e-11 -2.3285e-09 1
## [24,] -1.5802e-11 -8.2053e-10 1
## [25,] 5.2006e-13 -3.5475e-10 1
print(max(abs(Jn-Ja)))
## [1] 1.0583e-10
ssf <- function(par, data){
rr <- trf(par, data)
ss <- crossprod(rr)
}
print(ssf(st, data=mydata))
## [,1]
## [1,] 4092.5
library(numDeriv)
print(jacobian(trf, st, data=mydata))
## [,1] [,2] [,3]
## [1,] 3.6788e-01 -3.6788e-01 1
## [2,] 1.3534e-01 -2.7067e-01 1
## [3,] 4.9787e-02 -1.4936e-01 1
## [4,] 1.8316e-02 -7.3263e-02 1
## [5,] 6.7379e-03 -3.3690e-02 1
## [6,] 2.4788e-03 -1.4873e-02 1
## [7,] 9.1188e-04 -6.3832e-03 1
## [8,] 3.3546e-04 -2.6837e-03 1
## [9,] 1.2341e-04 -1.1107e-03 1
## [10,] 4.5400e-05 -4.5400e-04 1
## [11,] 1.6702e-05 -1.8372e-04 1
## [12,] 6.1442e-06 -7.3731e-05 1
## [13,] 2.2603e-06 -2.9384e-05 1
## [14,] 8.3156e-07 -1.1641e-05 1
## [15,] 3.0599e-07 -4.5885e-06 1
## [16,] 1.1246e-07 -1.8006e-06 1
## [17,] 4.1441e-08 -7.0390e-07 1
## [18,] 1.5292e-08 -2.7413e-07 1
## [19,] 5.5106e-09 -1.0642e-07 1
## [20,] 2.0606e-09 -4.1195e-08 1
## [21,] 6.7789e-10 -1.5917e-08 1
## [22,] 2.8422e-10 -6.0949e-09 1
## [23,] 5.5252e-11 -2.3285e-09 1
## [24,] -1.5802e-11 -8.2053e-10 1
## [25,] 5.2006e-13 -3.5475e-10 1
The traditional way to implement a Gauss-Newton method was to form the sum-of-squares and cross-products matrix, as in traditional linear regression, using the Jacobian as the data matrix; the right-hand side is the inner product of the Jacobian with the negative residuals. Forming these normal equations squares the condition number of the problem, so a direct QR decomposition of the Jacobian is now the preferred approach. However, let us try both to ensure we get similar answers. First the QR approach.
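As a small added sketch (using the trf(), trj(), st and mydata objects defined above), we can compute a single Gauss-Newton step both ways and compare the two solutions of the linearized subproblem:
r0 <- trf(st, data=mydata) # residuals at the starting parameters
J0 <- trj(st, data=mydata) # analytic Jacobian at the starting parameters
delta.qr <- qr.coef(qr(J0), -r0) # step from the QR decomposition of J0
delta.ne <- solve(crossprod(J0), -crossprod(J0, r0)) # step from the normal equations
print(cbind(QR=delta.qr, NormalEq=as.vector(delta.ne)))
# the two steps should agree to within rounding error; the normal equations
# square the condition number of J0, so they lose accuracy first as the
# Jacobian becomes ill-conditioned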
gnjn <- function(start, resfn, jacfn = NULL, trace = FALSE,
data=NULL, control=list(), ...){
# simplified Gauss Newton
offset = 1e6 # for no change in parms
stepred <- 0.5 # start with this as per nls()
par <- start
cat("starting parameters:")
print(par)
res <- resfn(par, data, ...)
ssbest <- as.numeric(crossprod(res))
cat("initial ss=",ssbest,"\n")
par0 <- par
kres <- 1
kjac <- 0
keepon <- TRUE
while (keepon) {
cat("kjac=",kjac," kres=",kres," SSbest now ", ssbest,"\n")
print(par)
JJ <- jacfn(par, data, ...)
kjac <- kjac + 1
QJ <- qr(JJ)
delta <- qr.coef(QJ, -res)
ss <- ssbest + offset*offset # force evaluation
step <- 1.0
if (as.numeric(max(par0+delta)+offset) != as.numeric(max(par0+offset)) ) {
while (ss > ssbest) {
par <- par0+delta * step
res <- resfn(par, data, ...)
ss <- as.numeric(crossprod(res))
kres <- kres + 1
## cat("step =", step," ss=",ss,"\n")
## tmp <- readline("continue")
if (ss > ssbest) {
step <- step * stepred
} else {
par0 <- par
ssbest <- ss
}
} # end inner loop
if (kjac >= 4) {
keepon <- FALSE
cat("artificial stop at kjac=4 -- we only want to check output\n")
}
} else { # no change in the parameters is possible, so stop
keepon <- FALSE
}
} # end main iteration loop
par0 # return the best parameters found
} # end gnjn
fitgnjn0 <- gnjn(st, trf, trj, data=mydata)
## Another way
#- set lamda = 0 in nlxb, fix laminc, lamdec
library(nlsr)
nlx00 <- try(nlxb(y0 ~ aa * exp(-bb*tt) + cc, start=st, data=mydata, trace=TRUE,
control=list(lamda=0, laminc=0, lamdec=0, watch=TRUE)))
nlx00
The first iteration more or less matches the nls() result. The problem is quite ill-conditioned, and nls() uses a numerical approximation to the Jacobian, so deviations of this magnitude between the iteration sequences are not unexpected.
Now let us try a traditional Gauss-Newton that solves the normal equations.
gnjn2 <- function(start, resfn, jacfn = NULL, trace = FALSE,
data=NULL, control=list(), ...){
# simplified Gauss Newton
offset = 1e6 # for no change in parms
stepred <- 0.5 # start with this as per nls()
par <- start
cat("starting parameters:")
print(par)
res <- resfn(par, data, ...)
ssbest <- as.numeric(crossprod(res))
cat("initial ss=",ssbest,"\n")
kres <- 1
kjac <- 0
par0 <- par
keepon <- TRUE
while (keepon) {
cat("kres=",kres," kjac=",kjac," SSbest now ", ssbest,"\n")
print(par)
JJ <- jacfn(par, data, ...)
kjac <- kjac + 1
JTJ <- crossprod (JJ)
JTr <- crossprod (JJ, res)
delta <- - as.vector(solve(JTJ, JTr))
ss <- ssbest + offset*offset # force evaluation
step <- 1.0
if (as.numeric(max(par0+delta)+offset) != as.numeric(max(par0+offset)) ) {
while (ss > ssbest) {
par <- par0+delta * step
res <- resfn(par, data, ...)
ss <- as.numeric(crossprod(res))
kres <- kres + 1
## cat("step =", step," ss=",ss," best is",ssbest,"\n")
## tmp <- readline("continue")
if (ss > ssbest) {
step <- step * stepred
} else {
par0 <- par
ssbest <- ss
}
} # end inner loop
if (kjac >= 4) {
keepon <- FALSE
cat("artificial stop at kjac=4 -- we only want to check output\n")
}
} else { # no change in the parameters is possible, so stop
keepon <- FALSE
}
} # end main iteration loop
par0 # return the best parameters found
} # end gnjn2
fitgnjn20 <- gnjn2(st, trf, trj, data=mydata)
Solving sets of nonlinear equations is NOT something statisticians or data analysts commonly need to do. In my experience, the occasions where it does arise are when workers try to solve the first-order conditions for optimality of a function rather than optimizing the function directly. If that function is a sum of squares, we have a nonlinear least squares problem, and such problems are generally best approached by methods of the type discussed in this article.
Conversely, since our problem is, in the notation above, equivalent to
r(x) = 0
the solution of a nonlinear least squares problem for which the final sum of squares is zero also provides a solution to the nonlinear equations. In my experience this is a valid approach to the nonlinear equations problem, especially if there is concern that a solution may not exist. Note that there are dedicated methods for nonlinear equations, some of which (??ref) are available in R packages.
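As a small added sketch of that idea (assuming the nlfb() calling pattern of a start vector plus residual and Jacobian functions), we can seek the root (1, 1) of a two-equation system by driving the residual sum of squares towards zero:
library(nlsr)
# equations: x1^2 + x2^2 - 2 = 0 and exp(x1 - 1) + x2^3 - 2 = 0, with root (1, 1)
eqres <- function(prm) c(prm[1]^2 + prm[2]^2 - 2, exp(prm[1] - 1) + prm[2]^3 - 2)
eqjac <- function(prm) matrix(c(2*prm[1], exp(prm[1] - 1),
2*prm[2], 3*prm[2]^2), nrow=2) # column j holds the derivatives w.r.t. prm[j]
solne <- nlfb(start=c(x1=0.5, x2=2), resfn=eqres, jacfn=eqjac)
coef(solne) # should be close to c(1, 1)
If the reported sum of squares is essentially zero we have found a root; otherwise the returned parameters are only a least squares compromise, not a solution of the equations.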
In nlsr, we provide for modelling via R functions that generate the residuals and the Jacobian, and we call the nonlinear least squares minimizer through nlfb. Alternatively, we can do the modelling via the nlxb function, which takes a model expression as an argument and converts it to the functional form; the actual least squares minimization is then carried out by the same routine (nlfb). Generally nlxb is a lot less programming work, but it is also more limited: we can only handle single-line expressions, and some expressions may not be differentiable within the package, even if there are mathematically feasible ways to compute the derivatives analytically.
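As an added illustration of the contrast, here is the Hobbs weed problem used throughout this vignette estimated through both interfaces, starting close to the known solution so that both converge quickly:
library(nlsr)
yw <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972)
wdf <- data.frame(y=yw, tt=seq_along(yw))
stw <- c(b1=200, b2=50, b3=0.3)
# expression (formula) interface: the Jacobian is generated symbolically
fitx <- nlxb(y ~ b1/(1+b2*exp(-b3*tt)), start=stw, data=wdf)
# functional interface: we must code the residuals and the Jacobian ourselves
wres <- function(prm) prm[1]/(1 + prm[2]*exp(-prm[3]*wdf$tt)) - wdf$y
wjac <- function(prm) {
ee <- exp(-prm[3]*wdf$tt)
dd <- 1 + prm[2]*ee
cbind(1/dd, -prm[1]*ee/dd^2, prm[1]*prm[2]*wdf$tt*ee/dd^2) # d/db1, d/db2, d/db3
}
fitf <- nlfb(start=stw, resfn=wres, jacfn=wjac)
coef(fitx)
coef(fitf) # the two sets of estimates should agree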
Almost all statistical functions involve exogenous data that is needed to compute residuals or likelihoods but does not depend on the model parameters. (This section draws on the author's notes file Notes140806.)
model2rjfun does NOT have a ... (dots) argument for passing such extra quantities. Should it have one? Consider, for example, a problem where we fit a set of time series, one for each plant or animal, each with some fixed quantity that is NOT estimated (e.g., pH of the soil, or some index of health).
A difficulty in such a problem is that the residuals are then a matrix, so nlfb rather than nlxb is the better approach. However, fitting a single series would still need this extra information, and the example nlsrtryextraparms.txt shows that the extra parameter (ms in this case) needs to be in the user's global environment, as the following demonstration confirms; an alternative, supplying ms as a column of the data frame, is sketched after the demonstration.
rm(list=ls())
require(nlsr)
# want to have data AND extra parameters (NOT to be estimated)
traceval <- TRUE # traceval set TRUE to debug or give full history
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr.
start1 <- c(b1=1, b2=1, b3=1)
startf1 <- c(b1=1, b2=1, b3=.1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
cat("weeddata contains the original data\n")
## weeddata contains the original data
ms <- 2 # define the external parameter here
cat("wdata scales y by ms =",ms,"\n")
## wdata scales y by ms = 2
wdata <- data.frame(y=ydat/ms, tt=tdat)
wdata
## y tt
## 1 2.6540 1
## 2 3.6200 2
## 3 4.8190 3
## 4 6.4330 4
## 5 8.5345 5
## 6 11.5960 6
## 7 15.7215 7
## 8 19.2790 8
## 9 25.0780 9
## 10 31.4740 10
## 11 37.9975 11
## 12 45.9860 12
cat("estimate the UNSCALED model with SCALED data\n")
## estimate the UNSCALED model with SCALED data
anlxbs <- try(nlxb(eunsc, start=start1, trace=traceval, data=wdata))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 2.6540 3.6200 4.8190 6.4330 8.5345 11.5960 15.7215 19.2790
## [9] 25.0780 31.4740 37.9975 45.9860
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x50a6e30>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 5676.9 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 6088.9 at b1 = 25.443 b2 = -119.86 b3 = -185.95 2 / 1
## lamda: 0.01 SS= 6088.9 at b1 = 25.092 b2 = -94.85 b3 = -166.61 3 / 1
## lamda: 0.1 SS= 6088.9 at b1 = 23.492 b2 = -12.164 b3 = -96.948 4 / 1
## lamda: 1 SS= 6088.9 at b1 = 19.105 b2 = 15.514 b3 = -31.768 5 / 1
## lamda: 10 SS= 6078 at b1 = 9.6639 b2 = 2.3414 b3 = -1.0745 6 / 1
## <<lamda: 4 SS= 5096.5 at b1 = 2.5089 b2 = 0.94659 b3 = 1.1409 7 / 1
## <<lamda: 1.6 SS= 4086.7 at b1 = 5.5395 b2 = 1.1277 b3 = 0.98661 8 / 2
## lamda: 16 SS= 5979.2 at b1 = 10.709 b2 = 2.4361 b3 = -0.35798 9 / 3
## <<lamda: 6.4 SS= 3871.1 at b1 = 6.2757 b2 = 1.195 b3 = 0.95044 10 / 3
## <<lamda: 2.56 SS= 3427.1 at b1 = 7.914 b2 = 1.4621 b3 = 0.76952 11 / 4
## <<lamda: 1.024 SS= 2714 at b1 = 11.243 b2 = 2.2923 b3 = 0.41244 12 / 5
## <<lamda: 0.4096 SS= 1624.3 at b1 = 17.581 b2 = 3.9784 b3 = 0.51333 13 / 6
## <<lamda: 0.16384 SS= 1596 at b1 = 25.778 b2 = 7.8443 b3 = 0.25361 14 / 7
## <<lamda: 0.065536 SS= 389.99 at b1 = 36.58 b2 = 11.353 b3 = 0.40144 15 / 8
## <<lamda: 0.026214 SS= 227.36 at b1 = 47.307 b2 = 19.475 b3 = 0.32531 16 / 9
## <<lamda: 0.010486 SS= 30.768 at b1 = 61.38 b2 = 30.701 b3 = 0.35494 17 / 10
## <<lamda: 0.0041943 SS= 7.5356 at b1 = 71.376 b2 = 39.434 b3 = 0.3436 18 / 11
## <<lamda: 0.0016777 SS= 2.4081 at b1 = 80.801 b2 = 44.729 b3 = 0.33443 19 / 12
## <<lamda: 0.00067109 SS= 1.1886 at b1 = 88.884 b2 = 47.007 b3 = 0.32377 20 / 13
## <<lamda: 0.00026844 SS= 0.72638 at b1 = 94.691 b2 = 48.297 b3 = 0.31699 21 / 14
## <<lamda: 0.00010737 SS= 0.6497 at b1 = 97.378 b2 = 48.922 b3 = 0.31428 22 / 15
## <<lamda: 4.295e-05 SS= 0.64684 at b1 = 98.022 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 0.64682 at b1 = 98.09 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 0.64682 at b1 = 98.093 b2 = 49.092 b3 = 0.31357 25 / 18
print(anlxbs)
## nlsr object: x
## residual sumsquares = 0.64682 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 98.0931 5.653 17.35 3.164e-08 -1.337e-07 505.4
## b2 49.0916 1.688 29.08 3.281e-10 3.017e-08 0.2344
## b3 0.31357 0.006863 45.69 5.768e-12 -1.398e-05 0.04632
escal <- y ~ ms*b1/(1+b2*exp(-b3*tt))
cat("estimate the SCALED model with scaling provided in the call (ms=0.5)\n")
## estimate the SCALED model with scaling provided in the call (ms=0.5)
anlxbh <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1, ms=0.5))
print(anlxbh)
## [1] "Error in nlxb(escal, start = start1, trace = traceval, data = weeddata1, : \n unused argument (ms = 0.5)\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nlxb(escal, start = start1, trace = traceval, data = weeddata1, ms = 0.5): unused argument (ms = 0.5)>
cat("\n scaling is now using the globally defined value of ms=",ms,"\n")
##
## scaling is now using the globally defined value of ms= 2
anlxb1a <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "ms" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "ms" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable ms :[1] 2
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x3a15a58>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 22708 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 25.472 b2 = -122.14 b3 = -187.69 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 25.327 b2 = -113.2 b3 = -180.68 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 24.283 b2 = -57.588 b3 = -135.68 4 / 1
## lamda: 1 SS= 24356 at b1 = 20.254 b2 = 14.461 b3 = -54.127 5 / 1
## lamda: 10 SS= 24355 at b1 = 10.077 b2 = 4.8657 b3 = -4.5699 6 / 1
## <<lamda: 4 SS= 20252 at b1 = 2.5977 b2 = 0.83113 b3 = 1.4117 7 / 1
## <<lamda: 1.6 SS= 16143 at b1 = 5.7092 b2 = 1.3529 b3 = 0.86529 8 / 2
## lamda: 16 SS= 22536 at b1 = 11.269 b2 = 3.0589 b3 = -0.12239 9 / 3
## <<lamda: 6.4 SS= 15214 at b1 = 6.5093 b2 = 1.428 b3 = 0.86138 10 / 3
## <<lamda: 2.56 SS= 13336 at b1 = 8.2714 b2 = 1.7659 b3 = 0.74306 11 / 4
## <<lamda: 1.024 SS= 10290 at b1 = 11.801 b2 = 2.7987 b3 = 0.46561 12 / 5
## <<lamda: 0.4096 SS= 5999.3 at b1 = 18.477 b2 = 4.9891 b3 = 0.46298 13 / 6
## <<lamda: 0.16384 SS= 3360.8 at b1 = 27.609 b2 = 9.7076 b3 = 0.35379 14 / 7
## <<lamda: 0.065536 SS= 872.75 at b1 = 40.302 b2 = 16.934 b3 = 0.38655 15 / 8
## <<lamda: 0.026214 SS= 308.99 at b1 = 51.624 b2 = 26.628 b3 = 0.36281 16 / 9
## <<lamda: 0.010486 SS= 64.882 at b1 = 63.651 b2 = 36.557 b3 = 0.35778 17 / 10
## <<lamda: 0.0041943 SS= 19.992 at b1 = 73.76 b2 = 43.123 b3 = 0.3463 18 / 11
## <<lamda: 0.0016777 SS= 8.8066 at b1 = 83.053 b2 = 46.013 b3 = 0.33222 19 / 12
## <<lamda: 0.00067109 SS= 4.1734 at b1 = 91.086 b2 = 47.556 b3 = 0.32103 20 / 13
## <<lamda: 0.00026844 SS= 2.7218 at b1 = 96.072 b2 = 48.622 b3 = 0.31554 21 / 14
## <<lamda: 0.00010737 SS= 2.5891 at b1 = 97.8 b2 = 49.023 b3 = 0.31386 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 98.074 b2 = 49.087 b3 = 0.31359 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 98.093 b2 = 49.091 b3 = 0.31357 24 / 17
print(anlxb1a)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 17 Jacobian and 24 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 98.0925 5.651 17.36 3.153e-08 -9.046e-06 1011
## b2 49.0915 1.688 29.09 3.27e-10 6.839e-06 0.4687
## b3 0.31357 0.006863 45.69 5.769e-12 -0.003104 0.09268
ms <- 1
cat("\n scaling is now using the globally re-defined value of ms=",ms,"\n")
##
## scaling is now using the globally re-defined value of ms= 1
anlxb1b <- try(nlxb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "ms" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "ms" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable ms :[1] 1
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x5b326d8>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
print(anlxb1b)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.164e-08 -2.663e-07 1011
## b2 49.0916 1.688 29.08 3.281e-10 1.59e-07 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 -5.531e-05 0.04715
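Rather than relying on a copy of ms left in the user's workspace, the extra quantity can be supplied as a constant column of the data frame given to nlxb. A brief added sketch, reusing the objects defined above (it should reproduce the ms = 2 fit shown earlier):
wdata2 <- data.frame(y=ydat, tt=tdat, ms=2) # ms is recycled to a constant column
anlxb1c <- try(nlxb(escal, start=start1, trace=FALSE, data=wdata2))
print(anlxb1c)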
We use the Hobbs Weeds problem (Nash, 1979 and Nash, 2014). Note that nls() fails from start1.
require(nlsr)
traceval <- FALSE
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
weeddata2 <- data.frame(y=1.5*ydat, tt=tdat)
anlxb1 <- try(nlxb(eunsc, start=start1, trace=TRUE, data=weeddata1,
control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x4f12360>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
print(anlxb1)
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.164e-08 -2.663e-07 1011
## b2 49.0916 1.688 29.08 3.281e-10 1.59e-07 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 -5.531e-05 0.04715
anlsb1 <- try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1))
## 23521 : 1 1 1
print(anlsb1)
## [1] "Error in nls(eunsc, start = start1, trace = TRUE, data = weeddata1) : \n singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = start1, trace = TRUE, data = weeddata1): singular gradient>
A different start causes nlxb to stop at a much larger sum of squares, a point from which it cannot make further progress. Note that nls() again fails.
startf1 <- c(b1=1, b2=1, b3=.1)
anlsf1 <- try(nls(eunsc, start=startf1, trace=TRUE, data=weeddata1))
## 23756 : 1.0 1.0 0.1
print(anlsf1)
## [1] "Error in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1) : \n singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1): singular gradient>
library(nlsr)
anlxf1 <- try(nlxb(eunsc, start=startf1, trace=TRUE, data=weeddata1,
control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## Finished masks check
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x5a05510>
## no weights
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## Start:lamda: 1e-04 SS= 23756 at b1 = 1 b2 = 1 b3 = 0.1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 416.72 b2 = 821.43 b3 = -40.851 2 / 1
## lamda: 0.01 SS= 196562 at b1 = 162.93 b2 = 368.16 b3 = 7.5932 3 / 1
## <<lamda: 0.004 SS= 10597 at b1 = 24.764 b2 = 108.11 b3 = 31.926 4 / 1
## <<lamda: 0.0016 SS= 9205.5 at b1 = 35.486 b2 = 108.11 b3 = 31.926 5 / 2
## <<lamda: 0.00064 SS= 9205.4 at b1 = 35.532 b2 = 108.11 b3 = 31.926 6 / 3
## <<lamda: 0.000256 SS= 9205.4 at b1 = 35.532 b2 = 108.11 b3 = 31.926 7 / 4
print(anlxf1)
## nlsr object: x
## residual sumsquares = 9205.4 on 12 observations
## after 4 Jacobian and 7 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 35.5321 NA NA NA -6.684e-07 3.464
## b2 108.109 NA NA NA -1.464e-11 5.015e-11
## b3 31.9262 NA NA NA 1.583e-09 6.427e-27
# anlxb2 <- try(nlxb(eunsc, start=start1, trace=FALSE, data=weeddata2))
# print(anlxb2)
We can quickly discover the difficulty here by computing the Jacobian at this “solution” and checking its singular values.
cf1 <- coef(anlxf1)
print(cf1)
## b1 b2 b3
## 35.532 108.109 31.926
## attr(,"pkgname")
## [1] "nlsr"
jf1 <- anlxf1$jacobian
svals <- svd(jf1)$d
print(svals)
## [1] 3.4641e+00 5.0146e-11 6.4267e-27
Here we see that the Jacobian has numerical rank 1, even though there are 3 coefficients: the second and third singular values are negligible. (These are the same values reported in the JSingval column of the nlsr summary above.) It is therefore not surprising that our nonlinear least squares program has concluded that it cannot make further progress.
We can run the same example as above using R functions rather than expressions, but now we need code to compute the Jacobian as well as the residuals. nlsr has tools to create such functions from expressions, as we shall see here. First we again set up the data and load the package.
require(nlsr)
traceval <- FALSE
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nlxb, nls, wrapnlsr.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
weedrj <- model2rjfun(modelformula=eunsc, pvec=start1, data=weeddata1)
weedrj
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x5165418>
rjfundoc(weedrj) # Note how useful this is to report status
## FUNCTION weedrj
## Formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## Code: expression({
## .expr3 <- exp(-b3 * tt)
## .expr5 <- 1 + b2 * .expr3
## .expr10 <- .expr5^2
## .value <- b1/.expr5 - y
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1",
## "b2", "b3")))
## .grad[, "b1"] <- 1/.expr5
## .grad[, "b2"] <- -(b1 * .expr3/.expr10)
## .grad[, "b3"] <- b1 * (b2 * (.expr3 * tt))/.expr10
## attr(.value, "gradient") <- .grad
## .value
## })
## Parameters: b1, b2, b3
## Data: y, tt
##
## VALUES
## Observations: 12
## Parameters:
## b1 b2 b3
## 1 1 1
## Data (length 12):
## y tt
## 1 5.308 1
## 2 7.240 2
## 3 9.638 3
## 4 12.866 4
## 5 17.069 5
## 6 23.192 6
## 7 31.443 7
## 8 38.558 8
## 9 50.156 9
## 10 62.948 10
## 11 75.995 11
## 12 91.972 12
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.995, 91.972)
tt <- seq_along(y) # for testing
mydata <- data.frame(y = y, tt = tt)
f <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt))
p <- c(b1 = 1, b2 = 1, b3 = 1)
rjfn <- model2rjfun(f, p, data = mydata)
rjfn(p)
## [1] 4.64386 3.64458 2.25518 0.11562 -2.91533 -7.77921 -14.68094
## [8] -20.35397 -30.41538 -41.57497 -52.89343 -67.04642
## attr(,"gradient")
## b1 b2 b3
## [1,] 9.9519 -8.9615 0.89615
## [2,] 10.8846 -9.6998 1.93997
## [3,] 11.8932 -10.4787 3.14361
## [4,] 12.9816 -11.2964 4.51856
## [5,] 14.1537 -12.1504 6.07520
## [6,] 15.4128 -13.0373 7.82235
## [7,] 16.7621 -13.9524 9.76668
## [8,] 18.2040 -14.8902 11.91213
## [9,] 19.7406 -15.8437 14.25933
## [10,] 21.3730 -16.8050 16.80496
## [11,] 23.1016 -17.7647 19.54122
## [12,] 24.9256 -18.7127 22.45528
myfn <- function(p, resfn=rjfn){
val <- resss(p, resfn)
}
p <- c(b1 = 2, b2=2, b3=1)
a1 <- optim(p, myfn, control=list(trace=0))
a1
## $par
## b1 b2 b3
## 1.9618 4.9092 3.1358
##
## $value
## [1] 2.5873
##
## $counts
## function gradient
## 200 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
We can also embed the function directly.
a2 <- optim(p, function(p,resfn=rjfn){resss(p,resfn)}, control=list(trace=0))
a2
## $par
## b1 b2 b3
## 1.9618 4.9092 3.1358
##
## $value
## [1] 2.5873
##
## $counts
## function gradient
## 200 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
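Because the value returned by rjfn() carries the Jacobian as its "gradient" attribute, we can also give optim() an analytic gradient of the sum of squares, which by the chain rule is 2 * t(J) %*% r. A short added sketch, using rjfn and p as defined above:
ssgr <- function(p, resfn=rjfn){
rv <- resfn(p) # residuals, with the Jacobian in attr(rv, "gradient")
JJ <- attr(rv, "gradient")
2 * as.vector(crossprod(JJ, rv)) # gradient of the residual sum of squares
}
a3 <- optim(p, myfn, gr=ssgr, method="BFGS", control=list(trace=0))
a3$par # should essentially reproduce the estimates from a1 and a2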
One of the common needs for optimization computations is the availability of accurate gradients of the objective functions. While differentiation is relatively straightforward, it is tedious and error-prone.
As of 2015-01-24, I had not determined how to automate using the output of the derivatives generated by nlsr to create a working gradient function. However, the following write-up shows how such a function can be generated in a semi-automatic way.
We use an example that appeared on the R-help mailing list on Jan 14, 2015. Responses by Ravi Varadhan and others, along with some modifications I made, gave the following negative log-likelihood function to be minimized.
require(nlsr)
y=c(5,11,21,31,46,75,98,122,145,165,195,224,245,293,321,330,350,420) # data set
Nweibull2 <- function(x,prm){
la <- prm[1]
al <- prm[2]
be<- prm[3]
val2 <- la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )
val2
}
LL2J <- function(par,y) {
R = Nweibull2(y,par)
-sum(log(R))
}
We want the gradient of LL2J() with respect to par, and first compute the derivatives of Nweibull2() with respect to the parameters prm.
We start with the central expression in Nweibull2() and compute its partial derivatives. The expression is:
la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )
# Put in the main expression for the Nweibull pdf.
## we generate the three gradient components
g1n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "la")
g1n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## (al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be +
## la * al * (1 - exp((x/al)^be)))
g2n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "al")
g2n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## (be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) *
## (be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) +
## la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) *
## exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
g3n <- nlsDeriv(~ la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "be")
g3n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## ((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be *
## log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) +
## la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
By copying and pasting the output above into a function structure, we get Nwei2g() below.
Nwei2g <- function(x, prm){
la <- prm[1]
al <- prm[2]
be<- prm[3]
g1v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
(al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be +
la * al * (1 - exp((x/al)^be)))
g2v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
(be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) *
(be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) +
la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) *
exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
g3v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be *
log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) +
la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
gg <- matrix(data=c(g1v, g2v, g3v), ncol=3)
}
We can check this gradient function using the grad() function from the numDeriv package.
start1 <- c(lambda=.01,alpha=340,beta=.8)
start2 <- c(lambda=.01,alpha=340,beta=.7)
require(numDeriv)
ganwei <- Nwei2g(y, prm=start1)
require(numDeriv)
Nw <- function(x, y) {
Nweibull2(y, x)
} # to allow grad() to work
gnnwei <- matrix(NA, nrow=length(y), ncol=3)
for (i in 1:length(y)){
gnrow <- grad(Nw, x=start1, y=y[i])
gnnwei[i,] <- gnrow
}
gnnwei
## [,1] [,2] [,3]
## [1,] 1.5080091 7.5763e-06 -4.4573e-02
## [2,] 1.0470119 4.3477e-06 -2.1663e-02
## [3,] 0.6473921 1.6572e-06 -7.3707e-03
## [4,] 0.4021585 1.7784e-07 -9.4940e-04
## [5,] 0.1635446 -1.0061e-06 3.5893e-03
## [6,] -0.0815624 -1.6713e-06 6.0571e-03
## [7,] -0.1689654 -1.5386e-06 5.8709e-03
## [8,] -0.2048835 -1.1819e-06 5.0149e-03
## [9,] -0.2077443 -7.9701e-07 4.0222e-03
## [10,] -0.1955391 -4.9171e-07 3.1859e-03
## [11,] -0.1644138 -1.3523e-07 2.1110e-03
## [12,] -0.1297028 8.2595e-08 1.3249e-03
## [13,] -0.1055059 1.7196e-07 9.0488e-04
## [14,] -0.0598567 2.2613e-07 3.1939e-04
## [15,] -0.0406518 2.0242e-07 1.4872e-04
## [16,] -0.0355949 1.9079e-07 1.1187e-04
## [17,] -0.0261120 1.6182e-07 5.3031e-05
## [18,] -0.0075317 6.8245e-08 -1.1995e-05
ganwei
## [,1] [,2] [,3]
## [1,] 1.5080091 7.5763e-06 -4.4573e-02
## [2,] 1.0470119 4.3477e-06 -2.1663e-02
## [3,] 0.6473921 1.6572e-06 -7.3707e-03
## [4,] 0.4021585 1.7784e-07 -9.4940e-04
## [5,] 0.1635446 -1.0061e-06 3.5893e-03
## [6,] -0.0815624 -1.6713e-06 6.0571e-03
## [7,] -0.1689654 -1.5386e-06 5.8709e-03
## [8,] -0.2048835 -1.1819e-06 5.0149e-03
## [9,] -0.2077443 -7.9701e-07 4.0222e-03
## [10,] -0.1955391 -4.9171e-07 3.1859e-03
## [11,] -0.1644138 -1.3523e-07 2.1110e-03
## [12,] -0.1297028 8.2595e-08 1.3249e-03
## [13,] -0.1055059 1.7196e-07 9.0488e-04
## [14,] -0.0598567 2.2613e-07 3.1939e-04
## [15,] -0.0406518 2.0242e-07 1.4872e-04
## [16,] -0.0355949 1.9079e-07 1.1187e-04
## [17,] -0.0261120 1.6182e-07 5.3031e-05
## [18,] -0.0075317 6.8245e-08 -1.1995e-05
cat("max(abs(gnnwei - ganwei))= ", max(abs(gnnwei - ganwei)),"\n")
## max(abs(gnnwei - ganwei))= 2.8041e-11
Now we can build the gradient of the objective function. This requires an application of the chain rule to the sum of the logs of the elements of the quantity R. Since the derivative of log(R) with respect to R is simply 1/R, this is relatively simple. However, I have not yet found a way to automate this step.
## and now we can build the gradient of LL2J
LL2Jg <- function(prm, y) {
R = Nweibull2(y,prm)
gNN <- Nwei2g(y, prm)
# print(str(gNN)
gL <- - as.vector( t(1/R) %*% gNN)
}
# test
gaLL2J <- LL2Jg(start1, y)
gaLL2J
## [1] 3365.78244 -0.01441 -18.63351
gnLL2J <- grad(LL2J, start1, y=y)
gnLL2J
## [1] 3365.78244 -0.01441 -18.63351
cat("max(abs(gaLL2J-gnLL2J))= ", max(abs(gaLL2J-gnLL2J)), "\n" )
## max(abs(gaLL2J-gnLL2J))= 1.8254e-08
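With an analytic gradient that agrees with the numerical one, we could pass it to a gradient-based optimizer. The following is only an added sketch: the parameters differ by several orders of magnitude, so parameter scaling is supplied, and convergence for this likelihood is not guaranteed without further tuning.
fitW <- optim(par=start1, fn=LL2J, gr=LL2Jg, y=y, method="BFGS",
control=list(maxit=500, parscale=abs(start1)))
fitW$par
fitW$convergence # 0 would indicate that optim reported convergence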
These examples show that supplying data through dot-arguments does NOT work for any of nlsr, nls, or minpack.lm. Supplying a data frame, or objects in the local (calling) environment, works in all three.
## @knitr nlsdata.R
# try different ways of supplying data to R nls stuff
ydata <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.995, 91.972)
ttdata <- seq_along(ydata) # for testing
mydata <- data.frame(y = ydata, tt = ttdata)
hobsc <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt))
ste <- c(b1 = 2, b2 = 5, b3 = 3)
# let's try finding the variables
findmainenv <- function(formula, prm) {
vn <- all.vars(formula)
pnames <- names(prm)
ppos <- match(pnames, vn)
datvar <- vn[-ppos]
cat("Data variables:")
print(datvar)
cat("Are the variables present in the current working environment?\n")
for (i in seq_along(datvar)){
cat(datvar[[i]]," : present=",exists(datvar[[i]]),"\n")
}
}
findmainenv(hobsc, ste)
## Data variables:[1] "y" "tt"
## Are the variables present in the current working environment?
## y : present= TRUE
## tt : present= TRUE
y <- ydata
tt <- ttdata
findmainenv(hobsc, ste)
## Data variables:[1] "y" "tt"
## Are the variables present in the current working environment?
## y : present= TRUE
## tt : present= TRUE
rm(y)
rm(tt)
# ===============================
# let's try finding the variables in dotargs
finddotargs <- function(formula, prm, ...) {
dots <- list(...)
cat("dots:")
print(dots)
cat("names in dots:")
dtn <- names(dots)
print(dtn)
vn <- all.vars(formula)
pnames <- names(prm)
ppos <- match(pnames, vn)
datvar <- vn[-ppos]
cat("Data variables:")
print(datvar)
cat("Are the variables present in the dot args?\n")
for (i in seq_along(datvar)){
dname <- datvar[[i]]
cat(dname," : present=",(dname %in% dtn),"\n")
}
}
finddotargs(hobsc, ste, y=ydata, tt=ttdata)
## dots:$y
## [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
##
## $tt
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
##
## names in dots:[1] "y" "tt"
## Data variables:[1] "y" "tt"
## Are the variables present in the dot args?
## y : present= TRUE
## tt : present= TRUE
# ===============================
y <- ydata
tt <- ttdata
tryq <- try(nlsquiet <- nls(formula=hobsc, start=ste))
if (class(tryq) != "try-error") {print(nlsquiet)} else {cat("try-error\n")}
## Nonlinear regression model
## model: y ~ 100 * b1/(1 + 10 * b2 * exp(-0.1 * b3 * tt))
## data: parent.frame()
## b1 b2 b3
## 1.96 4.91 3.14
## residual sum-of-squares: 2.59
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 2.17e-07
#- OK
rm(y)
rm(tt)
tdots<-try(nlsdots <- nls(formula=hobsc, start=ste, y=ydata, tt=ttdata))
if ( class(tdots) != "try-error") {print(nlsdots)} else {cat("try-error\n")}
## try-error
#- Fails
tframe <- try(nlsframe <- nls(formula=hobsc, start=ste, data=mydata))
if (class(tframe) != "try-error") {print(nlsframe)} else {cat("try-error\n")}
## Nonlinear regression model
## model: y ~ 100 * b1/(1 + 10 * b2 * exp(-0.1 * b3 * tt))
## data: mydata
## b1 b2 b3
## 1.96 4.91 3.14
## residual sum-of-squares: 2.59
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 2.17e-07
#- OK
library(nlsr)
y <- ydata
tt <- ttdata
tquiet <- try(nlsrquiet <- nlxb(formula=hobsc, start=ste))
## vn:[1] "y" "b1" "b2" "b3" "tt"
## no weights
if ( class(tquiet) != "try-error") {print(nlsrquiet)}
## nlsr object: x
## residual sumsquares = 2.5873 on 12 observations
## after 5 Jacobian and 6 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 1.96186 0.1131 17.35 3.167e-08 -2.603e-09 130.1
## b2 4.90916 0.1688 29.08 3.284e-10 -2.425e-10 6.165
## b3 3.1357 0.06863 45.69 5.768e-12 1.943e-09 2.735
#- OK
rm(y)
rm(tt)
test <- try(nlsrdots <- nlxb(formula=hobsc, start=ste, y=ydata, tt=ttdata))
if (class(test) != "try-error") { print(nlsrdots) } else {cat("Try error\n") }
## Try error
#- Note -- does NOT work -- do we need to specify the present env. in nlfb for y, tt??
test2 <- try(nlsframe <- nls(formula=hobsc, start=ste, data=mydata))
if (class(test2) != "try-error") {print(nlsframe)} else {cat("Try error\n")}
#- OK -- this repeats the data-frame call that succeeded above and prints the same fit
library(minpack.lm)
y <- ydata
tt <- ttdata
nlsLMquiet <- nlsLM(formula=hobsc, start=ste)
print(nlsLMquiet)
## Nonlinear regression model
## model: y ~ 100 * b1/(1 + 10 * b2 * exp(-0.1 * b3 * tt))
## data: parent.frame()
## b1 b2 b3
## 1.96 4.91 3.14
## residual sum-of-squares: 2.59
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 1.49e-08
#- OK
rm(y)
rm(tt)
## Dotargs
tdots <- try(nlsLMdots <- nlsLM(formula=hobsc, start=ste, y=ydata, tt=ttdata))
if (class(tdots) != "try-error") { print(nlsLMdots) } else {cat("try-error\n") }
## try-error
#- Note -- does NOT work
## dataframe
tframe <- try(nlsLMframe <- nlsLM(formula=hobsc, start=ste, data=mydata))
if (class(tframe) != "try-error") {print(nlsLMframe)} else {cat("try-error\n")}
#- OK -- consistent with the summary above, nlsLM also accepts the data frame
## detach("package:nlsr", unload=TRUE)
## Uses nlmrt here for comparison
## library(nlmrt)
## txq <- try( nlxbquiet <- nlxb(formula=hobsc, start=ste))
## if (class(txq) != "try-error") {print(nlxbquiet)} else { cat("try-error\n")}
#- Note -- does NOT work
## txdots <- try( nlxbdots <- nlxb(formula=hobsc, start=ste, y=y, tt=tt) )
## if (class(txdots) != "try-error") {print(nlxbdots)} else {cat("try-error\n")}
#- Note -- does NOT work
## dataframe
## nlxbframe <- nlxb(formula=hobsc, start=ste, data=mydata)
## print(nlxbframe)
#- OK
Bates, Douglas M., and Donald G. Watts. 1981. “A Relative Off Set Orthogonality Convergence Criterion for Nonlinear Least Squares.” Technometrics 23 (2): 179–83.
Hartley, H. O. 1961. “The Modified Gauss-Newton Method for Fitting of Nonlinear Regression Functions by Least Squares.” Technometrics 3: 269–80.
Marquardt, Donald W. 1963. “An Algorithm for Least-Squares Estimation of Nonlinear Parameters.” SIAM Journal on Applied Mathematics 11 (2). SIAM: 431–41.
Nash, John C. 1977. “Minimizing a Nonlinear Sum of Squares Function on a Small Computer.” Journal of the Institute for Mathematics and Its Applications 19: 231–37.