This tutorial illustrates applications of optimal univariate clustering function Ckmeans.1d.dp. It clusters univariate data given the number of clusters \(k\). It can estimate \(k\) if not provided. It can also perform optimal weighted clustering when a weight vector is provided with the input univariate data. Weighted clustering can be used to analyze 1-D signals such as time series data. The corresponding clusters obtained from weighted clustering can be the basis for optimal time course segmentation or optimal peak calling.

1. Cluster univariate data given the number of clusters \(k\)

Example 1.

Cluster data generated from a Gaussian mixture model of three components.

The number of clusters is provided.

require(Ckmeans.1d.dp)
## Loading required package: Ckmeans.1d.dp
x <- c(rnorm(50, sd=0.3), rnorm(50, mean=1, sd=0.3), rnorm(50, mean=2, sd=0.3))
# Divide x into 3 clusters
k <- 3
result <- Ckmeans.1d.dp(x, k)
plot(result)

plot of chunk unnamed-chunk-1

plot(x, col=result$cluster, pch=result$cluster, cex=1.5,
     main="Optimal univariate clustering given k",
     sub=paste("Number of clusters given:", k))
abline(h=result$centers, col=1:k, lty="dashed", lwd=2)
legend("bottomright", paste("Cluster", 1:k), col=1:k, pch=1:k, cex=1.5, bty="n")

plot of chunk unnamed-chunk-1

2. Cluster univariate data with an estimated number of clusters \(k\)

Example 2.

Cluster data generated from a Gaussian mixture model of three components. The number of clusters is determined by Bayesian information criterion:

require(Ckmeans.1d.dp)
x <- c(rnorm(50, mean=-1, sd=0.3), rnorm(50, mean=1, sd=1), rnorm(50, mean=2, sd=0.4))
# Divide x into k clusters, k automatically selected (default: 1~9)
result <- Ckmeans.1d.dp(x)
plot(result)

plot of chunk unnamed-chunk-2

k <- max(result$cluster)
plot(x, col=result$cluster, pch=result$cluster, cex=1.5,
     main="Optimal univariate clustering with k estimated",
     sub=paste("Number of clusters is estimated to be", k))
abline(h=result$centers, col=1:k, lty="dashed", lwd=2)
legend("topleft", paste("Cluster", 1:k), col=1:k, pch=1:k, cex=1.5, bty="n")

plot of chunk unnamed-chunk-2

3. Optimal weighted univariate clustering for time series analysis

Example 3.

We segment a time course to identify peaks using weighted clustering. The input data is the time stamp of obtaining each intensity measurement; the weight is the signal intensity.

require(Ckmeans.1d.dp)
n <- 160
t <- seq(0, 2*pi*2, length=n)
n1 <- 1:(n/2)
n2 <- (max(n1)+1):n
y1 <- abs(sin(1.5*t[n1]) + 0.1*rnorm(length(n1)))
y2 <- abs(sin(0.5*t[n2]) + 0.1*rnorm(length(n2)))
y <- c(y1, y2)

w <- y^8 # stress the peaks 
res <- Ckmeans.1d.dp(t, k=c(1:10), w)
plot(res)

plot of chunk unnamed-chunk-3

plot(t, w, main = "Time course segmentation / peak calling", 
     col=res$cluster, pch=res$cluster, type="h", 
     xlab="Time t", ylab="Transformed intensity w")
abline(v=res$centers, col="chocolate", lty="dashed")
text(res$centers, max(w) * .95, cex=0.75, font=2,
     paste(round(res$size / sum(res$size) * 100), "/ 100"))

plot of chunk unnamed-chunk-3