The framework for estimating the probability of informed trading (\(\operatorname{PIN}\)) was first established by Easley et al. (1996) (EKOP) and extended by Easley, Hvidkjaer, and O’Hara (2002) (EHO). Both models assume constant arrival rates for buys and sells as well as constant probabilities for the condition of a trading day. In the EKOP and EHO setting, a trading day can be in one of three states: no-news, good-news or bad-news. The probability of informed trading is estimated from daily aggregates of buy and sell orders, where buys and sells are assumed to follow independent latent Poisson point processes. The static models distinguish between uninformed and informed trading intensities. In the EKOP setting, uninformed buyers and sellers participate in the market with identical intensities. The EHO setup relaxes this assumption and allows the expected numbers of uninformed buys and sells per day to differ. Consequently, the EKOP model is nested in the EHO model.

\(\operatorname{PIN}\) is a widely used measure in empirical applications. Henry (2006) investigates the relationship between short selling and information-based trading. The connection between investor protection, adverse selection and \(\operatorname{PIN}\) is analysed by Brockman and Chung (2008). Zhou and Lai (2009) study how the probability of informed trading influences herding. Aslan et al. (2011) employ \(\operatorname{PIN}\) to investigate the linkage of microstructure, accounting, and asset pricing, with the aim of identifying firms that have high information risk. The seasonality of \(\operatorname{PIN}\) estimates is examined by Kang (2010). Additionally, various papers link the probability of informed trading to illiquidity measures, e.g. Duarte and Young (2009) and Li et al. (2009), and to bid-ask spreads, e.g. Lei and Wu (2005) and Chung and Li (2003).

Due to the widespread usage of the \(\operatorname{PIN}\) measure in the literature, many researchers have analysed its technical properties in detail. Several recent papers propose improvements in the estimation of the model parameters and of the probability of informed trading. The original factorizations of the (log) likelihood functions in static \(\operatorname{PIN}\) models are very inefficient in terms of stability and execution time. As a consequence, with these factorizations the probability of informed trading can only be estimated for older trading data or very infrequently traded stocks, where daily trade counts are small. Easley, Hvidkjaer, and O’Hara (2010) present a more robust formulation of the likelihood function which reduces the occurrence of over- and underflow errors for moderately traded equities. The most recent likelihood factorization for the \(\operatorname{PIN}\) framework with static arrival rates, by Lin and Ke (2011), can handle daily buys and sells data even of very heavily traded stocks and increases the speed and accuracy of function evaluations. In addition, Lin and Ke (2011) show by simulation that the factorization by Easley, Hvidkjaer, and O’Hara (2010) is biased when used with high numbers of daily buys and sells. Hence, publications relying on this formulation of the model’s likelihood function may report biased estimates of the probability of informed trading.
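
The over- and underflow problem described above can be illustrated with a short Python sketch. The daily likelihood used here is the standard three-component Poisson mixture of the EHO model. The parameter names (`alpha` for the probability of an information event, `delta` for the probability of bad news given an event, `eps_b` and `eps_s` for the uninformed intensities, `mu` for the informed intensity) and both functions are hypothetical and not part of pinbasic. The stable variant is a generic log-sum-exp evaluation, not the specific factorization of Lin and Ke (2011).

```python
import math


def poisson_pmf_naive(k, lam):
    # Direct evaluation: lam ** k overflows and exp(-lam) underflows
    # once k and lam reach the size typical of heavily traded stocks.
    return math.exp(-lam) * lam ** k / math.factorial(k)


def day_lik_naive(b, s, alpha, delta, eps_b, eps_s, mu):
    # Mixture over the three states of a trading day.
    no_news = (1 - alpha) * poisson_pmf_naive(b, eps_b) * poisson_pmf_naive(s, eps_s)
    good = alpha * (1 - delta) * poisson_pmf_naive(b, eps_b + mu) * poisson_pmf_naive(s, eps_s)
    bad = alpha * delta * poisson_pmf_naive(b, eps_b) * poisson_pmf_naive(s, eps_s + mu)
    return no_news + good + bad


def poisson_logpmf(k, lam):
    # Log of the Poisson pmf; assumes lam > 0.
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def day_loglik_stable(b, s, alpha, delta, eps_b, eps_s, mu):
    # Work on the log scale and factor out the largest term (log-sum-exp).
    weighted = [
        (1 - alpha, poisson_logpmf(b, eps_b) + poisson_logpmf(s, eps_s)),
        (alpha * (1 - delta), poisson_logpmf(b, eps_b + mu) + poisson_logpmf(s, eps_s)),
        (alpha * delta, poisson_logpmf(b, eps_b) + poisson_logpmf(s, eps_s + mu)),
    ]
    # Components with zero weight are dropped to avoid log(0).
    terms = [math.log(w) + lp for w, lp in weighted if w > 0]
    largest = max(terms)
    return largest + math.log(sum(math.exp(t - largest) for t in terms))
```

For small counts both functions agree after taking logs. For a day with several thousand buys and sells the naive version fails with an overflow, while the log-scale version still returns a finite value.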

Yan and Zhang (2012), Gan, Chun, and Johnstone (2015) and Ersan and Alıcı (2016) study the generation of appropriate initial values for the optimization routine in static \(\operatorname{PIN}\) models. Yan and Zhang (2012) establish a brute-force grid search technique which delivers several sets of starting values. Despite its simplicity, this method is very time-consuming because the likelihood has to be maximized once for each set. The methodologies proposed by Gan, Chun, and Johnstone (2015) and Ersan and Alıcı (2016) instead use hierarchical agglomerative clustering (HAC) to determine initial choices for the model parameters.
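
To give an idea of what a grid of starting values can look like, the following Python sketch builds candidate parameter sets from the sample means of daily buys and sells. It uses the moment conditions \(E[B] = \epsilon_b + \alpha(1-\delta)\mu\) and \(E[S] = \epsilon_s + \alpha\delta\mu\) implied by the EHO model. The grid points, the fraction `gamma` of mean buys attributed to uninformed traders, and the function name are assumptions made for illustration. The construction is in the spirit of Yan and Zhang (2012) and should be checked against the paper before use; it is not the implementation behind initial_vals.

```python
from itertools import product


def grid_initial_values(mean_buys, mean_sells, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return candidate starting values as a list of dicts (hypothetical helper)."""
    candidates = []
    for alpha, delta, gamma in product(grid, repeat=3):
        # Attribute a fraction gamma of the average buys to uninformed traders.
        eps_b = gamma * mean_buys
        # Solve the buy-side moment condition for the informed intensity.
        mu = (mean_buys - eps_b) / (alpha * (1 - delta))
        # Solve the sell-side moment condition for the uninformed sell intensity.
        eps_s = mean_sells - alpha * delta * mu
        # Discard economically meaningless sets with a negative intensity.
        if eps_s >= 0:
            candidates.append(
                {"alpha": alpha, "delta": delta, "eps_b": eps_b, "eps_s": eps_s, "mu": mu}
            )
    return candidates
```

With five grid points per dimension this yields at most 125 candidate sets, each of which would serve as the starting point of one optimization run.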

The pinbasic package ships utilities for fast and stable estimation of the probability of informed trading in the static \(\operatorname{PIN}\) framework. The function design follows the extended EHO model setup but can also be applied to the simpler EKOP model by equating the intensities of uninformed buys and sells. The package implements a state-of-the-art factorization of the model likelihood function as well as the most recent algorithms for generating initial values for optimization routines. Likelihood functions are evaluated with pin_ll and sets of starting values are returned by initial_vals. The probability of informed trading can be estimated for daily buys and sells data of arbitrary length with pin_est, which is a wrapper around the workhorse function pin_est_core. No information about the time span of the underlying data is required to perform optimizations with pin_est. However, the literature recommends using data for at least 60 trading days to ensure convergence of the likelihood maximization (see, e.g., Easley et al. 1996, 1416). Quarterly estimates are returned by qpin and can be visualized with ggplot. Datasets of daily aggregated numbers of buys and sells can be simulated with simulateBS. Confidence intervals for the probability of informed trading can be computed by enabling the confint argument in the optimization routines (pin_est_core, pin_est and qpin) or by calling pin_confint directly. Additionally, posterior probabilities for the conditions of trading days can be computed with posterior and plotted with ggplot.

The remainder of this work is structured as follows:
The second section examines the general framework of models for the probability of informed trading in more detail. Properties of the extended \(\operatorname{PIN}\) model by Easley, Hvidkjaer, and O’Hara (2002) are discussed in the third section. Stable factorizations of the likelihood function and algorithms for generating reliable sets of initial values are presented in the fourth and fifth sections. Some examples of the pinbasic functionalities are given in the last section.

General PIN Framework

In the sequential microstructure models for estimating the probability of informed trading, the exchange of equities takes place over \(d= 1, \dots, D\) pairwise independent trading days. All trades are executed through a risk-neutral and competitive market maker. The market maker sets and updates the bid and ask prices using the information gathered so far on the trading day. Trading with the market maker is possible at every timestamp \(t\) during regular market hours starting at \(t_{0,m}\) and ending at \(T_m\), i.e. \(t \in \left[t_{0,m},T_m\right]\) with finite \(T_m\). The beginning of official trading may vary depending on the chosen bourse \(m\), e.g. the New York Stock Exchange starts regular trading at 9:30 am, whereas the German electronic marketplace XETRA opens earlier at 9:00 am. Likewise, the upper bound \(T_m\) of the official trading interval may vary according to the marketplace under consideration. Each trading day is in exactly one of three possible states of the set \(Q = \{\mathcal{N}, \mathcal{G}, \mathcal{B}\}\). The elements of \(Q\), which represent the conditions of trading days, are no-news (\(\mathcal{N}\)), good-news (\(\mathcal{G}\)) and bad-news (\(\mathcal{B}\)). Trading days on which private information influences the market activities are called information events.

Market participants can be split into two disjoint groups, informed and uninformed traders. Traders holding private information are active only on days with information events. In addition, they are assumed to be risk neutral and competitive. They buy (sell) if positive (negative) signals hit the market, which is the case on good-news (bad-news) trading days. The other group, the uninformed market participants, is active on every trading day for various reasons (e.g. diversification or liquidity needs).
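
The setting described so far can be made concrete with a small simulation. The Python sketch below draws a state for each trading day and then counts buy and sell arrivals of Poisson processes over one normalized trading day. The parameter names (`alpha` for the probability of an information event, `delta` for the probability that an event is bad news, `eps_b` and `eps_s` for the uninformed intensities, `mu` for the informed intensity) are assumed notation, and the functions are hypothetical illustrations rather than the package's simulateBS.

```python
import random


def poisson_count(rate, rng):
    # Count arrivals of a Poisson process with the given intensity on [0, 1]
    # by accumulating exponential inter-arrival times.
    if rate <= 0:
        return 0
    count = 0
    clock = rng.expovariate(rate)
    while clock <= 1.0:
        count += 1
        clock += rng.expovariate(rate)
    return count


def simulate_days(n_days, alpha, delta, eps_b, eps_s, mu, seed=0):
    """Return a list of (state, buys, sells) tuples, one per trading day."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        # Draw the condition of the trading day: no-news, bad-news or good-news.
        if rng.random() >= alpha:
            state = "N"
        elif rng.random() < delta:
            state = "B"
        else:
            state = "G"
        # Uninformed traders are always active; informed traders buy on
        # good-news days and sell on bad-news days.
        buys = poisson_count(eps_b + (mu if state == "G" else 0.0), rng)
        sells = poisson_count(eps_s + (mu if state == "B" else 0.0), rng)
        days.append((state, buys, sells))
    return days
```

A call such as `simulate_days(60, 0.4, 0.5, 20.0, 25.0, 30.0)` produces 60 days of aggregated buys and sells, where good-news days tend to show elevated buy counts and bad-news days elevated sell counts.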