Abstract
The DBI package defines the generic DataBase Interface for R. The connection to individual DBMS is provided by other packages that import DBI (so-called DBI backends). This document formalizes the behavior expected by the methods declared in DBI and implemented by the individual backends. To ensure maximum portability and exchangeability, and to reduce the effort for implementing a new DBI backend, the DBItest package defines a comprehensive set of test cases that test conformance to the DBI specification. This document is derived from comments in the test definitions of the DBItest package. Any extensions or updates to the tests will be reflected in this document.
DBI defines an interface for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations (so-called DBI backends).
A DBI backend is an R package which imports the DBI and methods packages. For better or worse, the names of many existing backends start with ‘R’, e.g., RSQLite, RMySQL, RSQLServer; it is up to the backend author to adopt this convention or not.
A backend defines three classes, which are subclasses of DBIDriver, DBIConnection, and DBIResult. The backend provides implementation for all methods of these base classes that are defined but not implemented by DBI. All methods have an ellipsis ...
in their formals.
The backend must support creation of an instance of its DBIDriver subclass with a constructor function. By default, its name is the package name without the leading ‘R’ (if it exists), e.g., SQLite
for the RSQLite package. However, backend authors may choose a different name. The constructor must be exported, and it must be a function that is callable without arguments. DBI recommends to define a constructor with an empty argument list.
RSQLite::SQLite()
This section describes the behavior of the following method:
dbDataType(dbObj, obj, ...)
Returns an SQL string that describes the SQL data type to be used for an object. The default implementation of this generic determines the SQL type of an R object according to the SQL 92 specification, which may serve as a starting point for driver implementations. DBI also provides an implementation for data.frame which will return a character vector giving the type for each column in the dataframe.
dbObj
|
A object inheriting from DBIDriver or DBIConnection |
obj
|
An R object whose SQL type we want to determine. |
...
|
Other arguments passed on to methods. |
The data types supported by databases are different than the data types in R, but the mapping between the primitive types is straightforward:
Any of the many fixed and varying length character types are mapped to character vectors
Fixed-precision (non-IEEE) numbers are mapped into either numeric or integer vectors.
Notice that many DBMS do not follow IEEE arithmetic, so there are potential problems with under/overflows and loss of precision.
dbDataType()
returns the SQL type that corresponds to the obj
argument as a non-empty character string. For data frames, a character vector with one element per column is returned. An error is raised for invalid values for the obj
argument such as a NULL
value.
The backend can override the dbDataType()
generic for its driver class.
This generic expects an arbitrary object as second argument. To query the values returned by the default implementation, run example(dbDataType, package = "DBI")
. If the backend needs to override this generic, it must accept all basic R data types as its second argument, namely logical, integer, numeric, character, dates (see Dates), date-time (see DateTimeClasses), and difftime. If the database supports blobs, this method also must accept lists of raw vectors, and blob::blob objects. As-is objects (i.e., wrapped by I()
) must be supported and return the same results as their unwrapped counterparts. The SQL data type for factor and ordered is the same as for character. The behavior for other object types is not specified.
All data types returned by dbDataType()
are usable in an SQL statement of the form "CREATE TABLE test (a ...)"
.
dbDataType(ANSI(), 1:5) dbDataType(ANSI(), 1) dbDataType(ANSI(), TRUE) dbDataType(ANSI(), Sys.Date()) dbDataType(ANSI(), Sys.time()) dbDataType(ANSI(), Sys.time() - as.POSIXct(Sys.Date())) dbDataType(ANSI(), c("x", "abc")) dbDataType(ANSI(), list(raw(10), raw(20))) dbDataType(ANSI(), I(3)) dbDataType(ANSI(), iris) con <- dbConnect(RSQLite::SQLite(), ":memory:") dbDataType(con, 1:5) dbDataType(con, 1) dbDataType(con, TRUE) dbDataType(con, Sys.Date()) dbDataType(con, Sys.time()) dbDataType(con, Sys.time() - as.POSIXct(Sys.Date())) dbDataType(con, c("x", "abc")) dbDataType(con, list(raw(10), raw(20))) dbDataType(con, I(3)) dbDataType(con, iris) dbDisconnect(con)
This section describes the behavior of the following method:
dbConnect(drv, ...)
Connect to a DBMS going through the appropriate authentication procedure. Some implementations may allow you to have multiple connections open, so you may invoke this function repeatedly assigning its output to different objects. The authentication mechanism is left unspecified, so check the documentation of individual drivers for details.
drv
|
an object that inherits from DBIDriver, or an existing DBIConnection object (in order to clone an existing connection). |
...
|
authentication arguments needed by the DBMS instance; these typically include |
dbConnect()
returns an S4 object that inherits from DBIConnection. This object is used to communicate with the database engine.
DBI recommends using the following argument names for authentication parameters, with NULL
default:
user
for the user name (default: current user)
password
for the password
host
for the host name (default: local connection)
port
for the port number (default: local connection)
dbname
for the name of the database on the host, or the database file name
The defaults should provide reasonable behavior, in particular a local connection for host = NULL
. For some DBMS (e.g., PostgreSQL), this is different to a TCP/IP connection to localhost
.
# SQLite only needs a path to the database. (Here, ":memory:" is a special # path that creates an in-memory database.) Other database drivers # will require more details (like user, password, host, port, etc.) con <- dbConnect(RSQLite::SQLite(), ":memory:") con dbListTables(con) dbDisconnect(con)
This section describes the behavior of the following method:
dbDisconnect(conn, ...)
This closes the connection, discards all pending work, and frees resources (e.g., memory, sockets).
conn
|
A DBIConnection object, as returned by |
...
|
Other parameters passed on to methods. |
dbDisconnect()
returns TRUE
, invisibly.
A warning is issued on garbage collection when a connection has been released without calling dbDisconnect()
. A warning is issued immediately when calling dbDisconnect()
on an already disconnected or invalid connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:") dbDisconnect(con)
This section describes the behavior of the following method:
dbSendQuery(conn, statement, ...)
The dbSendQuery()
method only submits and synchronously executes the SQL query to the database engine. It does not extract any records — for that you need to use the dbFetch()
method, and then you must call dbClearResult()
when you finish fetching the records you need. For interactive use, you should almost always prefer dbGetQuery()
.
conn
|
A DBIConnection object, as returned by |
statement
|
a character string containing SQL. |
...
|
Other parameters passed on to methods. |
This method is for SELECT
queries only. Some backends may support data manipulation queries through this method for compatibility reasons. However, callers are strongly encouraged to use dbSendStatement()
for data manipulation statements.
The query is submitted to the database server and the DBMS executes it, possibly generating vast amounts of data. Where these data live is driver-specific: some drivers may choose to leave the output on the server and transfer them piecemeal to R, others may transfer all the data to the client – but not necessarily to the memory that R manages. See individual drivers' dbSendQuery()
documentation for details.
dbSendQuery()
returns an S4 object that inherits from DBIResult. The result set can be used with dbFetch()
to extract records. Once you have finished using a result, make sure to clear it with dbClearResult()
. An error is raised when issuing a query over a closed or invalid connection, if the syntax of the query is invalid, or if the query is not a non-NA
string.
No warnings occur under normal conditions. When done, the DBIResult object must be cleared with a call to dbClearResult()
. Failure to clear the result set leads to a warning when the connection is closed.
If the backend supports only one open result set per connection, issuing a second query invalidates an already open result set and raises a warning. The newly opened result set is valid and must be cleared with dbClearResult()
.
con <- dbConnect(RSQLite::SQLite(), ":memory:") dbWriteTable(con, "mtcars", mtcars) rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4;") dbFetch(rs) dbClearResult(rs) dbDisconnect(con)
This section describes the behavior of the following methods:
dbFetch(res, n = -1, ...) fetch(res, n = -1, ...)
Fetch the next n
elements (rows) from the result set and return them as a data.frame.
res
|
An object inheriting from DBIResult, created by |
n
|
maximum number of records to retrieve per fetch. Use |
...
|
Other arguments passed on to methods. |
fetch()
is provided for compatibility with older DBI clients - for all new code you are strongly encouraged to use dbFetch()
. The default implementation for dbFetch()
calls fetch()
so that it is compatible with existing code. Modern backends should implement for dbFetch()
only.
dbFetch()
always returns a data.frame with as many rows as records were fetched and as many columns as fields in the result set, even if the result is a single value or has one or zero rows. An attempt to fetch from a closed result set raises an error. If the n
argument is not an atomic whole number greater or equal to -1 or Inf, an error is raised, but a subsequent call to dbFetch()
with proper n
argument succeeds. Calling dbFetch()
on a result set from a data manipulation query created by dbSendStatement()
can be fetched and return an empty data frame, with a warning.
Fetching multi-row queries with one or more columns be default returns the entire result. Multi-row queries can also be fetched progressively by passing a whole number (integer or numeric) as the n
argument. A value of Inf for the n
argument is supported and also returns the full result. If more rows than available are fetched, the result is returned in full without warning. If fewer rows than requested are returned, further fetches will return a data frame with zero rows. If zero rows are fetched, the columns of the data frame are still fully typed. Fetching fewer rows than available is permitted, no warning is issued when clearing the result set.
A column named row_names
is treated like any other column.
The column types of the returned data frame depend on the data returned:
integer for integer values between -2^31 and 2^31 - 1
numeric for numbers with a fractional component
logical for Boolean values (some backends may return an integer)
character for text
lists of raw for blobs (with NULL
entries for SQL NULL values)
coercible using as.Date()
for dates (also applies to the return value of the SQL function current_date
)
coercible using hms::as.hms()
for times (also applies to the return value of the SQL function current_time
)
coercible using as.POSIXct()
for timestamps (also applies to the return value of the SQL function current_timestamp
)
NA for SQL NULL
values
If dates and timestamps are supported by the backend, the following R types are used:
Date for dates (also applies to the return value of the SQL function current_date
)
POSIXct for timestamps (also applies to the return value of the SQL function current_timestamp
)
R has no built-in type with lossless support for the full range of 64-bit or larger integers. If 64-bit integers are returned from a query, the following rules apply:
Values are returned in a container with support for the full range of valid 64-bit values (such as the integer64
class of the bit64 package)
Coercion to numeric always returns a number that is as close as possible to the true value
Loss of precision when converting to numeric gives a warning
Conversion to character always returns a lossless decimal representation of the data
con <- dbConnect(RSQLite::SQLite(), ":memory:") dbWriteTable(con, "mtcars", mtcars) # Fetch all results rs <- dbSendQuery(con, "SELECT * FROM mtcars WHERE cyl = 4") dbFetch(rs) dbClearResult(rs) # Fetch in chunks rs <- dbSendQuery(con, "SELECT * FROM mtcars") while (!dbHasCompleted(rs)) { chunk <- dbFetch(rs, 10) print(nrow(chunk)) } dbClearResult(rs) dbDisconnect(con)
This section describes the behavior of the following method:
dbClearResult(res, ...)
Frees all resources (local and remote) associated with a result set. In some cases (e.g., very large result sets) this can be a critical step to avoid exhausting resources (memory, file descriptors, etc.)
res
|
An object inheriting from DBIResult. |
...
|
Other arguments passed on to methods. |
dbClearResult()
returns TRUE
, invisibly, for result sets obtained from both dbSendQuery()
and dbSendStatement()
. An attempt to close an already closed result set issues a warning in both cases.
dbClearResult()
frees all resources associated with retrieving the result of a query or update operation. The DBI backend can expect a call to dbClearResult()
for each dbSendQuery()
or dbSendStatement()
call.
con <- dbConnect(RSQLite::SQLite(), ":memory:") rs <- dbSendQuery(con, "SELECT 1") print(dbFetch(rs)) dbClearResult(rs) dbDisconnect(con)
This section describes the behavior of the following method:
dbBind(res, params, ...)
For parametrized or prepared statements, the dbSendQuery()
and dbSendStatement()
functions can be called with statements that contain placeholders for values. The dbBind()
function binds these placeholders to actual values, and is intended to be called on the result set before calling dbFetch()
or dbGetRowsAffected()
.
res
|
An object inheriting from DBIResult. |
params
|
A list of bindings, named or unnamed. |
...
|
Other arguments passed on to methods. |
DBI supports parametrized (or prepared) queries and statements via the dbBind()
generic. Parametrized queries are different from normal queries in that they allow an arbitrary number of placeholders, which are later substituted by actual values. Parametrized queries (and statements) serve two p