Flight is a general-purpose client-server framework for high-performance
transport of large datasets over network interfaces, built as part of the
Apache Arrow project.
The arrow package provides methods for connecting to Flight RPC servers
to send and receive data.
The flight functions in the package use
reticulate to call methods in the
pyarrow Python package. Before using them for the first time,
you'll need to be sure you have
reticulate installed, and you'll also need to install pyarrow. See
vignette("python", package = "arrow") for more details on setting up pyarrow.
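As a minimal sketch of that one-time setup, the following installs reticulate from CRAN and then uses the arrow package's `install_pyarrow()` helper to install pyarrow into a Python environment that reticulate can find. It assumes a working Python installation is available; the environment that `install_pyarrow()` targets by default may differ on your machine, so check the Python vignette if the Flight functions cannot find pyarrow afterwards.

```r
# One-time setup; skip any step that is already done on your machine.

# Install reticulate, the R-to-Python bridge used by the Flight functions.
install.packages("reticulate")

# Install the pyarrow Python package via the helper shipped with arrow.
arrow::install_pyarrow()
```

Restarting R after installation is usually a good idea, so that reticulate picks up the newly configured Python environment.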
The package includes methods for starting a Python-based Flight server, as well as methods for connecting to a Flight server running elsewhere.
To illustrate both sides, in one process let's start a demo server:
    library(arrow)
    demo_server <- load_flight_server("demo_flight_server")
    server <- demo_server$DemoFlightServer(port = 8089)
    server$serve()
We'll leave that one running.
In a different R process, let's connect to it and put some data in it.
    library(arrow)
    client <- flight_connect(port = 8089)
    # Upload some data to our server so there's something to demo
    flight_put(client, iris, path = "test_data/iris")
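To confirm that the upload landed, one option is to ask the server what it currently holds. The sketch below uses `list_flights()` and `flight_path_exists()` from the arrow package. It assumes the demo server started earlier is still running on port 8089 and that the data was stored under "test_data/iris" as in the upload step; output is omitted because it depends on the server's state.

```r
library(arrow)

# Connect to the demo server (assumed to still be listening on port 8089).
client <- flight_connect(port = 8089)

# List the paths the server currently exposes.
list_flights(client)

# Check for the specific path used in the upload step.
flight_path_exists(client, "test_data/iris")
```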
Now, in a new R process, let's connect to the server and pull the data we put there:
    library(arrow)
    library(dplyr)
    client <- flight_connect(port = 8089)
    client %>%
      flight_get("test_data/iris") %>%
      group_by(Species) %>%
      summarize(max_petal = max(Petal.Length))

    ## # A tibble: 3 x 2
    ##   Species    max_petal
    ##   <fct>          <dbl>
    ## 1 setosa           1.9
    ## 2 versicolor       5.1
    ## 3 virginica        6.9
Because flight_get() returns an Arrow data structure, we can directly pipe
its result into a