read_docx
Use the function read_docx() to create an R object representing a Word document.
The initial Word file can be specified with the path argument. If none is provided, an empty document located in the package directory is used. Formats and styles are defined in the initial file.
The resulting object gives access not only to the paragraph, character and table styles of the initial document but also to its content.
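The call that produced the style listing below is not shown in this text. A minimal sketch follows. It assumes the officer and magrittr packages are installed (magrittr supplies the %>% pipe used throughout) and that my_doc is the name used by the later examples:

```r
library(officer)
library(magrittr)

# No path given: start from the default template shipped with officer
my_doc <- read_docx()

# One row per style defined in the template: type, id, name,
# and whether the style is custom or the default for its type
styles <- styles_info(my_doc)
print(styles)
```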
## style_type style_id style_name is_custom is_default
## 1 paragraph Normal Normal FALSE TRUE
## 2 paragraph Titre1 heading 1 FALSE FALSE
## 3 paragraph Titre2 heading 2 FALSE FALSE
## 4 paragraph Titre3 heading 3 FALSE FALSE
## 5 character Policepardfaut Default Paragraph Font FALSE TRUE
## 6 table TableauNormal Normal Table FALSE TRUE
## 7 numbering Aucuneliste No List FALSE TRUE
## 8 character strong strong TRUE FALSE
## 9 paragraph centered centered TRUE FALSE
## 10 table tabletemplate table_template TRUE FALSE
## 11 table Listeclaire-Accent2 Light List Accent 2 FALSE FALSE
## 12 character Titre1Car Titre 1 Car TRUE FALSE
## 13 character Titre2Car Titre 2 Car TRUE FALSE
## 14 character Titre3Car Titre 3 Car TRUE FALSE
## 15 paragraph graphictitle graphic title TRUE FALSE
## 16 paragraph tabletitle table title TRUE FALSE
## 17 table Professionnel Table Professional FALSE FALSE
## 18 paragraph TM1 toc 1 FALSE FALSE
## 19 paragraph TM2 toc 2 FALSE FALSE
## 20 paragraph Textedebulles Balloon Text FALSE FALSE
## 21 character TextedebullesCar Texte de bulles Car TRUE FALSE
## 22 character referenceid reference_id TRUE FALSE
By default new content is added at the end of the document. To understand how to add content at any location in the document, see the later section about cursor manipulation.
Let’s create an image from a plot…
src <- tempfile(fileext = ".png")
png(filename = src, width = 5, height = 6, units = 'in', res = 300)
barplot(1:10, col = 1:10)
dev.off()
…and add that image to the document along with some new text paragraphs and a table.
my_doc <- my_doc %>%
body_add_img(src = src, width = 5, height = 6, style = "centered") %>%
body_add_par("Hello world!", style = "Normal") %>%
body_add_par("", style = "Normal") %>% # blank paragraph
body_add_table(iris, style = "table_template")
An (updated) Word file can be generated using the print() function with the target argument:
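The call itself did not survive in this text. Below is a minimal sketch. It builds its own small document (named hello_doc here, a hypothetical name, so that my_doc is left untouched) and writes to a temporary file rather than to the vignette's assets/docx/first_example.docx:

```r
library(officer)
library(magrittr)

hello_doc <- read_docx() %>%
  body_add_par("Hello world!", style = "Normal")

# Temporary target so the example does not depend on an existing folder
target <- tempfile(fileext = ".docx")

# print() with a target argument writes the document to disk
print(hello_doc, target = target)
```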
There are two types of functions for adding elements: body_add_* functions and slip_in_* functions.
body_add_* functions
The paragraph is the main top-level container for content within a Word document. Note that tables are also top-level containers, at the same level as paragraphs. The body_add_* functions are designed to add content as a top-level container: text as an entire paragraph, a table, an image, a page break…
A title is a paragraph. To add a title, use body_add_par() with the style argument pointing to the corresponding title style.
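For example, the following minimal sketch adds a level-1 title followed by body text, using the heading 1 style name from the default template. It then lists the result with docx_summary(), which returns one row per top-level paragraph or table. The paragraph texts are hypothetical:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  # a title is just a paragraph that uses a heading style
  body_add_par("Introduction", style = "heading 1") %>%
  body_add_par("Some body text.", style = "Normal")

# One row per top-level element, with its style name and text
content <- docx_summary(doc)
content[, c("style_name", "text")]
```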
Use the function styles_info() to see the available styles:
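The listing below is restricted to paragraph styles. One way to obtain such a listing is sketched here with base R's subset() applied to the data frame returned by styles_info(). The name template_doc is hypothetical:

```r
library(officer)

template_doc <- read_docx()

# Keep only paragraph styles; row names of the full listing are preserved
par_styles <- subset(styles_info(template_doc), style_type == "paragraph")
print(par_styles)
```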
## style_type style_id style_name is_custom is_default
## 1 paragraph Normal Normal FALSE TRUE
## 2 paragraph Titre1 heading 1 FALSE FALSE
## 3 paragraph Titre2 heading 2 FALSE FALSE
## 4 paragraph Titre3 heading 3 FALSE FALSE
## 9 paragraph centered centered TRUE FALSE
## 15 paragraph graphictitle graphic title TRUE FALSE
## 16 paragraph tabletitle table title TRUE FALSE
## 18 paragraph TM1 toc 1 FALSE FALSE
## 19 paragraph TM2 toc 2 FALSE FALSE
## 20 paragraph Textedebulles Balloon Text FALSE FALSE
It is important to understand that these style names are read from the initial file provided to read_docx(). A few comments:
With body_add_gg() in the following code, style = "centered" applies the paragraph properties of the centered style (as defined in the initial document) to the new paragraph that holds the plot.
Tables are added with body_add_table(). For advanced tabular formatting, use the flextable package instead (see the flextable website). It provides a function body_add_flextable() that can be used with officer.
if( require("ggplot2") ){
gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) +
geom_point()
read_docx() %>%
body_add_par(value = "Table of content", style = "heading 1") %>%
body_add_toc(level = 2) %>%
body_add_break() %>%
body_add_par(value = "dataset iris", style = "heading 2") %>%
body_add_table(value = head(iris), style = "table_template" ) %>%
body_add_par(value = "plot examples", style = "heading 1") %>%
body_add_gg(value = gg, style = "centered" ) %>%
print(target = "assets/docx/body_add_demo.docx")
}
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/body_add_demo.docx"
slip_in_* functions
The slip_in_* functions are designed to add content inside an existing paragraph: text, an image or a sequence field. The element is inserted either at the end (pos = "after", the default) or at the beginning (pos = "before") of the paragraph. The available functions are the following:
slip_in_img()
slip_in_seqfield()
slip_in_text()
img.file <- file.path( R.home("doc"), "html", "logo.jpg" )
read_docx() %>%
body_add_par("R logo: ", style = "Normal") %>%
slip_in_img(src = img.file, style = "strong",
width = .3, height = .3, pos = "after") %>%
slip_in_text(" - This is ", style = "strong", pos = "before") %>%
slip_in_seqfield(str = "SEQ Figure \u005C* ARABIC",
style = 'strong', pos = "before") %>%
print(target = "assets/docx/slip_in_demo.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/slip_in_demo.docx"
These have been implemented mostly to allow the addition of Word’s special sequence fields (which facilitate numbering) at the beginning of paragraphs used as reference entries (e.g. a table or plot caption). See the section Table and image captions.
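As an illustration, here is a minimal sketch of a numbered table caption built by hand. slip_in_seqfield() inserts a Word SEQ field, whose number Word computes when fields are updated, and slip_in_text() then prefixes the label. The style names come from the default template; the caption wording is hypothetical:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  body_add_table(head(iris), style = "table_template") %>%
  # caption paragraph; the label and number are slipped in before its text
  body_add_par(": first rows of iris", style = "table title") %>%
  # "\\*" in R source is a single backslash in the field code
  slip_in_seqfield(str = "SEQ Table \\* ARABIC", style = "strong", pos = "before") %>%
  # inserted before the field, so the caption reads "Table <n>: ..."
  slip_in_text("Table ", style = "strong", pos = "before")

target <- tempfile(fileext = ".docx")
print(doc, target = target)
```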
A cursor is available and can be moved so that content can be added relative to its position with the body_add_* functions. Their pos argument controls where the new element goes:
before will insert a new element before the selected element in the document.
after will insert a new element after the selected element in the document.
on will replace the selected element in the document with a new element.
Cursor functions are the following:
cursor_begin()
cursor_end()
cursor_reach()
cursor_backward()
cursor_forward()
cursor_bookmark()
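The three pos values described above can be sketched on a freshly created two-paragraph document. This is a minimal illustration, and the paragraph texts are hypothetical:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  body_add_par("paragraph A", style = "Normal") %>%
  body_add_par("paragraph B", style = "Normal") %>%
  # select the paragraph containing "paragraph A" ...
  cursor_reach(keyword = "paragraph A") %>%
  # ... and insert a new paragraph just before it
  body_add_par("inserted before A", style = "Normal", pos = "before") %>%
  # select "paragraph B" and replace it with a new paragraph
  cursor_reach(keyword = "paragraph B") %>%
  body_add_par("B replaced", style = "Normal", pos = "on")

txt <- docx_summary(doc)$text
print(txt)
```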
To illustrate the cursor functions, we will use a document made up of several paragraphs, itself created with officer:
read_docx() %>%
body_add_par("paragraph 1", style = "Normal") %>%
body_add_par("paragraph 2", style = "Normal") %>%
body_add_par("paragraph 3", style = "Normal") %>%
body_add_par("paragraph 4", style = "Normal") %>%
body_add_par("paragraph 5", style = "Normal") %>%
body_add_par("paragraph 6", style = "Normal") %>%
body_add_par("paragraph 7", style = "Normal") %>%
print(target = "assets/docx/init_doc.docx" )
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/init_doc.docx"
Now, let's read init_doc.docx with read_docx() and manipulate its content with cursor functions.
doc <- read_docx(path = "assets/docx/init_doc.docx") %>%
# init_doc.docx was built on the default template, whose only
# initial content is an empty paragraph: cursor_begin() and
# body_remove() delete it
cursor_begin() %>% body_remove() %>%
# add text at the beginning of the
# paragraph containing the text "paragraph 4"
cursor_reach(keyword = "paragraph 4") %>%
slip_in_text("This is ", pos = "before", style = "Default Paragraph Font") %>%
# move the cursor forward and end a landscape section
# (body_end_section() is deprecated, see ?sections)
cursor_forward() %>%
body_add_par("The section stops here", style = "Normal") %>%
body_end_section_landscape() %>%
# move the cursor to the end of the document
cursor_end() %>%
body_add_par("The document ends now", style = "Normal")
print(doc, target = "assets/docx/cursor.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/cursor.docx"
The function body_remove() lets you remove content from a Word document. Combined with the cursor_* functions, it is a convenient tool to update an existing document.
For illustration purposes, we first generate a document that will later serve as the initial document when showing how to use body_remove().
library(officer)
library(magrittr)
str1 <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " %>%
rep(20) %>% paste(collapse = "")
str2 <- "Drop that text"
str3 <- "Aenean venenatis varius elit et fermentum vivamus vehicula. " %>%
rep(20) %>% paste(collapse = "")
my_doc <- read_docx() %>%
body_add_par(value = str1, style = "Normal") %>%
body_add_par(value = str2, style = "centered") %>%
body_add_par(value = str3, style = "Normal")
print(my_doc, target = "assets/docx/ipsum_doc.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/ipsum_doc.docx"
The file ipsum_doc.docx now exists and contains a paragraph with the text "Drop that text". In the following example, we position the cursor on that paragraph (found with the keyword "that text") and then delete it:
my_doc <- read_docx(path = "assets/docx/ipsum_doc.docx") %>%
cursor_reach(keyword = "that text") %>%
body_remove()
print(my_doc, target = "assets/docx/ipsum_doc.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/ipsum_doc.docx"
The text search is made via XPath 1.0, and regular expressions are not supported.
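To confirm that a paragraph is gone, the document content can be listed with docx_summary(). Here is a self-contained sketch on a small document; the paragraph texts are hypothetical:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  body_add_par("Keep this paragraph", style = "Normal") %>%
  body_add_par("Drop that text", style = "Normal") %>%
  # select the paragraph containing the keyword, then delete it
  cursor_reach(keyword = "that text") %>%
  body_remove()

txt <- docx_summary(doc)$text
print(txt)
```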
The body_add_* functions also let you replace content in a Word document.
For illustration purposes, we will generate a document that will be used as an initial document later.
my_doc <- read_docx() %>%
body_add_par(value = str1, style = "Normal") %>%
body_add_par(value = str2, style = "centered") %>%
body_add_par(value = str3, style = "Normal")
print(my_doc, target = "assets/docx/replace_template.docx")
The file replace_template.docx now exists and contains a paragraph with the text "Drop that text". In the following example, we position the cursor on that paragraph and then replace it. Using pos = "on" replaces the element at the cursor with the new content.
my_doc <- read_docx(path = "assets/docx/replace_template.docx") %>%
cursor_reach(keyword = "that text") %>%
body_add_par(value = "This is a new paragraph.", style = "centered", pos = "on")
print(my_doc, target = "assets/docx/replace_doc.docx")
You can also use the body_replace_* functions to search and replace text. body_replace_text_at_bkm() replaces the text at a bookmark:
doc <- read_docx() %>%
body_add_par("centered text", style = "centered") %>%
slip_in_text(". How are you", style = "strong") %>%
body_bookmark("text_to_replace") %>%
body_replace_text_at_bkm("text_to_replace", "not left aligned")
To do the same with the headers and footers of the Word document, use the functions headers_replace_text_at_bkm() and footers_replace_text_at_bkm().
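Bookmarks can also serve as anchors for the cursor. cursor_bookmark() moves the cursor to the bookmarked paragraph, after which the body_add_* functions insert content relative to it. Below is a minimal sketch; the bookmark id anchor_a and the paragraph texts are hypothetical:

```r
library(officer)
library(magrittr)

doc <- read_docx() %>%
  body_add_par("Section A", style = "heading 1") %>%
  # bookmark the paragraph under the cursor ("Section A")
  body_bookmark("anchor_a") %>%
  body_add_par("Section B", style = "heading 1") %>%
  # jump back to the bookmark and add text right after "Section A"
  cursor_bookmark("anchor_a") %>%
  body_add_par("Text for section A", style = "Normal")

txt <- docx_summary(doc)$text
print(txt)
```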
Depending on the options provided, body_replace_all_text() replaces text either at the cursor only or in the entire document:
doc <- read_docx() %>%
body_add_par("Placeholder one") %>%
body_add_par("Placeholder two")
# Show text chunk at cursor
docx_show_chunk(doc) # Output is 'Placeholder two'
## 1 text nodes found at this cursor.
## <w:t>: 'Placeholder two'
# Simple search-and-replace at current cursor, with regex turned off
body_replace_all_text(doc, "Placeholder", "new", only_at_cursor = TRUE, fixed=TRUE)
docx_show_chunk(doc) # Output is 'new two'
## 1 text nodes found at this cursor.
## <w:t>: 'new two'
# Do the same, but in the entire document and ignoring case
body_replace_all_text(doc, "placeholder", "new", only_at_cursor = FALSE, ignore.case=TRUE)
cursor_backward(doc)
docx_show_chunk(doc) # Output is 'new one'
## 1 text nodes found at this cursor.
## <w:t>: 'new one'
# Use regex : replace all words starting with "n" with the word "example"
body_replace_all_text(doc, "\\bn.*?\\b", "example")
docx_show_chunk(doc) # Output is 'example one'
## 1 text nodes found at this cursor.
## <w:t>: 'example one'
To do the same with the headers and footers of the Word document, use the functions headers_replace_all_text() and footers_replace_all_text().
A section starts at the end of the previous section (or the beginning of the document if no preceding section exists). It stops where the section is declared.
Sections can be added to a document by using a set of functions:
body_end_section_landscape()
body_end_section_portrait()
body_end_section_columns()
body_end_section_columns_landscape()
body_end_section_continuous()
To add content into a landscape section, you need to end the preceding section with body_end_section_continuous(), add the content, and then end the landscape section with body_end_section_landscape().
str1 <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " %>%
rep(5) %>% paste(collapse = "")
str2 <- "Aenean venenatis varius elit et fermentum vivamus vehicula. " %>%
rep(5) %>% paste(collapse = "")
my_doc <- read_docx() %>%
body_add_par(value = str1, style = "centered") %>%
body_end_section_continuous() %>%
body_add_par(value = str2, style = "centered") %>%
body_end_section_landscape()
print(my_doc, target = "assets/docx/landscape_section.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/landscape_section.docx"
To add content into a section with columns, you need to end the preceding section with body_end_section_continuous(), add the content, and then end the column section with body_end_section_columns().
The function slip_in_column_break() can be used to add a column break. Because it starts a new column, it has to be applied to the paragraph where the break happens. By default, slip_in_column_break() inserts a column break at the beginning of the paragraph where the cursor is.
my_doc <- read_docx() %>%
body_end_section_continuous() %>%
body_add_par(value = str1, style = "centered") %>%
body_add_par(value = str2, style = "centered") %>%
slip_in_column_break() %>%
body_add_par(value = str2, style = "centered") %>%
body_end_section_columns(widths = c(2,2), sep = TRUE, space = 1)
print(my_doc, target = "assets/docx/columns_section.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/columns_section.docx"
To add content into a section with columns and landscape orientation, you need to end the preceding section with body_end_section_continuous(), add the content, and then end the section with body_end_section_columns_landscape().
my_doc <- read_docx() %>%
body_end_section_continuous() %>%
body_add_par(value = str1, style = "Normal") %>%
body_add_par(value = str2, style = "Normal") %>%
body_end_section_columns_landscape(widths = c(3,3), sep = TRUE, space = 1)
print(my_doc, target = "assets/docx/columns_landscape_section.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/columns_landscape_section.docx"
The following example demonstrates all these usages:
my_doc <- read_docx() %>%
body_add_par(value = "Default section", style = "heading 1") %>%
body_add_par(value = str1, style = "centered") %>%
body_add_par(value = str2, style = "centered") %>%
body_end_section_continuous() %>%
body_add_par(value = "Landscape section", style = "heading 1") %>%
body_add_par(value = str1, style = "centered") %>%
body_add_par(value = str2, style = "centered") %>%
body_end_section_landscape() %>%
body_add_par(value = "Columns", style = "heading 1") %>%
body_end_section_continuous() %>%
body_add_par(value = str1, style = "centered") %>%
body_add_par(value = str2, style = "centered") %>%
slip_in_column_break() %>%
body_add_par(value = str1, style = "centered") %>%
body_end_section_columns(widths = c(2,2), sep = TRUE, space = 1) %>%
body_add_par(value = str1, style = "Normal") %>%
body_add_par(value = str2, style = "Normal") %>%
slip_in_column_break() %>%
body_end_section_columns_landscape(widths = c(3,3), sep = TRUE, space = 1)
print(my_doc, target = "assets/docx/section.docx")
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/section.docx"
slip_in_seqfield() and slip_in_text() can be combined to prefix a paragraph with references (e.g. chapter number and graphic index in the document). However, producing a plot or a table and its caption this way can be verbose.
Shortcut functions are implemented in the object shortcuts. If they do not fit your needs exactly, they at least give you a code template to modify. slip_in_tableref(), slip_in_plotref() and body_add_gg() can make life easier.
Usage of these functions is illustrated below:
library(magrittr)
library(officer)
if( require("ggplot2") ){
gg1 <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) +
geom_point()
gg2 <- ggplot(data = iris, aes(Sepal.Length, Petal.Length, color = Species)) +
geom_point()
doc <- read_docx() %>%
body_add_par(value = "Table of content", style = "heading 1") %>%
body_add_toc(level = 2) %>%
body_add_par(value = "Tables", style = "heading 1") %>%
body_add_par(value = "dataset mtcars", style = "heading 2") %>%
body_add_table(value = head(mtcars)[, 1:4], style = "table_template" ) %>%
body_add_par(value = "data mtcars", style = "table title") %>%
shortcuts$slip_in_tableref(depth = 2) %>%
body_add_par(value = "dataset iris", style = "heading 2") %>%
body_add_table(value = head(iris), style = "table_template" ) %>%
body_add_par(value = "data iris", style = "table title") %>%
shortcuts$slip_in_tableref(depth = 2) %>%
body_end_section_portrait() %>%
body_add_par(value = "plot examples", style = "heading 1") %>%
body_add_gg(value = gg1, style = "centered" ) %>%
body_add_par(value = "graph example 1", style = "graphic title") %>%
shortcuts$slip_in_plotref(depth = 1) %>%
body_add_par(value = "plot 2", style = "heading 2") %>%
body_add_gg(value = gg2, style = "centered" ) %>%
body_add_par(value = "graph example 2", style = "graphic title") %>%
shortcuts$slip_in_plotref(depth = 2) %>%
body_end_section_landscape() %>%
body_add_par(value = "Table of tables", style = "heading 2") %>%
body_add_toc(style = "table title") %>%
body_add_par(value = "Table of graphics", style = "heading 2") %>%
body_add_toc(style = "graphic title")
print(doc, target = "assets/docx/toc_and_captions.docx")
}
## [1] "/private/var/folders/51/6jygptvs3bb4njv0t6x7br900000gn/T/RtmpfAyl1d/Rbuildbbb0559f13dc/officer/vignettes/assets/docx/toc_and_captions.docx"