| Title: | Miscellaneous Functions for Data and Geospatial Work |
|---|---|
| Description: | Helpers for common data analysis tasks including missing-value summaries and filters, simple reporting and plotting utilities, 'Excel' import and export workflows, and reading geospatial formats (for example shapefiles in zip archives, file geodatabases, KMZ, and KML) via 'sf' and related packages. Also includes small project utilities such as creating directories, gitignore scaffolding, combined package loading, and optional 'lintr' setup. |
| Authors: | Karlo Guidoni Martins [aut, cre] (ORCID: <https://orcid.org/0000-0002-8458-8467>) |
| Maintainer: | Karlo Guidoni Martins <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.7 |
| Built: | 2026-05-28 14:51:59 UTC |
| Source: | https://github.com/kguidonimartins/misc |
add_gitignore() fetches templates using the API from
gitignore.io. In addition,
add_gitignore() adds tags files (created by
ctags) to the .gitignore file.
add_gitignore(type = "r")
type |
a character vector with the language to be ignored |
No return value, called for side effects (creates .gitignore, or
stops with an error if the file already exists).
add_gitignore() is inspired by
gitignore::gi_fetch_templates
and by some examples on the gitignore.io
wiki page.
Other project-setup:
create_dirs()
if (interactive()) {
  # Downloads from gitignore.io (requires network). Use combined `type` on
  # first create, e.g. `add_gitignore(type = c("r", "python"))`.
  add_gitignore()
}
Reads a spatial file (.zip containing a single shapefile, .shp, .gpkg,
or .geojson), drops Z/M dimensions, replaces non-ASCII characters in every
attribute column, reprojects the geometry to a target CRS and writes the
result to a user-provided output path. The output format is determined by
the extension of output and may differ from the input format (for example,
a .shp can be cleaned and written as .gpkg).
clean_geo(path, output, crs = 4326, encoding = "ISO-8859-1", quiet = FALSE)
path |
Path to the input spatial file. Must be |
output |
Path to the output file. Required. The extension determines
the output format and must also be one of |
crs |
Target coordinate reference system passed to
|
encoding |
Encoding string used when writing shapefile attribute
tables (passed as |
quiet |
Logical. If |
This function replaces a standalone batch script that cleaned shapefiles
from a client geospatial portal. The non-ASCII replacement step relies on
textclean::replace_non_ascii(); textclean lives in Suggests:, so the
function stops with an informative error if it is not installed.
Invisibly returns the normalized output path (character).
Other geo-io:
read_gdb(),
read_geo(),
read_kmz(),
read_sf_zip()
if (requireNamespace("textclean", quietly = TRUE)) {
  z <- system.file("extdata", "misc_example.zip", package = "misc")
  if (nzchar(z) && file.exists(z)) {
    out <- tempfile(fileext = ".zip")
    clean_geo(z, out)
    out_gpkg <- tempfile(fileext = ".gpkg")
    clean_geo(z, out_gpkg)
  }
}
combine_words_ptbr() collapses words using Brazilian Portuguese (pt-BR) rules. This function
differs from knitr::combine_words(),
which uses Oxford commas.
combine_words_ptbr(words, sep = NULL, last = NULL)
words |
a character vector with words to combine |
sep |
a character string with the separator between words. Default is NULL, which inserts ", " |
last |
a character string with the separator placed before the last word. Default is NULL, which inserts " e " |
a character vector
combine_words_ptbr() uses transformers
available in the excellent {glue} package
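The collapsing rule can be sketched in base R. This is only an illustration under the documented defaults (", " and " e "); the helper name `combine_ptbr_sketch` is hypothetical and is not the package's {glue}-based implementation.

```r
# Hypothetical helper, not the package code: joins words with `sep` and
# places `last` before the final word, without an Oxford comma.
combine_ptbr_sketch <- function(words, sep = ", ", last = " e ") {
  n <- length(words)
  if (n <= 1L) {
    # zero or one word: nothing to join
    return(paste(words, collapse = ""))
  }
  head_part <- paste(words[-n], collapse = sep)
  paste0(head_part, last, words[n])
}

feira <- c("banana", "pepino", "ovos")
combine_ptbr_sketch(feira)
# returns "banana, pepino e ovos"
```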
misc::ipak("glue")
# using in an ordinary text
feira <- c("banana", "maçã", "pepino", "ovos")
glue("Por favor, compre: {combine_words_ptbr(feira)}")
The main purpose of create_dirs() is to create default directories used
in data science projects. create_dirs() can also create custom
directories.
create_dirs(dirs = NULL)
dirs |
a character vector with the directory names. Default is NULL and
create |
No return value, called for side effects (creates directories and
optional .gitkeep placeholder files).
There is a somewhat subjective discussion about the ideal directory structure
for data science projects in general (see
here,
here,
here, and
here). In my humble opinion, the
decision should be made by the user/analyst/scientist/team. Here, I
suggest a directory structure that has worked for me. In addition, the
directory structure created fits perfectly with functions present in this
package (for example save_plot and save_temp_data).
Below is the suggested directory structure:
.
+- R # local functions
+- data
| +- clean # stores clean data
| +- raw # stores raw data (read-only)
| +- temp # stores temporary data
+- output
+- figures # stores figures ready for publication/presentation
+- results # stores text results and others
+- supp # stores supplementary material for publication/presentation
create_dirs() takes advantage of the functions available in the excellent
{fs} package.
Other project-setup:
add_gitignore()
if (interactive()) {
  # create a single directory
  create_dirs("myfolder")
  # create the default directories
  create_dirs()
  # see the resulting tree
  fs::dir_tree()
}
This function removes duplicate rows from a data frame while keeping the first occurrence of each unique combination of the specified grouping variables.
deduplicate_by(.data, ...)
.data |
A data frame or tibble |
... |
One or more unquoted variable names to group by |
A data frame with duplicate rows removed, keeping only the first occurrence for each unique combination of grouping variables
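For readers who want to see the "keep the first occurrence" rule spelled out, here is a rough base R equivalent. The helper `dedup_sketch` is hypothetical and takes quoted column names, unlike deduplicate_by(), which takes unquoted ones.

```r
# Hypothetical base R equivalent: keep the first row for each unique
# combination of the chosen columns, preserving the original row order.
dedup_sketch <- function(data, cols) {
  data[!duplicated(data[, cols, drop = FALSE]), , drop = FALSE]
}

first_by_carb <- dedup_sketch(mtcars, "carb")
# one row per distinct value of `carb`
nrow(first_by_carb) == length(unique(mtcars$carb))
```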
Other data-wrangling:
describe_data()
# Remove duplicates based on a single column
mtcars %>% deduplicate_by(carb)
# Remove duplicates based on multiple columns
mtcars %>% deduplicate_by(carb, mpg)
Describe data
describe_data(data)
data |
a data frame |
a skimr object
Other data-wrangling:
deduplicate_by()
nice_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
nice_data %>% describe_data()
filter_na() wraps {dplyr} functions in a more
convenient way, IMO.
filter_na(data, type = c("any", "all"))
data |
a data frame or tibble |
type |
a character vector indicating which type of NA-filtering must be done. If type = "any",
|
a tibble object
Other missing-data:
na_count(),
na_viz(),
remove_columns_based_on_NA()
nice_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
nice_data %>% filter_na("all")
nice_data %>% filter_na("any")
Transforms both layers to a projected CRS, keeps features in x that touch
the mask y, computes sf::st_intersection(), aggregates clipped area per
identifier, and drops features whose clipped fraction of their original area is
below min_area_ratio.
intersect_mask_filter_area(
  x,
  y,
  x_id = NULL,
  crs = NULL,
  min_area_ratio = 0.01,
  repair = TRUE
)
x |
An sf::sf object with |
y |
An sf::sf mask layer with polygon geometries. |
x_id |
Name of the column in |
crs |
Target projected CRS for area and intersection, from |
min_area_ratio |
Numeric in |
repair |
If |
For each feature in x, area_full is its area before clipping and
area_clip is the sum of areas from intersecting x with y. The ratio
summary$area_ratio is area_clip / area_full: the fraction of each
x feature that falls inside y (not the fraction of y covered by x).
Only polygon geometries are supported for x and y: points and lines
are not meaningful for an area ratio. For example, min_area_ratio = 0.5
retains a feature only when at least
half of its area overlaps the mask; the default 0.01 drops only very small
edge overlaps.
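The ratio and threshold logic can be illustrated with plain numbers. The areas below are made up for the illustration and are not package output; only the column names come from the description above.

```r
# Illustrative areas only (not package output), e.g. in square metres.
summary_sketch <- data.frame(
  id        = c("feat_1", "feat_2", "feat_3"),
  area_full = c(100, 200, 400),  # area of each x feature before clipping
  area_clip = c(100, 90, 2)      # summed area of x intersected with y
)
# fraction of each x feature that falls inside the mask y
summary_sketch$area_ratio <- summary_sketch$area_clip / summary_sketch$area_full

# a feature is kept when its ratio is not below the threshold
keep_at <- function(min_area_ratio) {
  summary_sketch$area_ratio >= min_area_ratio
}

keep_at(0.01)  # TRUE TRUE FALSE: only the tiny edge overlap is dropped
keep_at(0.5)   # TRUE FALSE FALSE: at least half the area must overlap
```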
A list with clipped, an sf::sf object with intersection
geometries that passed the threshold, and summary, a dplyr::tibble()
with the ID column, area_full, area_clip, area_ratio, and logical
keep.
Other geo-tools:
quick_map(),
view_mapview_from_path()
ring <- matrix(
  c(0, 0, 1e6, 0, 1e6, 1e6, 0, 1e6, 0, 0),
  ncol = 2L, byrow = TRUE
)
crs_pl <- sf::st_crs(3857)
y <- sf::st_sf(geometry = sf::st_sfc(sf::st_polygon(list(ring)), crs = crs_pl))
inner <- matrix(
  c(1e5, 1e5, 9e5, 1e5, 9e5, 9e5, 1e5, 9e5, 1e5, 1e5),
  ncol = 2L, byrow = TRUE
)
x <- sf::st_sf(
  id = "feat_1",
  geometry = sf::st_sfc(sf::st_polygon(list(inner)), crs = crs_pl)
)
out <- intersect_mask_filter_area(x, y, x_id = "id", crs = crs_pl, repair = FALSE)
nrow(out$summary)
Attaches packages that are already installed. Names that are not found on
the library search path are reported with suggested
install.packages() or remotes::install_github() calls to run yourself;
this function does not install packages (CRAN policy).
ipak(pkg_list, force_cran = FALSE, force_github = FALSE)
pkg_list |
A character vector of package names. GitHub sources use
|
force_cran |
Logical. Ignored (retained for backwards compatibility; this function does not install or update packages). |
force_github |
Logical. Ignored (retained for backwards compatibility). |
A data.frame with columns pkg_name (character), success
(logical: whether require() attached the package), and version
(character, NA when not loaded). Returned invisibly; summaries are
printed via print() on subsets when rows exist.
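To make the shape of the returned data.frame concrete, here is a simplified sketch. The helper `attach_report_sketch` is hypothetical: it ignores GitHub sources and the printed summaries, and it only mirrors the three documented columns.

```r
# Hypothetical sketch of the return shape: attaches installed packages with
# require() and never installs anything.
attach_report_sketch <- function(pkg_list) {
  success <- vapply(
    pkg_list,
    function(p) {
      suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
    },
    logical(1),
    USE.NAMES = FALSE
  )
  # version stays NA for packages that could not be attached
  version <- rep(NA_character_, length(pkg_list))
  version[success] <- vapply(
    pkg_list[success],
    function(p) as.character(utils::packageVersion(p)),
    character(1),
    USE.NAMES = FALSE
  )
  data.frame(
    pkg_name = pkg_list,
    success = success,
    version = version,
    stringsAsFactors = FALSE
  )
}

report <- attach_report_sketch(c("utils", "stats"))
report
```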
ipak() was first developed by
Steven Worthington and made
publicly available
here. This version
only loads packages and suggests install commands for missing ones.
Other package-management:
prefer()
pkg_list <- c("utils", "stats") # base packages, usually present
ipak(pkg_list)
na_count() is a way to display the count and frequency of NA in data. It
can be slow over large datasets.
na_count(data, sort = TRUE)
data |
a data frame |
sort |
If |
a long-format tibble
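As a rough illustration of "count and frequency of NA", the following base R sketch produces one row per column. The column names (`variable`, `n_na`, `freq_na`) and the descending sort are assumptions made for the example; the tibble returned by na_count() may be laid out differently.

```r
# Hypothetical sketch; column names and sort order are assumptions.
na_count_sketch <- function(data, sort = TRUE) {
  out <- data.frame(
    variable = names(data),
    n_na = unname(colSums(is.na(data))),     # number of NA per column
    freq_na = unname(colMeans(is.na(data))), # proportion of NA per column
    stringsAsFactors = FALSE
  )
  if (sort) {
    out <- out[order(-out$n_na), , drop = FALSE]
  }
  rownames(out) <- NULL
  out
}

na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
na_count_sketch(na_data)
```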
I learned this way of exploring data through the excellent webinar taught by Emily Robinson.
Other missing-data:
filter_na(),
na_viz(),
remove_columns_based_on_NA()
na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
na_data %>% na_count()
na_viz() creates a ggplot showing the percentage of NA in each column.
na_viz(data)
data |
a data frame |
a ggplot object
na_viz() is another name for the excellent vis_miss() of
{naniar}
Other missing-data:
filter_na(),
na_count(),
remove_columns_based_on_NA()
if (interactive()) {
  na_data <- data.frame(c1 = c(1, NA), c2 = c(NA, NA))
  na_data %>% na_viz()
}
The most common conflict between {tidyverse} users is dplyr::filter() and
stats::filter(); among {raster} users, the conflict is with
dplyr::select(). prefer() eliminates conflicts between namespaces by
forcing the use of all the functions of the chosen package, rather than
looking for specific conflicts. Because of that and depending on the number
of functions exported by a package, prefer() can be slow.
prefer(pkg_name, quiet = TRUE)
pkg_name |
an atomic vector with package names |
quiet |
If warnings should be displayed. Default is TRUE |
No return value, called for side effects (registers conflict
preferences via conflicted::conflict_prefer()).
prefer() is shamelessly derived from a piece of code in the
README.md
of the {tidylog} package.
Other package-management:
ipak()
# prefer `{dplyr}` functions over `{stats}`
prefer("dplyr")
quick_map() allows the creation of maps quickly using {ggplot2}. For this
reason, the resulting map is fully editable through {ggplot2} layers.
quick_map(region = NULL, type = NULL)
region |
character string or atomic vector containing country or continent names. Default is |
type |
character string informing map type. Can be |
a ggplot object
quick_map() depends heavily on the data made available by
the {rnaturalearth}
package. quick_map() applies a broad, quick-and-dirty filter to
this data to create the map.
Other geo-tools:
intersect_mask_filter_area(),
view_mapview_from_path()
if (interactive()) {
  # plot a world map
  quick_map()
  # plot a new world map
  quick_map(region = "Americas", type = "sf")
  # using ggplot
  quick_map(region = "Americas", type = "ggplot")
  # edit using ggplot2 layers
  quick_map() +
    ggplot2::theme_void() +
    ggplot2::geom_sf(fill = "white")
}
read_all_sheets_then_save_csv() loops read_sheet_then_save_csv() over
the available excel sheets and saves them in data/temp/extracted_sheets.
read_all_sheets_then_save_csv(path_to_xlsx, dir_to_save = NULL)
path_to_xlsx |
a character vector with path to the excel file |
dir_to_save |
a character vector with the path to save the csv files. Default is NULL, which saves the csv files in "data/temp/extracted_sheets" if it exists. |
A list (one element per sheet), each the return value of
read_sheet_then_save_csv() for that sheet (invisibly NULL per call).
Other excel-import:
read_all_xlsx_then_save_csv(),
read_sheet_then_save_csv()
if (interactive()) {
  # read every sheet and save each one as a csv
  misc::create_dirs("ma-box")
  xlsx_file <- system.file("xlsx-examples", "mtcars_workbook_001.xlsx", package = "misc")
  read_all_sheets_then_save_csv(
    path_to_xlsx = xlsx_file,
    dir_to_save = "ma-box"
  )
}
Following the same principle as read_all_sheets_then_save_csv(),
read_all_xlsx_then_save_csv() loops read_all_sheets_then_save_csv() over
all available xlsx files.
read_all_xlsx_then_save_csv(path_to_xlsx)
path_to_xlsx |
a character vector with the path to the directory containing the excel files |
A list (one element per .xlsx file found under path_to_xlsx), each
the list returned by read_all_sheets_then_save_csv() for that workbook.
Other excel-import:
read_all_sheets_then_save_csv(),
read_sheet_then_save_csv()
if (interactive()) {
  # read every workbook in the directory and save each sheet as a csv
  xlsx_dir <- system.file("xlsx-examples", package = "misc")
  read_all_xlsx_then_save_csv(
    path_to_xlsx = xlsx_dir
  )
}
Read layers from a file geodatabase (.gdb)
read_gdb(path, layer = NULL, quiet = TRUE, ...)
path |
Path to a |
layer |
If |
quiet |
Passed to |
... |
Additional arguments passed to |
A tibble with columns fpath (path or GDAL dsn used for the layer),
file_type (tools::file_ext()), layer_name, geometry_type, nrows_aka_features,
ncols_aka_fields, crs_name (from st_layers()$crs when available), and
data (list-column of sf::sf objects). Layers are not row-bound; differing CRS are preserved
per row.
Other geo-io:
clean_geo(),
read_geo(),
read_kmz(),
read_sf_zip()
gdb <- system.file("extdata", "misc_example.gdb", package = "misc")
if (nzchar(gdb) && dir.exists(gdb)) {
  read_gdb(gdb)
  read_gdb(gdb, layer = "OGRGeoJSON")
}
Chooses the reader from tools::file_ext(path) (case-insensitive):
.zip — read_sf_zip()
.kmz — read_kmz()
.kml — internal KML reader (same tibble layout; fpath is the .kml file)
.gdb — read_gdb()
anything else GDAL/sf can open on path — one row per layer from
sf::st_layers() (e.g. .shp, .gpkg, .geojson)
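The dispatch rule listed above can be sketched as a case-insensitive switch on the file extension. The helper `pick_reader_sketch` is hypothetical and only returns a label naming the reader; it does not read anything.

```r
# Hypothetical helper: reports which reader the extension would select.
pick_reader_sketch <- function(path) {
  ext <- tolower(tools::file_ext(path))  # case-insensitive match
  switch(ext,
    zip = "read_sf_zip",
    kmz = "read_kmz",
    kml = "internal KML reader",
    gdb = "read_gdb",
    "generic GDAL/sf reader"  # fallback for .shp, .gpkg, .geojson, ...
  )
}

pick_reader_sketch("data/raw/limits.ZIP")  # "read_sf_zip"
pick_reader_sketch("areas.gpkg")           # "generic GDAL/sf reader"
```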
read_geo(path, layer = NULL, quiet = TRUE, ...)
path |
Path to a spatial file or a |
layer |
Passed to multi-layer GDAL readers. Ignored for |
quiet |
Passed to |
... |
Additional arguments passed to |
A tibble as described in read_gdb().
Other geo-io:
clean_geo(),
read_gdb(),
read_kmz(),
read_sf_zip()
d <- system.file("extdata", package = "misc")
f <- function(...) file.path(d, ...)
if (file.exists(f("misc_example.zip"))) read_geo(f("misc_example.zip"))
if (file.exists(f("misc_example.kmz"))) read_geo(f("misc_example.kmz"))
if (file.exists(f("misc_example.kml"))) read_geo(f("misc_example.kml"))
if (file.exists(f("misc_example.gpkg"))) read_geo(f("misc_example.gpkg"))
if (file.exists(f("misc_example.geojson"))) read_geo(f("misc_example.geojson"))
if (file.exists(f("misc_example.shp"))) read_geo(f("misc_example.shp"))
if (dir.exists(f("misc_example.gdb"))) read_geo(f("misc_example.gdb"), layer = "OGRGeoJSON")
Extracts the archive to a temporary directory and reads each KML layer with
sf::read_sf() after sf::st_layers(). Multiple KML files or multiple
layers yield one row per layer; layer_name is simplified when there is only
one layer in one file.
read_kmz(path, quiet = TRUE, ...)
path |
Path to a |
quiet |
Passed to |
... |
Additional arguments passed to |
A tibble with the same columns as read_gdb(). Here fpath is the
path to the original .kmz (not the temporary .kml), and file_type is
typically "kmz". Metadata columns still come from sf::st_layers() on the
extracted KML file used for reading.
Other geo-io:
clean_geo(),
read_gdb(),
read_geo(),
read_sf_zip()
kmz <- system.file("extdata", "misc_example.kmz", package = "misc")
if (nzchar(kmz) && file.exists(kmz)) read_kmz(kmz)
/vsizip/
Uses zip::zip_list() to find .shp members, then reads each with
sf::read_sf() on a /vsizip/... path. Multiple shapefiles become one row
each (list-column data), so differing CRS are not merged.
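A small sketch of how a /vsizip/ data source name can be assembled for each shapefile member. The member names are typed in by hand here (in practice they would come from a listing such as zip::zip_list()), and the helper `vsizip_path_sketch` is hypothetical.

```r
# Hypothetical helper: builds the GDAL virtual path for one archive member.
vsizip_path_sketch <- function(zip_path, member) {
  paste0("/vsizip/", zip_path, "/", member)
}

# Member names written by hand for the illustration.
members <- c("misc_example.shp", "misc_example.dbf", "misc_example.prj")
shp_members <- members[grepl("\\.shp$", members, ignore.case = TRUE)]

dsn <- vsizip_path_sketch("/tmp/misc_example.zip", shp_members)
dsn
# returns "/vsizip//tmp/misc_example.zip/misc_example.shp"
```

With an absolute archive path the result contains a double slash after /vsizip, which is the form GDAL documents for absolute paths.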
read_sf_zip(path, quiet = TRUE, ...)
path |
Path to a |
quiet |
Passed to |
... |
Additional arguments passed to |
A tibble with fpath (the /vsizip/... dsn), file_type, metadata
from sf::st_layers(), and data (list-column of sf). See read_gdb().
Other geo-io:
clean_geo(),
read_gdb(),
read_geo(),
read_kmz()
z <- system.file("extdata", "misc_example.zip", package = "misc")
if (nzchar(z) && file.exists(z)) read_sf_zip(z)
read_sheet_then_save_csv() is heavily inspired by readxl::read_excel()
(in fact, it inherits almost all of its arguments).
read_sheet_then_save_csv(
  excel_sheet,
  path_to_xlsx,
  dir_to_save = NULL,
  range = NULL,
  col_types = NULL,
  col_names = TRUE,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  .name_repair = "unique"
)
excel_sheet |
a character vector with the name of the excel sheet |
path_to_xlsx |
a character vector with the path of the excel file |
dir_to_save |
a character vector with the path to save the csv file. Default is NULL, which saves the csv in "data/temp" if it exists. |
range |
A cell range to read from. Includes typical Excel ranges like "B3:D87". |
col_types |
Either NULL to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one col_type is specified, it will be recycled. |
col_names |
TRUE to use the first row as column names |
na |
Character vector of strings to interpret as missing values. By default, treats blank cells as missing data. |
trim_ws |
Should leading and trailing whitespace be trimmed? |
skip |
Minimum number of rows to skip before reading anything, be it column names or data. |
n_max |
Maximum number of data rows to read. |
guess_max |
Maximum number of data rows to use for guessing column types. |
.name_repair |
Handling of column names |
No return value, called for side effects (writes one CSV file for the requested sheet).
read_sheet_then_save_csv() is an adaptation of the awesome workflow described
in an article
from {readxl} package site.
Other excel-import:
read_all_sheets_then_save_csv(),
read_all_xlsx_then_save_csv()
if (interactive()) {
  # read one sheet and save it as a csv
  misc::create_dirs("ma-box")
  xlsx_file <- system.file("xlsx-examples", "mtcars_workbook_001.xlsx", package = "misc")
  read_sheet_then_save_csv(
    excel_sheet = "mtcars_sheet_001",
    path_to_xlsx = xlsx_file,
    dir_to_save = "ma-box"
  )
}
Remove columns based on NA values
remove_columns_based_on_NA(data, threshold = 0.5)
data |
A data frame or tibble |
threshold |
The proportion of NA values allowed in a column (default: 0.5) |
A data frame with columns removed if they have more than the specified threshold of NA values
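The threshold rule can be written in a couple of lines of base R. This is a sketch under the reading that a column is dropped only when its NA proportion is strictly greater than `threshold`; the helper `drop_na_cols_sketch` is hypothetical.

```r
# Hypothetical base R equivalent: keep columns whose NA proportion does not
# exceed `threshold`.
drop_na_cols_sketch <- function(data, threshold = 0.5) {
  na_prop <- colMeans(is.na(data))  # proportion of NA per column
  data[, na_prop <= threshold, drop = FALSE]
}

df <- data.frame(
  a = c(1, 2, NA, 4, 5),   # 20% NA
  b = c(NA, NA, NA, 4, 5), # 60% NA
  c = c(1, 2, 3, NA, 5)    # 20% NA
)
names(drop_na_cols_sketch(df))                  # "a" "c"
names(drop_na_cols_sketch(df, threshold = 0.1)) # character(0)
```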
Other missing-data:
filter_na(),
na_count(),
na_viz()
# Create sample data frame with NA values
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(NA, NA, NA, 4, 5),
  c = c(1, 2, 3, NA, 5)
)
# Remove columns with more than 50% NA values
remove_columns_based_on_NA(df)
# Use stricter threshold of 10% NA values
remove_columns_based_on_NA(df, threshold = 0.1)
save_plot() wraps ggplot2::ggsave() and offers an option to remove white
space around figures (this creates an additional file in output/figures/trim;
uses trim_fig()).
save_plot(
  object,
  filename = NULL,
  dir_to_save = NULL,
  width = NA,
  height = NA,
  format = NULL,
  units = NULL,
  dpi = NULL,
  overwrite = FALSE,
  trim = FALSE
)
object |
a ggplot object |
filename |
a character vector with the name of the file to save. Default is NULL and saves with the name of the object |
dir_to_save |
a character vector with the name of the directory to save |
width |
a numerical vector with the width of the figure |
height |
a numerical vector with the height of the figure |
format |
a character vector with the format of the figure. Can be "jpeg", "tiff", "png" (default), or "pdf" |
units |
a character vector with the units of the figure size. Can be "in", "cm" (default), or "mm" |
dpi |
a numerical vector with the resolution of the figure. Default is 300 |
overwrite |
logical |
trim |
logical |
No return value, called for side effects (writes a graphics file
via ggplot2::ggsave(), and optionally calls trim_fig()).
save_plot() is derived from
write_plot(),
available in the excellent
start project template
Other save-output:
save_temp_data(),
trim_fig()
if (interactive()) {
  library(misc)
  ipak(c("ggplot2", "dplyr"))
  create_dirs()
  p <- mtcars %>%
    ggplot() +
    aes(x = mpg, y = cyl) +
    geom_point()
  save_plot(p)
}
Save object as RDS file
save_temp_data(object, dir_to_save = NULL)
object |
R object |
dir_to_save |
a character vector with the directory name. Default is NULL, which saves the object in "data/temp" if it exists. |
No return value, called for side effects (writes an .rds file
via saveRDS()).
Other save-output:
save_plot(),
trim_fig()
if (interactive()) {
  # create and save an R object
  awesome <- "not too much!"
  misc::create_dirs("ma-box")
  save_temp_data(object = awesome, dir_to_save = "ma-box")

  # using default directories from `misc::create_dirs()`
  create_dirs()
  so_good <- "Yep!"
  save_temp_data(object = so_good)

  # reading many temp data
  # match the .rds extension in any letter case
  ext <- "\\.[rR][dD][sS]$"
  # list files
  files <- list.files(path = "data/temp", pattern = ext, full.names = TRUE)
  # loop over files
  for (i in files) {
    # read temporary file
    tmp <- readRDS(file = i)
    # remove extension from filename
    obj_name <- gsub(pattern = ext, replacement = "", x = basename(i))
    # assign name
    assign(obj_name, tmp)
  }
}
tad_view() is an alternative to the View() function when not using
RStudio. Please make sure you have
tad installed on your
system.
tad_view(data)
data |
a data.frame/tibble data format. |
None
Other data-viewers:
view_excel(),
view_in(),
view_vd(),
view_vd_nonint()
if (interactive()) {
  library(misc)
  mtcars %>% tad_view()
}
trim_fig() removes white space around a figure and saves the result into the
trim folder (the original figure is left untouched).
trim_fig(figure_path, overwrite = FALSE)
figure_path |
a character vector with path of the figure |
overwrite |
logical |
No return value, called for side effects (writes a trimmed image
file under a trim/ subdirectory via magick::image_write()).
trim_fig() wraps the excellent image_trim() of
{magick}
Other save-output:
save_plot(),
save_temp_data()
if (interactive()) {
  library(misc)
  ipak(c("ggplot2", "dplyr"))
  create_dirs()
  p <- mtcars %>%
    ggplot() +
    aes(x = mpg, y = cyl) +
    geom_point()
  save_plot(p)
  trim_fig("output/figures/p.png")
}
Opens a data frame in Microsoft Excel or another spreadsheet viewer. Also copies the data to the system clipboard.
view_excel(data, viewer = c("excel", "libreoffice", "gnumeric", "tad"))
data |
A data frame to view |
viewer |
The spreadsheet viewer to use. One of |
Returns nothing
Other data-viewers:
tad_view(),
view_in(),
view_vd(),
view_vd_nonint()
view_in() is an alternative to the View() function when not using
RStudio. To date, it works with gnumeric, libreoffice, and tad.
view_in(data, viewer = c("libreoffice", "gnumeric", "tad"))
data |
a data.frame/tibble data format. |
viewer |
character app to open the csv file. |
None
Other data-viewers:
tad_view(),
view_excel(),
view_vd(),
view_vd_nonint()
if (interactive()) {
  library(misc)
  mtcars %>% view_in()
}
Reads a spatial data file (.shp or .gpkg) and optionally displays it in an interactive map preview.
macOS only: tabular viewing uses view_vd_nonint(), which is not supported on Windows or Linux;
on those systems the function stops with an error.
view_mapview_from_path(path, preview = FALSE)
path |
Path to the spatial data file (.shp or .gpkg) |
preview |
Logical. If TRUE, opens an interactive map preview in the browser. Default is FALSE. |
Requires macOS because the workflow always opens the attribute table with VisiData
via view_vd_nonint(). The function performs the following steps:
Validates that the input file exists and has the correct extension (.shp or .gpkg)
Creates a temporary HTML file for the map preview in ~/.local/share/mapview/
Reads the spatial data using sf::read_sf()
If preview=TRUE, creates an interactive map using mapview and opens it in the browser
Opens the attribute data in VisiData
Returns nothing, called for side effects
Other geo-tools:
intersect_mask_filter_area(),
quick_map()
macOS only. Does not work on Windows or Linux. Opens data in VisiData using the
built-in Terminal.app by default (terminal = "terminal"); use terminal = "auto"
with MISC_VIEW_TERM / options(misc.view_term) for a configurable choice, or
terminal = "iterm" for iTerm2. If the input is an sf object, the geometry column
will be dropped before viewing.
view_vd(data, type = "csv", terminal = c("terminal", "auto", "iterm"))
data |
A data.frame, tibble, or sf object to view |
type |
Either "csv" or "json" format for writing the temporary file. Use "json" for preserving list-columns. |
terminal |
Which macOS terminal to use: |
Platform: Supported only on macOS (Darwin). On Windows or Linux, the function stops with an error. It only performs the VisiData launch in interactive R sessions; in non-interactive sessions it does not open a terminal but still returns the data.
It creates a temporary file and opens it in VisiData in the selected terminal. For
terminal = "auto", set MISC_VIEW_TERM (e.g. in ‘~/.Renviron’) and/or
options(misc.view_term = "iterm") so VisiData opens in iTerm2 when you are inside
tmux or other environments where a fixed default is needed.
The temporary filename includes a timestamp for identification.
The VisiData CLI (vd) must be installed and on your PATH (VisiData is a Python
package; see https://www.visidata.org/). If an executable named vdk is also on
your PATH, it is invoked instead as vdk <project_basename> <file> (a local helper;
not shipped with this package); vd is still required.
Returns the input data invisibly
Other data-viewers:
tad_view(),
view_excel(),
view_in(),
view_vd_nonint()
if (interactive()) {
  # View a data frame
  mtcars %>% view_vd()
  # View with a custom title (`title` is an argument of view_vd_nonint())
  mtcars %>% view_vd_nonint(title = "Car Data")
  # View with list columns preserved
  nested_df <- data.frame(id = 1:2)
  nested_df$values <- list(1:3, c("a", "b"))
  nested_df %>% view_vd(type = "json")
}
macOS only. Does not work on Windows or Linux (see view_vd()).
Opens a data frame in VisiData terminal viewer, saving to a fixed location in Downloads.
Similar to view_vd() but without interactive mode check. Uses vdk when on PATH,
otherwise vd (see Details of view_vd()).
view_vd_nonint(data, title = NULL, terminal = c("terminal", "auto", "iterm"))
data |
A data frame or sf object to view |
title |
Optional title for the viewer window (default: "misc::view_vd") |
terminal |
Which macOS terminal to use; see |
Platform: Supported only on macOS. On Windows or Linux, stops with an error;
see view_vd() for VisiData and terminal requirements.
Returns the input data frame unchanged
Other data-viewers:
tad_view(),
view_excel(),
view_in(),
view_vd()