ggfreqScatter {Hmisc}        R Documentation

Frequency Scatterplot

Description

Uses ggplot2 to draw a scatterplot or dot-like chart when there is a very large number of overlapping values. This works for continuous and categorical x and y. For continuous variables it serves the same purpose as hexagonal binning. Counts of overlapping points are grouped into quantile groups, and transparency level and color (by default a viridis palette, see fcolors) are used to convey the counts.

The result can also be passed to ggplotly. Actual cell frequencies are added to the hover text in that case.
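
The core idea, collapsing nearly coincident points into cells and counting them, can be sketched in base R. This is a minimal illustration that assumes continuous values are rounded onto an evenly spaced grid. It is not necessarily the package's exact computation, and the helper name round_to_bins is hypothetical.

```r
# Hypothetical helper (not part of Hmisc): snap a numeric vector onto an
# evenly spaced grid spanning its range, so nearby values share one position.
round_to_bins <- function(v, bins = 50) {
  r    <- range(v, na.rm = TRUE)
  step <- diff(r) / bins                 # spacing between grid positions
  r[1] + step * round((v - r[1]) / step)
}

set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# One row per occupied (x, y) cell, with the number of points it contains
cells <- as.data.frame(table(x = round_to_bins(x), y = round_to_bins(y)),
                       stringsAsFactors = FALSE)
cells <- cells[cells$Freq > 0, ]
cells$x <- as.numeric(cells$x)
cells$y <- as.numeric(cells$y)
head(cells[order(-cells$Freq), ])        # the most heavily populated cells
```

Plotting one symbol per row of cells, with color or transparency driven by Freq, gives the kind of display this function produces.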

Usage

ggfreqScatter(x, y, bins=50, g=10,
              xtrans = function(x) x,
              ytrans = function(y) y,
              xbreaks = pretty(x, 10),
              ybreaks = pretty(y, 10),
              xminor  = NULL, yminor = NULL,
              xlab = as.character(substitute(x)),
              ylab = as.character(substitute(y)),
              fcolors = viridis::viridis(10), nsize=FALSE, html=FALSE, ...)

Arguments

x

x-variable

y

y-variable

bins

for continuous x or y, the number of bins to create by rounding. Ignored for categorical variables. If a 2-vector, the first element applies to x and the second to y.

g

number of quantile groups to make for frequency counts. Use g=0 to use frequencies continuously for color and alpha coding. This is recommended only when using plotly.

xtrans,ytrans

functions specifying transformations to be made before binning and plotting

xbreaks,ybreaks

vectors of values to label on axis, on original scale

xminor,yminor

values at which to put minor tick marks, on original scale

xlab,ylab

axis labels. If not specified and variable has a label, that label will be used.

fcolors

colors argument to pass to scale_color_gradientn to color code frequencies

nsize

set to TRUE to size the symbols in relation to the number of points instead of varying color or transparency. Best when both x and y are discrete. ggplot2 size is taken as the fourth root of the frequency. If there are 15 or fewer unique frequencies, all the unique frequencies are used; otherwise g quantile groups of frequencies are used.

html

set to TRUE to use html in axis labels instead of plotmath

...

arguments to pass to geom_point such as shape and size
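
How g and nsize turn cell frequencies into visual encodings can be sketched in base R. This is an illustration with made-up counts, not the package's internal code. Base quantile and cut stand in here, as an assumption, for whatever grouping the function performs internally.

```r
# Made-up cell frequencies standing in for counts of overlapping points
set.seed(2)
freq <- rpois(200, lambda = 20) + 1L

# g quantile groups: cut points at the 0, 1/g, ..., 1 quantiles of freq.
# unique() guards against duplicated cut points when there are many ties.
g      <- 10
cuts   <- unique(quantile(freq, probs = seq(0, 1, length.out = g + 1)))
fgroup <- cut(freq, breaks = cuts, include.lowest = TRUE)
table(fgroup)                            # number of cells in each group

# nsize=TRUE: symbol size taken as the fourth root of the frequency,
# which compresses a wide range of counts into a narrow range of sizes
size <- freq^0.25
range(size)
```

With heavily tied frequencies, fewer than g distinct groups can result, which is why the sketch removes duplicated cut points before calling cut.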

Value

a ggplot object

Author(s)

Frank Harrell

See Also

cut2

Examples

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)
g <- ggfreqScatter(x, y) +   # might add g=0 if using plotly
      ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label')  # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips

# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)

# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)

# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)
# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)

[Package Hmisc version 4.0-0 Index]