5 Static Graphics with mplot3
.:rtemis 0.8.0: Welcome, egenn [x86_64-apple-darwin17.0 (64-bit): Defaulting to 4/4 available cores] Documentation & vignettes: https://rtemis.netlify.com
Visualization is a central part of any data analysis pipeline. Ideally, you want to visualize data before and after most, if not all, operations. Depending on the kind and amount of data you are working with, this can range from straightforward to quite challenging. Here, we introduce some data visualization functions built on base R graphics. Some advantages of using base graphics are:
- They are easy to extend if you are familiar with base graphics, and their output can be combined with that of other functions that use base graphics.
- They are very fast to draw. This becomes particularly important when monitoring learning algorithms live or building Shiny applications.
High-dimensional data can sometimes be indirectly visualized after dimensionality reduction.
5.1 Density and Histograms
We can also plot grouped data directly by passing a list. Note that partial matching allows us to use just "d" for type.
5.2 Scatter plots
Here we are going to look at the static mplot3.xy and mplot3.xym, and their interactive counterpart.
We create some synthetic data and plot it using mplot3.xy. We can ask for any supervised learner to be used to fit the data. For linear relationships, that would be glm; for non-linear fits there are many options, but gam is a great one.
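To make the difference between a linear and a non-linear fit line concrete, here is a minimal base-graphics sketch. It does not use mplot3.xy: it calls stats::glm and mgcv::gam directly and assumes the mgcv package is installed. The cubic relationship and all variable names are made up for illustration.

```r
library(mgcv)  # assumed to be installed (a recommended package shipped with most R builds)

set.seed(2020)
x <- rnorm(200)
y <- x^3 + rnorm(200)                  # made-up cubic relationship plus noise
dat <- data.frame(x = x, y = y)

fit_glm <- glm(y ~ x, data = dat)      # straight line (Gaussian family)
fit_gam <- gam(y ~ s(x), data = dat)   # penalized smooth of x

# Predict both fits on an ordered grid so the lines draw cleanly
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
plot(y ~ x, data = dat, pch = 16, col = adjustcolor("gray30", 0.5))
lines(grid$x, predict(fit_glm, newdata = grid), col = "firebrick", lwd = 2)
lines(grid$x, predict(fit_gam, newdata = grid), col = "steelblue", lwd = 2)
legend("topleft", legend = c("glm", "gam"),
       col = c("firebrick", "steelblue"), lwd = 2, bty = "n")

# Training mean squared error of each fit
mse <- c(glm = mean((dat$y - fitted(fit_glm))^2),
         gam = mean((dat$y - fitted(fit_gam))^2))
```

On data like these, the smooth should track the curvature that the straight line misses, which shows up as a lower training error for the GAM.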
mplot3.xy allows you to easily group data in a few different ways. You can pass x, y, or both as a list of vectors. Alternatively, you can use the group argument, which accepts either a variable name (if data is defined) or a factor vector.
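The two grouping styles describe the same information in different shapes. The sketch below uses only base R (not mplot3.xy) to show how a factor and a list of vectors relate; the data and variable names are invented for illustration.

```r
set.seed(2020)
x <- rnorm(150)
g <- factor(rep(c("a", "b", "c"), each = 50))
slope <- unname(c(a = 0.5, b = 1, c = 2)[as.character(g)])  # one slope per group
y <- x * slope + rnorm(150, sd = 0.3)

# Representation 1: a list of vectors, one element per group
x_list <- split(x, g)
y_list <- split(y, g)

# Representation 2: flat vectors plus a factor; unsplit() reverses split()
x_flat <- unsplit(x_list, g)

# Either representation can drive a grouped scatter plot in base graphics
plot(x, y, col = as.integer(g) + 1L, pch = 16)
legend("topleft", legend = levels(g),
       col = seq_along(levels(g)) + 1L, pch = 16, bty = "n")
```

Passing a list is convenient when groups already live in separate objects, while a factor is the natural choice when the data sit in one data frame.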
This extension of mplot3.xy adds marginal density / histogram plots to a scatter plot.
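For readers curious how such a figure can be assembled from base graphics, the following is a rough sketch using graphics::layout. It is an illustration of the general idea, not the rtemis implementation; the panel arrangement, margins, and data are assumptions.

```r
set.seed(2020)
x <- rnorm(300)
y <- 0.6 * x + rnorm(300, sd = 0.8)

op <- par(no.readonly = TRUE)          # remember graphics settings
# 2 x 2 grid: top density (1), empty corner (0), scatter (2), right density (3)
panel_ids <- matrix(c(1, 0,
                      2, 3), nrow = 2, byrow = TRUE)
layout(panel_ids, widths = c(4, 1), heights = c(1, 4))

dx <- density(x)
dy <- density(y)

# Top margin: density of x, sharing the scatter plot's x range
par(mar = c(0, 4, 1, 0))
plot(dx$x, dx$y, type = "l", axes = FALSE, xlab = "", ylab = "", xlim = range(x))

# Main panel: the scatter plot itself
par(mar = c(4, 4, 0, 0))
plot(x, y, pch = 16, col = adjustcolor("steelblue", 0.6),
     xlim = range(x), ylim = range(y))

# Right margin: density of y, drawn sideways and sharing the y range
par(mar = c(4, 0, 0, 1))
plot(dy$y, dy$x, type = "l", axes = FALSE, xlab = "", ylab = "", ylim = range(y))

layout(1)                              # back to a single panel
par(op)                                # restore graphics settings
```

Using the same xlim and ylim in the marginal panels as in the scatter plot keeps the curves aligned with the points.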
5.2.3 Fit custom functions
mplot3.xy includes a formula argument as an alternative to fit.
This allows the user to define the formula of the fitting function, if that is known.
As an example, let’s look at power curves.
Power curves can help us model a number of important relationships that occur in nature.
Let’s see how we can plot these in rtemis.
5.2.3.1 y = b * m ^ x
First, we create some synthetic data:
Let’s plot the data:
Now, let’s add a fit line. There are two ways to add a fit line in mplot3.xy:
- Set the fit argument to the learner to use, e.g. fit = 'glm'
- Set the formula argument, e.g. formula = y ~ a * x + b
In this case, a linear model (e.g. fit = 'glm') is not a good idea.
A generalized additive model (GAM) is our best bet if we know nothing about the relationship between x and y. (fit is the third argument to mplot3.xy, so we can skip naming it.)
Even better, if we do know the type of relationship between x and y, we can provide a formula. This will be solved using the Nonlinear Least Squares learner (s.NLS).
We can plot the true function along with the fit.
[2020-06-23 08:18:03 s.NLS] Hello, egenn
[[ Regression Input Summary ]]
Training features: 200 x 1
Training outcome: 200 x 1
Testing features: Not available
Testing outcome: Not available
[2020-06-23 08:18:03 s.NLS] Initializing all parameters as 0.1
[2020-06-23 08:18:03 s.NLS] Training NLS model...
[[ NLS Regression Training Summary ]]
MSE = 0.68 (89.24%)
RMSE = 0.82 (67.19%)
MAE = 0.65 (54.06%)
r = 0.94 (p = 7.9e-98)
rho = 0.66 (p = 0.00)
R sq = 0.89
[2020-06-23 08:18:03 s.NLS] Run completed in 3.2e-03 minutes (Real: 0.19; User: 0.11; System: 0.02)
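The same kind of fit can be sketched without rtemis by calling stats::nls on the power-curve formula y = b * m ^ x. This is only an approximation of what the section describes: the true parameters, noise level, and starting values below are made up, and the starting values differ from the 0.1 initialization reported in the log above.

```r
set.seed(2020)
b_true <- 0.8
m_true <- 1.7                          # made-up parameters for illustration
x <- runif(200, min = 0, max = 5)
y <- b_true * m_true^x + rnorm(200, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Nonlinear least squares needs starting values; these are rough guesses
fit <- nls(y ~ b * m^x, data = dat, start = list(b = 1, m = 1.5))
est <- coef(fit)

# Plot the data, the true function, and the fitted curve
xs <- seq(0, 5, length.out = 200)
plot(y ~ x, data = dat, pch = 16, col = adjustcolor("gray30", 0.5))
lines(xs, b_true * m_true^xs, lwd = 2, lty = 2)
lines(xs, predict(fit, newdata = data.frame(x = xs)), col = "firebrick", lwd = 2)
legend("topleft", legend = c("True function", "NLS fit"),
       col = c("black", "firebrick"), lty = c(2, 1), lwd = 2, bty = "n")
```

Because the formula matches the process that generated the data, the estimated b and m should land close to the true values, which a generic smoother cannot report.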
5.2.4 Scatterplot + Cluster
We already saw that we can use any learner to draw a fit line in a scatter plot. You can similarly use any clustering algorithm to cluster the data and color the points by cluster membership. Let’s use HOPACH (Van der Laan and Pollard 2003) to cluster the famous iris dataset. Learn more about Clustering.
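The idea of coloring points by cluster membership can be sketched in base R. Note the substitution: this example uses Ward hierarchical clustering from stats::hclust, not HOPACH, and the choice of three clusters and of the two plotted features is an assumption made for illustration.

```r
x <- as.matrix(iris[, 1:4])

# Swapped-in clusterer: Ward hierarchical clustering, cut into 3 groups
hc <- hclust(dist(scale(x)), method = "ward.D2")
cluster <- cutree(hc, k = 3)

# Scatter plot of two features, colored by cluster membership
plot(iris$Petal.Length, iris$Petal.Width, col = cluster + 1L, pch = 16,
     xlab = "Petal.Length", ylab = "Petal.Width")
legend("topleft", legend = paste("Cluster", sort(unique(cluster))),
       col = sort(unique(cluster)) + 1L, pch = 16, bty = "n")

# Cross-tabulate clusters against the known species labels
tab <- table(cluster, iris$Species)
```

The cross-tabulation gives a quick check of how well the unsupervised grouping lines up with the species labels.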
mplot3.heatmap’s colorbar defaults to 10 overlapping discs on either side of zero, representing a 10% change from one to the next.
Turn off hierarchical clustering and dendrogram:
Some synthetic data:
5.6 Decision Boundaries
The goal of a classifier is to establish a decision boundary in feature space separating the different outcome classes. While most feature spaces are high-dimensional and cannot be directly visualized, it can still be helpful to look at decision boundaries in low-dimensional problems. We can compare different algorithms or the effects of hyperparameter tuning for a given algorithm.
5.6.1 2D synthetic data
Let’s create some 2D synthetic data using the mlbench package and plot them, coloring by group.
5.6.2 Logistic Regression
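As a rough illustration of a logistic regression decision boundary, the sketch below uses base R only. The two-cluster data are generated by hand as a stand-in for the mlbench data, and the grid-and-contour approach is one common way to draw a boundary, not necessarily what rtemis does.

```r
set.seed(2020)
n <- 100
dat <- data.frame(
  x1 = c(rnorm(n, -1), rnorm(n, 1)),
  x2 = c(rnorm(n, -1), rnorm(n, 1)),
  y  = factor(rep(c("a", "b"), each = n))
)

# Logistic regression: models the probability of the second level, "b"
fit <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Evaluate P(y == "b") on a regular grid covering the data
g1 <- seq(min(dat$x1), max(dat$x1), length.out = 100)
g2 <- seq(min(dat$x2), max(dat$x2), length.out = 100)
grid <- expand.grid(x1 = g1, x2 = g2)   # x1 varies fastest
p <- matrix(predict(fit, newdata = grid, type = "response"), nrow = length(g1))

# Points colored by class, with the 0.5 probability contour as the boundary
plot(dat$x1, dat$x2, col = as.integer(dat$y) + 1L, pch = 16,
     xlab = "x1", ylab = "x2")
contour(g1, g2, p, levels = 0.5, add = TRUE, drawlabels = FALSE, lwd = 2)

pred <- ifelse(fitted(fit) > 0.5, "b", "a")
accuracy <- mean(pred == dat$y)
```

For logistic regression the 0.5 contour is a straight line, so comparing it with the boundary of a more flexible learner on the same grid makes the difference in model capacity visible.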
5.7 Multiplots with mplot3
rtemis provides a convenience function to plot multiple graphs together, rtlayout. It is based on the graphics::layout function and integrates behind the scenes with all mplot3 functions. You specify the number of rows and the number of columns. Optional arguments allow you to arrange plots by row or by column and to automatically create labels for each plot. As with most visualization functions in rtemis, there is an option to save to PDF. This means you can create a publication-quality multipanel plot in a few lines of code.
Start by defining the number of rows and columns, draw your plots using mplot3 functions, and then close the layout.
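Since rtlayout is described as building on graphics::layout, the open, plot, close workflow can be sketched with base R alone. This is not the rtlayout API: the panel contents, the labeling with mtext, and the temporary output path are assumptions made for illustration.

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output path
pdf(out_file, width = 8, height = 8)     # open a PDF device

n_row <- 2
n_col <- 2
# Panels are numbered 1..4 and filled by row
layout(matrix(seq_len(n_row * n_col), nrow = n_row, byrow = TRUE))

set.seed(2020)
x <- rnorm(200)
y <- x + rnorm(200)
panels <- list(
  function() plot(x, y, pch = 16),
  function() hist(x, main = ""),
  function() plot(density(y), main = ""),
  function() boxplot(list(x = x, y = y))
)
for (i in seq_along(panels)) {
  panels[[i]]()
  # Automatic panel label (A, B, C, D) in the top-left margin
  mtext(LETTERS[i], side = 3, adj = 0, line = 1, font = 2)
}

layout(1)    # reset to a single panel
dev.off()    # close the device and write the file
```

Setting byrow = FALSE in the matrix call would fill the panels by column instead.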
Van der Laan, Mark J, and Katherine S Pollard. 2003. “A New Algorithm for Hybrid Hierarchical Clustering with Visualization and the Bootstrap.” Journal of Statistical Planning and Inference 117 (2): 275–303.