Tutorial 1: Data types and default commands

Circleplot is designed to plot data on the distance or degree of association between pairs of units. You can provide these data in a range of formats. For example, the default output from Species Pairwise Association Analysis (sppairs::spaa) is a data.frame with three columns, and circleplot accepts this by default. Alternatively, you can specify associations using either lower-triangular or square matrices*. Finally, you can supply a list containing more than one of the above (but more on that later).
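To make the formats concrete, here is a minimal base-R sketch that stores the same three associations as a three-column data.frame, a square matrix, a lower-triangular 'dist' object, and a list. The column names (sp1, sp2, value) and object names are illustrative assumptions, not a schema taken from sppairs or circleplot.

```r
# Three pairwise associations among units A, B and C in "long" format.
# Column names are illustrative assumptions, not a required schema.
long <- data.frame(
  sp1 = c("A", "A", "B"),
  sp2 = c("B", "C", "C"),
  value = c(0.2, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# The same information as a square matrix with named rows and columns
units <- sort(unique(c(long$sp1, long$sp2)))
square <- matrix(0, nrow = length(units), ncol = length(units),
  dimnames = list(units, units))
square[cbind(long$sp2, long$sp1)] <- long$value  # fill below the diagonal
square[cbind(long$sp1, long$sp2)] <- long$value  # mirror above the diagonal

# ...and as a lower-triangular 'dist' object
lower <- as.dist(square)

# A list holding more than one of the above
inputs <- list(first = lower, second = square)
```

The square matrix stores every value twice, which is why the lower-triangular form is the more economical choice when associations are symmetric.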

When you provide data to circleplot, the code will attempt to ascertain two attributes of the dataset: whether it is symmetrical (vs asymmetrical)** and whether it is binary (vs numeric)***. The primary difference between these options is in how the plot is drawn. For binary matrices the colour of the lines doesn’t matter, whereas for numeric matrices colour indicates the degree or strength of association between two points. Similarly, asymmetric matrices have directions, which are shown by arrows, whereas arrows are superfluous on a symmetric matrix (see below).
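If you want to know in advance how a dataset is likely to be classified, the two attributes can be checked by hand. The helper below, describe.input, is hypothetical and uses only base R; it is a sketch of the idea, not circleplot's internal test, which may differ in detail.

```r
# Hypothetical helper (not part of circleplot): report whether an input is
# symmetric and whether its off-diagonal values are all 0 or 1.
describe.input <- function(x) {
  m <- as.matrix(x)                  # works for 'dist' and 'matrix' inputs
  values <- m[row(m) != col(m)]      # ignore the diagonal
  values <- values[!is.na(values)]   # ignore missing values
  c(
    symmetric = isSymmetric(unname(m)),
    binary = all(values %in% c(0, 1))
  )
}

follows <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    1, 1, 0), nrow = 3, byrow = TRUE,
  dimnames = list(LETTERS[1:3], LETTERS[1:3]))

describe.input(follows)            # asymmetric, binary
describe.input(as.dist(follows))   # symmetric, binary
describe.input(dist(c(1, 4, 9)))   # symmetric, numeric
```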

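[Figure: circleplot_eg_10July2015. Four example circleplots: binary (top row) and numeric (bottom row) data, each drawn as symmetric (left column) and asymmetric (right column).]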

You can recreate the above plot with the following code:

# get circleplot
library(devtools)
install_github('mjwestgate/circleplot')
library(circleplot)

# prepare binary data
# asymmetric
# cut() codes negative draws as 1 and positive draws as 2; subtracting 1 gives 0/1
binary.matrix<-matrix(
  data=cut(rnorm(11^2), breaks=c(-20, 0, 20), labels=FALSE)-1,
  nrow=11, ncol=11)
colnames(binary.matrix)<-LETTERS[1:11]
rownames(binary.matrix)<-LETTERS[1:11]
# symmetric
binary.dist<-as.dist(binary.matrix)
# point attributes
point.attributes<-point.attr(binary.dist)
point.attributes$cex<-0.7

# prepare numeric data
input.data<-rnorm(11^2)*0.5
# introduce some missing values
input.data[c(2, 5, 12, 16, 17, 21, 25, 40)]<-NA
numeric.matrix<-matrix(data= input.data, nrow=11, ncol=11)
colnames(numeric.matrix)<-LETTERS[1:11]
rownames(numeric.matrix)<-LETTERS[1:11]
numeric.dist<-as.dist(numeric.matrix)

# draw
par(mfrow=c(2, 2), oma=c(0, 1.5, 1.5, 0))

# panel 1: binary symmetric
circleplot(binary.dist, cluster=FALSE)
mtext("Binary", side=2, line=0.8)
mtext("Symmetric", side=3, line=0.8)

# panel 2: binary asymmetric
circleplot(binary.matrix, cluster=FALSE, plot.control=list(
	arrows=list(angle=15, distance=0.75, length=0.07),
	points=point.attributes,
	line.gradient=TRUE))
mtext("Asymmetric", side=3, line=0.8)

# panel 3: numeric symmetric
circleplot(numeric.dist-min(numeric.dist, na.rm=TRUE),
	plot.control=list(border=list(lwd=2)))
mtext("Numeric", side=2, line=0.8)

# panel 4: numeric asymmetric
circleplot(numeric.matrix, plot.control=list(arrows=list(
	angle=15, distance=0.75, length=0.07)))
par(mfrow=c(1, 1))

Notes:

* Users may be aware that class 'dist' and class 'matrix' are fundamentally different in R, and may find it unusual that they are discussed together here. However, distances are a kind of information as well as a class in the R language. In particular, class 'dist' is primarily suited to handling symmetric distances, which are not the only kind of data that users might be interested in. Therefore, circleplot accepts either traditional distance matrices (class 'dist') or ordinary matrices (class 'matrix') as input, though the latter must be square to be accepted, i.e. nrow(x)==ncol(x).
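The practical consequence of this difference can be seen in base R: converting a square matrix to class 'dist' keeps only the values below the diagonal, so any asymmetry is lost. The following is a small sketch with made-up values; the object names are illustrative.

```r
m <- matrix(c(0, 1, 1,
              0, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE,
  dimnames = list(LETTERS[1:3], LETTERS[1:3]))

nrow(m) == ncol(m)     # the matrix is square

d <- as.dist(m)        # keeps only the values below the diagonal
back <- as.matrix(d)   # converting back mirrors those values

m["A", "B"]            # 1 in the original matrix
back["A", "B"]         # 0 after the round trip: the A-to-B link is gone
```

This is why the tutorial code can derive its symmetric example by calling as.dist() on the asymmetric matrix, and why a square matrix is the safer input whenever direction matters.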

** To illustrate why this is important, consider relationships between people on social media sites. On some sites (e.g. Facebook), connections between individuals are reciprocal; if you are friends with someone else, then they are friends with you as well. This is a symmetrical relationship, which we can represent with a lower-triangular distance matrix in which rows/columns are people, ones represent connections between friends, and zeroes represent pairs of people who are not friends. We could store this information as a square matrix, but that would duplicate information. In contrast, connections on other sites (e.g. Twitter) are not reciprocal (i.e. they are asymmetrical); each user can 'follow' other users, but this does not require that those users reciprocate (i.e. person A can follow person B without person B following person A). In this case, data are stored as a square matrix; values below the diagonal show users (rows) who follow other users (columns), while values above the diagonal show whether these connections are reciprocated (i.e. column to row).

*** Numeric matrices have very different properties from binary matrices, and with good reason: you cannot 50% follow someone on Twitter, and so we can represent associations between individuals using only zeroes and ones. In contrast, distances between locations are symmetrical, numeric values: London is closer to Paris than to Sydney; these distances can be measured in a numeric unit (kilometres or miles); and the order of the two points doesn't affect the result (the distance from London to Sydney is the same as from Sydney to London). Circleplot draws these relationships using colour and/or line gradation to show the magnitude of each connection. However, there are situations where the last of these rules does not hold, leading to asymmetric numeric values. When given an asymmetric numeric matrix, circleplot draws only the strongest of the pair of lines between each node, and adds arrows to show the direction of the effect.
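As a rough illustration of what "only the strongest of the pair" means, the base-R sketch below keeps, for each pair of nodes, whichever of the two directed values is larger in magnitude. It assumes that "strongest" means largest absolute value; that is an assumption made for this example, and circleplot's own rule may differ.

```r
m <- matrix(c(0.0,  0.2, -0.9,
              0.6,  0.0,  0.1,
              0.3, -0.4,  0.0), nrow = 3, byrow = TRUE,
  dimnames = list(LETTERS[1:3], LETTERS[1:3]))

# Assumption: "strongest" = largest absolute value. For each pair, keep
# whichever of m[i, j] and m[j, i] is larger in magnitude; blank the other.
keep <- abs(m) >= abs(t(m))
strongest <- ifelse(keep, m, NA)
diag(strongest) <- NA

strongest
```

Each pair of nodes now has a single surviving value. Exact ties would keep both directions, so a real implementation would need an explicit tie-breaking rule.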

Next: Setting point attributes
