Grand Tour (data visualisation)
The grand tour is a technique, developed by Daniel Asimov between 1980 and 1985, for exploring multivariate statistical data by means of an animation. The animation, or "movie", consists of a series of distinct views of the data as seen from different directions, displayed on a computer screen, that appear to change continuously and that come closer and closer to all possible views. This allows a human- or computer-based evaluation of these views, with the goal of detecting patterns that convey useful information about the data.
This technique is like what many museum visitors do when they encounter a complicated abstract sculpture: They walk around it to view it from all directions, in order to understand it better. The human visual system perceives visual information as a pattern on the retina, which is 2-dimensional. Thus walking around the sculpture to understand it better creates a temporal sequence of 2-dimensional images in the brain.
The multivariate data that is the original input for any grand tour visualization is a (finite) set of points in some high-dimensional Euclidean space. This kind of set arises naturally when data is collected. Suppose that for some population of 1000 people, each person is asked to provide their age, height, weight, and number of nose hairs. Thus to each member of the population there is associated an ordered quadruple of numbers. Since n-dimensional Euclidean space is defined as all ordered n-tuples of numbers, this means that the data on 1000 people correspond to 1000 points in 4-dimensional Euclidean space.
The grand tour converts the spatial complexity of the multivariate data set into temporal complexity by using the relatively simple 2-dimensional views of the projected data as the individual frames of the movie. (These are sometimes called "data views".) The projections will ordinarily be chosen so as not to change too fast, which means that the movie of the data will appear continuous to a human observer.
A grand tour "method" is an algorithm that, for a Euclidean space of any given dimension, assigns a sequence of projections onto (usually) 2-dimensional planes. Any particular multivariate data set can then be projected onto that sequence of 2-dimensional planes and the resulting views displayed on a computer screen one after the other, so that the effect is to create a movie of the data.
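The projection step for a single frame can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference implementation: the helper names `random_frame` and `project` and the four sample records are hypothetical, and the plane is drawn at random purely to have something to project onto. An actual tour would pick each plane close to the previous one.

```python
import math
import random

def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def random_frame(p, rng):
    """Return two orthonormal vectors (u, v) spanning a random 2-plane in R^p."""
    u = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm_u = math.sqrt(dot(u, u))
    u = [x / norm_u for x in u]
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Gram-Schmidt: remove the component of w along u, then normalise.
    c = dot(w, u)
    v = [x - c * y for x, y in zip(w, u)]
    norm_v = math.sqrt(dot(v, v))
    v = [x / norm_v for x in v]
    return u, v

def project(data, frame):
    """Orthogonally project each p-dimensional point onto the plane of the frame.

    The result is one (horizontal, vertical) pair per data point.
    """
    u, v = frame
    return [(dot(x, u), dot(x, v)) for x in data]

# Hypothetical records: (age, height in cm, weight in kg, nose-hair count).
people = [
    (34.0, 172.0, 70.5, 112.0),
    (51.0, 165.0, 82.0, 140.0),
    (27.0, 181.0, 77.3, 95.0),
    (63.0, 158.0, 64.1, 171.0),
]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
frame = random_frame(4, rng)    # one 2-plane in 4-dimensional space
view = project(people, frame)   # one frame of the "movie"
```

Each entry of `view` is the 2-dimensional position of one person in that frame; a movie is obtained by repeating `project` for a sequence of frames.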
(Note that, once the data has been projected onto a given 2-plane, then in order to display it on a computer screen, it is necessary to choose the directions in that 2-plane that will correspond to the horizontal and vertical directions on the computer screen. This is typically a minor detail. But the choice of horizontal and vertical directions should ideally be done so as to minimize any unnecessary apparent "spinning" of the 2-dimensional data view.)
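One possible way to choose the horizontal and vertical directions is to rotate the new basis within its own plane until it matches the previous frame's basis as closely as possible. The sketch below does this with a closed-form angle; it is one heuristic among several, not necessarily the choice made by any particular grand tour implementation, and the function name `align_to_previous` is hypothetical.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def align_to_previous(prev_frame, new_frame):
    """Rotate new_frame inside its own plane to best match prev_frame.

    Both frames are pairs of orthonormal vectors. The returned pair spans
    the same plane as new_frame, and among all in-plane rotations of it
    maximises dot(u0, u) + dot(v0, v), which limits apparent spinning.
    """
    u0, v0 = prev_frame
    u1, v1 = new_frame
    # With u = cos(phi)*u1 + sin(phi)*v1 and v = -sin(phi)*u1 + cos(phi)*v1,
    # the objective equals A*cos(phi) + B*sin(phi), maximised at atan2(B, A).
    a_coef = dot(u0, u1) + dot(v0, v1)
    b_coef = dot(u0, v1) - dot(v0, u1)
    phi = math.atan2(b_coef, a_coef)
    c, s = math.cos(phi), math.sin(phi)
    u = [c * a + s * b for a, b in zip(u1, v1)]
    v = [-s * a + c * b for a, b in zip(u1, v1)]
    return u, v

# Example: the "new" basis spans the same plane but is spun by 0.7 radians.
prev = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
spin = 0.7
spun = (
    [math.cos(spin), math.sin(spin), 0.0],
    [-math.sin(spin), math.cos(spin), 0.0],
)
aligned = align_to_previous(prev, spun)  # the spin is undone
```

In the example the plane has not changed at all, so the aligned basis coincides with the previous one and the displayed points would not move.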
Technical description
Each "view" (i.e., frame) of the animation is an orthogonal projection of the data set onto a 2-dimensional subspace of the Euclidean space R^p in which the data resides. The subspaces are selected by taking small steps along a continuous curve, parametrized by time, in the space of all 2-dimensional subspaces of R^p (known as the Grassmannian G(2,p)). Because the steps are small, the positions of the data points on the computer screen appear to vary continuously. To display these views on a computer screen, it is also necessary to pick one particular rotated position of each view (in the plane of the computer screen). Asimov showed that these subspaces can be selected so as to make the set of them (up to time t) increasingly close to all points in G(2,p), so that if the grand tour movie were allowed to run indefinitely, the set of displayed subspaces would correspond to a dense subset of G(2,p).[1][2]
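As a rough illustration of a continuous, time-parametrized curve of 2-dimensional subspaces, the sketch below composes rotations in coordinate planes whose angles grow at mutually incommensurate rates (square roots of distinct primes). It is an assumption-laden toy, not Asimov's construction: the names `SPEEDS`, `rotate_in_place` and `frame_at` are hypothetical, the speeds are floating-point approximations, and no claim is made that this particular curve is dense in G(2,p).

```python
import math
from itertools import combinations

# Angular speeds: square roots of distinct primes (illustrative choice).
SPEEDS = [math.sqrt(q) for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)]

def rotate_in_place(vec, i, j, theta):
    """Apply a rotation by theta in the (i, j) coordinate plane to vec."""
    c, s = math.cos(theta), math.sin(theta)
    vi, vj = vec[i], vec[j]
    vec[i] = c * vi - s * vj
    vec[j] = s * vi + c * vj

def frame_at(t, p):
    """Return an orthonormal pair (u, v) spanning the plane shown at time t.

    The pair starts as the first two coordinate axes and is carried along
    by a product of coordinate-plane rotations with angles speed * t.
    """
    # Skip the (0, 1) plane: applied first, it would only spin the starting
    # basis inside its own plane without changing the subspace.
    pairs = [pq for pq in combinations(range(p), 2) if pq != (0, 1)]
    if len(pairs) > len(SPEEDS):
        raise ValueError("add more speeds to SPEEDS for this dimension")
    u = [1.0 if k == 0 else 0.0 for k in range(p)]
    v = [1.0 if k == 1 else 0.0 for k in range(p)]
    for (i, j), speed in zip(pairs, SPEEDS):
        theta = speed * t
        # The same rotation is applied to both vectors, so they stay orthonormal.
        rotate_in_place(u, i, j, theta)
        rotate_in_place(v, i, j, theta)
    return u, v

# Small time steps give nearby planes, hence a smooth-looking movie.
frames = [frame_at(0.01 * k, 4) for k in range(200)]
```

Projecting the data onto each element of `frames` in turn yields the successive views; the step size 0.01 is arbitrary and controls how fast the movie appears to move.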
Software
- The tourr R package implements geodesic interpolation and basis generation functions that allow you to create new tour methods from R.
- The datatour Python package allows you to see your data in its native dimension.
References
1. Asimov, Daniel (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128–143.
2. Huh, Moon Yul; Kim, Kiyeol (2002). Visualization of multidimensional data using modifications of the Grand Tour. Journal of Applied Statistics, 29(5), 721–728.