Understanding the difference between the d3 data and datum methods


One of the neatest, and at the same time most confusing, aspects of D3 is its pair of data binding methods, selection.data and selection.datum, which bind data to elements in the DOM. These methods, while seemingly simple, make it easy to generate very complex data visualizations by keeping visualization elements in the DOM closely coupled to the data being visualized. However, if you browse through the D3 API, it can be difficult to discern the difference between selection.data and selection.datum, and when you should prefer one over the other. This post aims to clarify the subtle and not-so-subtle differences between these methods. I assume that you already have a decent idea of how data binding works in D3, including concepts such as the update, enter() and exit() selections. If you need a quick refresher or are unfamiliar with these concepts, I would highly recommend reviewing them before reading on.

So let’s begin with a good definition of data versus datum.

Datum – an item of factual information derived from measurement or research (singular of data)

Compare this against the definition for Data

Data – a collection of facts from which conclusions may be drawn (plural of datum)

The basic gist is this: a datum is a single unit of data, whereas data is a collection of such units.

With this understanding of the meaning of data and datum, let’s now delve into each of these methods. First, we should note that both selection.data and selection.datum behave differently depending on whether any data is passed in as an input argument. When no data is passed in, these methods act as “getter” methods that return the underlying data/datum bound to elements in the selection. Because of this difference, we will treat each case separately below.

Case 1: When data is supplied as an input argument

selection.data(data)

  • selection.data(data) will attempt to perform the usual D3 data-join that we are all familiar with. This data-join occurs between elements in the data array and element(s) in the selection. Data elements that match existing elements in the selection form the update selection, which is the selection that selection.data itself returns (there is no separate update() method). Selection elements with no matching data element are placed in the exit() selection, whereas data elements with no matching DOM element result in placeholder nodes that are accessible through the enter() selection. The end result is that if you pass in an array data = [{x: 1}, {x: 2}, {x: 3}], an attempt is made to join each individual data element or datum (for example, {x: 1}) with an element of the selection. Each element of the selection (placeholder or real) will have only a single datum bound to it.
  • If data contains only a single datum (e.g. data = [{x: 1}]) while selection contains many elements, then only the first matching selection element has the datum {x: 1} bound to it, and all the other selection elements are placed in the exit() selection.
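The partitioning described above can be sketched in plain JavaScript. This is only a toy model, not D3's implementation: it assumes the default join by index (no key function is passed to selection.data), it uses plain objects as stand-ins for DOM elements, and the function name joinByIndex is hypothetical.

```javascript
// Toy model of an index-based data join. NOT D3 code: `joinByIndex` is a
// hypothetical helper and the "elements" are plain objects, not DOM nodes.
function joinByIndex(elements, data) {
  const update = []; // existing elements that received a datum
  const enter = [];  // placeholders for data with no existing element
  const exit = [];   // existing elements with no datum left for them
  const n = Math.max(elements.length, data.length);
  for (let i = 0; i < n; i++) {
    if (i < elements.length && i < data.length) {
      elements[i].__data__ = data[i]; // exactly one datum per element
      update.push(elements[i]);
    } else if (i < data.length) {
      enter.push({ __data__: data[i] });
    } else {
      exit.push(elements[i]);
    }
  }
  return { update, enter, exit };
}

// Two existing elements joined with three data elements.
const twoElements = [{ tag: "p" }, { tag: "p" }];
const joined = joinByIndex(twoElements, [{ x: 1 }, { x: 2 }, { x: 3 }]);
console.log(joined.update.length, joined.enter.length, joined.exit.length); // 2 1 0

// Three existing elements joined with a single datum.
const threeElements = [{ tag: "p" }, { tag: "p" }, { tag: "p" }];
const single = joinByIndex(threeElements, [{ x: 1 }]);
console.log(single.update.length, single.enter.length, single.exit.length); // 1 0 2
```

In the second call only the first element ends up with {x: 1} bound to it, which mirrors the single-datum case in the last bullet.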

selection.datum(data)

  • selection.datum(data) bypasses the data-join process altogether. This command essentially states that you want to set the datum of every selection element to be data. This makes sense if you look back at the definition of datum as a singular element of data: we are setting that singular element for each selection element. As a result, if you pass in an array data = [{x: 1}, {x: 2}, {x: 3}] to selection.datum(data), each element in selection will have the same array bound to it. So each selection element’s bound data, stored in its __data__ property, will be [{x: 1}, {x: 2}, {x: 3}].

It is important to note the key differences between the two methods. selection.datum() will bind the provided data as a unit to every element in selection. Meanwhile, selection.data() will data-join individual elements within the data array to the selection.

Warning: You may come across descriptions where people mention that selection.datum(data) is identical to selection.data([data]). However, this is only true if selection contains a single element. In that case, [data] is a single-element array, so the data-join using selection.data is equivalent to assigning the datum data to the same selection using selection.datum(data). If selection has multiple elements, however, selection.datum(data) will assign data to each of those elements, whereas selection.data([data]) will only join data with the first selection element and place the rest in the exit() selection.
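To make the warning concrete, here is a small plain-JavaScript model of the two setters applied to a three-element selection. It is a sketch under stated assumptions rather than D3 code: setDatum and setDataByIndex are hypothetical helpers, the elements are plain objects, and the join is assumed to be by index.

```javascript
// Hypothetical helpers modelling the two setters on plain objects (not D3 code).
function setDatum(elements, value) {
  // Models selection.datum(value): no join, every element gets the same value.
  elements.forEach((el) => { el.__data__ = value; });
  return elements;
}

function setDataByIndex(elements, data) {
  // Models selection.data(data) joined by index: element i gets data[i].
  // Elements beyond data.length are left untouched (they would be in exit()).
  elements.forEach((el, i) => {
    if (i < data.length) el.__data__ = data[i];
  });
  return elements;
}

const points = [{ x: 1 }, { x: 2 }, { x: 3 }];

// datum(points): all three elements are bound to the whole array.
const viaDatum = setDatum([{}, {}, {}], points);
console.log(viaDatum.map((el) => el.__data__ === points)); // [ true, true, true ]

// data([points]): only the first element is bound to the array.
const viaData = setDataByIndex([{}, {}, {}], [points]);
console.log(viaData.map((el) => el.__data__ === points)); // [ true, false, false ]
```

With a single-element selection the two calls produce the same binding, which is the only situation in which the "identical" claim holds.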

Case 2: When no data is supplied as an input argument

selection.data()

  • selection.data() will take the bound datum of each element in the selection and combine them into an array that is returned. So, if your selection includes 3 DOM elements with the data {x: 1}, {x: 2} and {x: 3} bound to them respectively, selection.data() returns [{x: 1}, {x: 2}, {x: 3}]. Note that if selection is a single element with (by way of example) the datum "a" bound to it, then selection.data() will return ["a"] and not "a" as some may expect.

selection.datum()

  • selection.datum() only makes sense for a single-element selection, as it is defined as returning the datum bound to the first element of the selection. So in the example above, with the selection consisting of DOM elements with bound data of {x: 1}, {x: 2} and {x: 3}, selection.datum() would simply return {x: 1}.

Note that even if selection has a single element, selection.datum() and selection.data() return different values. The former returns the bound datum itself ("a" in the example above), whereas the latter returns the bound datum wrapped in an array (["a"] in the example above).
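The two getters can be modelled the same way. Again, this is a minimal plain-JavaScript sketch, not D3 itself: getData and getDatum are hypothetical names, the elements are plain objects carrying a __data__ property, and an empty selection is deliberately not handled.

```javascript
// Hypothetical getters over plain objects that carry a __data__ property.
function getData(elements) {
  // Models selection.data(): one entry per element, always an array.
  return elements.map((el) => el.__data__);
}

function getDatum(elements) {
  // Models selection.datum(): the datum of the first element only.
  return elements[0].__data__;
}

const three = [{ __data__: { x: 1 } }, { __data__: { x: 2 } }, { __data__: { x: 3 } }];
console.log(JSON.stringify(getData(three)));  // [{"x":1},{"x":2},{"x":3}]
console.log(JSON.stringify(getDatum(three))); // {"x":1}

const lone = [{ __data__: "a" }];
console.log(JSON.stringify(getData(lone)));  // ["a"]
console.log(JSON.stringify(getDatum(lone))); // "a"
```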

Hopefully this helps clarify how selection.data and selection.datum differ from each other, both when providing data as an input argument and when querying for the bound datum by not providing any input arguments. Feel free to leave a comment below if you have anything you’d like to add to this discussion.

PS – In case this post seems a bit familiar to you, I should mention that this is a more detailed version of a response I posted on Stack Overflow a little while back.