A survey of singly linked list data structure variants

“Despite this obvious simplicity, there are myriad implementation variations.”

— Bruno R. Preiss (Preiss 1998)

[WARNING DRAFT: this page is under construction and as such is a bit of a mess. comments welcome.]

This page is intended as a compact survey of singly linked list variants. This is not a linked list tutorial — for that please check the links at the end.

When I decided to write this page it seemed as though every data structure book I opened presented a slightly different variant of the singly linked list (e.g. circular linkage, null terminator, sentinel terminator) but I could not find a survey that clearly presented all the variants. That’s what I’m trying to do here. I would like to document as many linked list variants as I can find in as concise a form as possible. If you notice that I’ve missed a variant or a key point please let me know in the comments. Links to additional resources are welcome. If you know a data structures book (or survey paper) that does a better job of enumerating and analyzing all the variants please let me know — I’d like to read it.

Mostly I try to refer to data structures, not implementations. But you will notice that I am biased towards C and C++.

Also note that although I have focused on the data structures here — the choice of structure is largely dictated by the algorithms and behavioral requirements.

I have made a poster to try to summarize everything visually. Click here for the high-res pdf

Singly Linked Lists poster small image

Behavioral variants

There are two main behavioral variants. One provides access only via the front/head of the list. The other also allows insertion at the back:

  • Singly linked list provides accesses all elements via the front of the list. It can be used as a push-down (LIFO) stack but doesn’t provide constant-time access to the tail.
  • A tail-pointer variant provides all the functionality of  an singly linked list but also allows constant-time insertion at the back of the list (but not constant time deletion at the back). In BSD headers this is called SIMPLEQ. There is also something similar called a “tail queue” but I believe that this is usually doubly linked.

From what I can tell, the idea of a singly linked list and a “tail queue” have been around since the 50s. You can see in this survey of list processing languages from 1963 mentions these behavioral variants. Programming the Logic Theory Machine (Newell and Shaw, 1957) describes algorithms for construction of linked lists and contains an early appearance of the the diagram showing linked elements.

Structural variants

The structure of a singly linked list has at least the following variation points:

  • Intrusive (next pointers embedded in the objects) vs separate external nodes (aka endogenous vs exogenous)
  • Terminated vs circularly linked
  • Termination type: null pointer vs sentinel node, also null pointer vs self-referencing terminator
  • Presence of “dummy head” node
  • Representation of the “list object”: ptr to first or last node, or special structure
  • Next links may point to the address of the next node, or to the address of the next link in the next node.
  • A counter can be bundled with the list object to provide an O(1) query for the number of elements in the list

Some of these are explored briefly below but the Wikipedia entry for Linked List does a better job. In future if I can work out how to boil this down to a few charts and tables I will.

A variation-point analysis suggests that there are 8 topological variants, which expand to up to 22 implementation variants when you consider the different ways of setting up pointers to the data structures. So far I have only found 12 variants “in the wild”. These are documented in the diagrams below.

Topological variants:

Singly Linked Lists: topological variants

Known species:

Singly Linked Lists: known species
The diagrams above are also available in the pdf version.

Here’s the variation-point tree:

topology (22)
[
	linear (18)
	[
		dummy head (2)
		[
			yes
			no
		]
		dummy tail / terminating sentinel (9)
		[
			yes (6)
			[
				pointers (3)
				[
					1 ptr: points to first node (must point to dummy head if present)
					2 ptrs: first and last node (head/dummy-head and tail)
					2 ptrs: first and terminating sentinel
				]
				termination (2)
				[
					null next pointer (terminal->next = NULL)
					self-referencing terminator (terminal->next = &terminal)
				]
			]

			no (3)
			[
				pointers (3)
				[
					1 ptr: points to first node (must point to dummy head if present)
					2 ptrs: first and last node (head and tail)
					2 ptrs: { first, &last }
				]

				termination (1)
				[
					null next pointer (terminal->next = NULL)
					// having a self-referencing terminator doesn't make without a sentinel I think
				]
			]
		]
	]

	circular  (4)
	[
		dummy node (2)
		[
			no dummy node
			dummy head/tail
		]
		pointers (2)
		[
			1 ptr: points to first node (must point to dummy head if present)
			1 ptr: points to last element / tail
			***two-pointer versions are theoretically possible but i havn't seen them discussed anywhere
		]
	]
]

Termination, sentinels and circular lists

Linked lists can be terminated or circularly linked (last node points to first). This gives three options for the end of the list:

  • A null pointer or other “special” global value
  • A sentinel node (which may be unique to each list, or global). The sentinel may link to itself, or have a null next pointer.
  • A pointer back to head, or back to a dummy “pre-head” node

A tail-pointer queue can be implemented by maintaining the queue as a circular list with a single pointer to the tail. The first node is reachable via taile->next.

The Wikipeida entry for Linked List has an extensive discussion of techniques and benefits of sentinel nodes.

  • When using a self-linked sentinel (sentinel->next = sentinel), the pop_front() operation doesn’t require a conditional (but the caller still needs to check for a valid value).
  • When implementing a linear search, the check for the sentinel node can be avoided by storing the searched for value in the sentinel. [while( w1->next && w1->data < x) w1 = w1->next; becomes sentinel->data = x; while( w1->data < x ) w1->w1->next;] (Wirth 1976).

Representing the “list object”

A list object may be represented by:

  • A pointer to the first element
  • Two pointers: to the first element and the last element (for a tail pointer queue)
  • A pointer to the last element (in the circular case, for a tail pointer queue)
  • A special “pre-head” node, or some casting magic to make it look that way. This has the advantage of making insert_after() and push_front() look the same.

Operations

  • O(1) is_empty() query
  • O(1) access to front element.
  • O(1) push_front(), pop_front().
  • O(1) insert_after() and remove_after() (given a particular node pointer)
  • O(1) remove_at() possible in some cases (see Wirth’s “trick” below).
  • O(1) swap two lists
  • O(1) splice (such as moving all of one list to the end of another) assuming that you have node pointers for each part of the splice. For appending you need one tail pointer.
  • List with tail pointer provides O(1) access to back element.
  • List with tail pointer provides O(1) push_back().

As noted earlier, the most basic singly linked list can be used as an O(1) LIFO stack. An extension to allow insertion at the end provides a O(1) FIFO queue.

Naming conventions

“head” may point to the first element, or to a dummy head element (Preiss). Sedgewick’s head variable points to dummy head. K&R C 2nd ed. refers to the first element of a linked list as “head” (p.118)

OpenBSD SLIST has a “first” pointer that points to the first element.

“tail” points to the last element. Stroustrup calls this “last”. Sedgewick uses z to point to the dummy/sentinal end terminator node.

Miscellaneous factoids

  • Implementation of each data type may involve different data structures and/or different book keeping overhead. Some data structures have few edge-cases and therefore very simple implementations with few conditionals, others require more complex algorithms to deal with edge cases but may provide more functionality, or a simpler data structure.
  • You can store a singly-linked “tail queue” by maintaining a single pointer to the last element and keeping the list circularly linked.
  • Usually if you have a pointer to an slist node you can only unlink the following node. However, if you’re using external links you can switch the following data into the current node, then unlink the following node [this “trick” is from (Wirth 1976 p.176):  next = cur->next; cur->data = next->data; cur->next = next->next; delete next;
  • When using linear (non-circular) linked lists, a pointer to any element also looks just like a list. Similarly multiple nodes can link into a single node, creating a tree structure where current->next expresses the child->parent relation. This is apparently called tail-sharing. In both cases traversal to the terminal or root node proceeds as when traversing a linear linked list.
  • When using intrusive lists, if you have a pointer to object, you have a pointer to the link. Obvious, but also useful.
  • If the list is terminated with a sentinel that links to itself, it is possible to implement pop_front() without checking for an empty list. In many (but maybe not all) cases this isn’t very useful because the caller will still need to then check whether the sentinel is a valid node.
  • Structures involving dummy nodes are easier to justify when the node data is small, and/or it’s possible to embed the dummy node(s) in the list storage structure.
  • Use of dummy nodes is generally justified as a way to simplify algorithms.
  • When searching you can use “Move to front” heuristic to keep commonly used items near the front of the list. (Wirth 1976) xlinux.nist.gov/dads/HTML/movefront.html
  • Performing a binary search in a sorted linked list reduces the number of comparisons xlinux.nist.gov/dads/HTML/linkedList.html
  • As shown in many of the references, linked lists can and often are used to implement LIFO stacks and FIFO queues.
  • C++ STL containers use external link nodes. Boost intrusive containers use embedded links. Patrick Wyatt made a good case for using intrusive linked lists here, using Starcraft as a case study. I suspect intrusive linked data structures were the norm before OO took hold.

Implementations

References (online)

References (books and publications, some linked online)

TODO/Notes to self

Check which variants are implemented by SGI STL slist, Boost slist, stl::forward_list etc.

Does it make sense to have a self referencing tail if it isn’t a dummy node? (I think that without a dummy node, it would be exactly the same algo for pop_front() (except check for last node/ would look different) but push_front would be more complicated (because it would have to special-case an empty list, whereas that isn’t necessary with a null ptr.
could write a graph generator in python and then visualise the results using graphvizcan i make an implementation generator?

Do a more in depth literature search (citeseer, ACM, Scholar etc) Look for primary references.

If you were considering making your own implementations it seems to me, for most uses, 4 variants are needed and would be sufficient:

  1. Single pointer slist (no tail operations)
  2. Tail pointer slist (tail push, splice back)

Plus versions of the above with a size counter.

Unsorted/unread links

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *

19 + seven =