Abstract data types

What is it?

A mathematical model of a data type
Specifies:
1. The operations supported
2. The type of data stored (but not how it’s stored)
3. Argument types and return types of these operations (but not how they are implemented)

Examples:

Array, List, Map, Queue, Set, Stack, Table, Tree, and Vector are ADTs.

Reading material

Representation Independence

Critically, a good abstract data type should be representation independent. This means that the use of an abstract type is independent of its representation (the actual data structure or data fields used to implement it), so that changes in representation have no effect on code outside the abstract type itself. For example, the operations offered by List are independent of whether the list is represented as a linked list or as an array.

You won’t be able to change the representation of an ADT at all unless its operations are fully specified with preconditions and postconditions, so that clients know what to depend on, and you know what you can safely change.

High level overview

Abstract Data Type(ADT) is a data type, where only behavior is defined but not implementation.

Opposite of ADT is Concrete Data Type (CDT), where it contains an implementation of ADT.

Examples:

Array, List, Map, Queue, Set, Stack, Table, Tree, and Vector are ADTs. Each of these ADTs has many implementations i.e. CDT. The container is a high-level ADT of above all ADTs.

Abstract Data Types Overview

Abstract Data Type	Other Common Names	Commonly Implemented with data structure
List	Sequence	Array List, Linked List
Queue		Array, Linked List
Set		Tree Set, Hash Set, Linked Hash Set, Red-black Tree, Hash Table
Map		Tree Map, Hash Map
Double-ended Queue	Dequeue, Deque	Array, Doubly-linked List
Stack		Array, Linked List
Associative Array	Dictionary, Hash Map, Hash, Map**	Hash Table
Priority Queue	Heap	Heap

(Implementations for almost all of these* are provided by the Java Collections Framework in the java.util package.)

What are ADTs?

To answer this question, we’ll take a look at something we already know about: an integer data type. Virtually every modern programming language has some representation for an integer type. In Java, we’ll look at the primitive type int. An initialized Java int variable holds a 32-bit signed integer value between (-2 power 32) and (2 power 32) -1. So, we’ve established that an int holds data.

Operations can be performed on an int. We can assign a value to an int. We can apply the Java unary prefix and postfix increment and decrement operators (i++, ++i, i–, –i) to an int. We can use an int in binary operation expressions, such as addition, subtraction, multiplication, and division. We can test the value of an int, and we can use an int in an equality expression.

In performing these operations on an int variable, the user does not need to be concerned with the implementation of the operation. The internal mechanism by which these operations work is irrelevant. Examine the following simple code fragment, for example:

int i = 0;
i++;

The user knows that after the second statement is executed, the value of the i variable is 1. It isn’t important to know how the value became 1—just that in performing the increment in this example, i always will equal 1.

The user also does not need to know how the value is represented and stored internally. Things such as byte order again are irrelevant to the user in the preceding code example.

To summarize the built-in int data type, an int does the following:

An int holds an item of data.
Predefined operations can be performed on an int.
An int has an internal representation that is hidden from the user in performing these operations.

If we consider the primitive data types in this light, it is easy to understand the definition we will give to ADTs. An ADT is defined as the following:

An ADT is an externally defined data type that holds some kind of data. 1
An ADT has built-in operations that can be performed on it or by it.
Users of an ADT do not need to have any detailed information about the internal representation of the data storage or implementation of the operations.

So, in effect, an ADT is a data construct that encapsulates the same kinds of functionality that a built-in type would encapsulate. This does not necessarily imply that ADTs need to have addition or increment operations in order to be valid or useful, and it does not mean that any of the built-in operators will work with an ADT. It only means that the appropriate operations for the type created will be transparently available and that the user does not need to be concerned with the implementation details.

Why Use Abstract Data Types?

The String class provides a mechanism by which string literals may be stored, accessed, and manipulated. It provides methods with which we can compare, concatenate, copy, and convert strings. If a String class did not exist, string operations would have to be implemented from scratch each time they were needed.

A robust and reasonably generic String class gives us the capability to use these string operations at any time without having to reinvent the wheel each time. So ADTs provide us with code reusability. After we encapsulate the operations required to make a useful String class, we can reuse those facilities at any time in the future, with little or no additional development effort.

This also is the case with other ADTs. All ADTs in Java are implemented as classes. By designing our ADTs to be as generic as possible, we can reuse them in various situations and over several projects. Any time we develop an object or a group of related objects that can be reused, we reduce the overall development time of a project.

There are certain guidelines that need to be followed to make ADT’s reusable. In this book, we are primarily concerned with container ADTs. A container object’s primary purpose is to hold other objects. The containers will hold various types of data.

To make our containers reusable, we need to make them generic. Generic, in this sense, means that the containers need to follow three rules:

Containers need to be able to store data of any kind.
Containers should provide a public interface that encompasses only behaviors that would be useful in a general sense.
Containers should be kept insulated from application-specific considerations.

APIs for Abstract Data Types

Several fundamental data types involve collections of objects. Specifically, the set of values is a collection of objects, and the operations revolve around adding, removing, or examining objects in the collection. In this section, we consider three such data types, known as the bag, the queue, and the stack. They differ in the specification of which object is to be removed or examined next.

Bags, queues, and stacks are fundamental and broadly useful. We use them in implementations throughout the book. Beyond this direct applicability, the client and imple- mentation code in this section serves as an introduction to our general approach to the development of data structures and algorithms.

One goal of this section is to emphasize the idea that the way in which we represent the objects in the collection directly impacts the efficiency of the various operations. For collections, we design data structures for representing the collection of objects that can support efficient implementation of the requisite operations.

A second goal of this section is to introduce generics and iteration, basic Java constructs that substantially simplify client code. These are advanced programming-language mechanisms that are not necessarily essential to the understanding of algorithms, but their use allows us to develop client code (and implementations of algorithms) that is more clear, compact, and elegant than would otherwise be possible.

A third goal of this section is to introduce and show the importance of linked data structures. In particular, a classic data structure known as the linked list enables implementation of bags, queues, and stacks that achieve efficiencies not otherwise possible. Understanding linked lists is a key first step to the study of algorithms and data structures.

For each of the three types, we consider APIs and sample client programs, then look at possible representations of the data type values and implementations of the data-type operations. This scenario repeats (with more complicated data structures) throughout this book. The implementations here are models of implementations later in the book and worthy of careful study.

As usual, we begin our discussion of abstract data types for collections by defining their APIs, shown below. Each contains a no-argument constructor, a method to add an item to the collection, a method to test whether the collection is empty, and a method that returns the size of the collection. Stack and Queue each have a method to remove a particular item from the collection. Beyond these basics, these APIs reflect two Java features that we will describe on the next few pages: generics and iterable collections.

Bag

public class Bag<Item> implements Iterable<Item>
Bag() - create an empty bag
void add(Item item) - add an item
boolean isEmpty() - is the bag empty?
int size() - number of items in the bag

A bag is a collection where removing items is not supported—its purpose is to provide clients with the ability to collect items and then to iterate through the collected items (the client can also test if a bag is empty and find its number of items). The order of iteration is unspecified and should be immaterial to the client. To appreciate the concept, consider the idea of an avid marble collector, who might put marbles in a bag, one at a time, and periodically process all the marbles to look for one having some particular characteristic. With our Bag API, a client can add items to a bag and process them all with a foreach statement whenever needed. Such a client could use a stack or a queue, but one way to emphasize that the order in which items are processed is immaterial is to use a Bag . The class Stats at right illustrates a typical Bag client (TODO - implement it). The task is simply to compute the average add () and the sample standard deviation of the double values on standard input. If there are N numbers on standard input, their average is computed by adding the numbers and dividing by N; their sample standard deviation is computed by adding the squares of the difference between each number and the average, dividing by N–1, and taking the add () square root. The order in which the numbers are considered is not relevant for either of these calculations, so we save them in a Bag and use the foreach construct to compute each sum. Note : It is possible to compute the standard deviation without saving all the numbers (as we did for the average in Accumulator —see Exercise 1.2.18) (TODO - implement it). Keeping the all numbers in a Bag is required for more complicated statistics.

FIFO queue

public class Queue<Item> implements Iterable<Item>
Queue() - create an empty queue
void enqueue(Item item) - add an item
Item dequeue() - remove the least recently added item
boolean isEmpty() - is the queue empty?
int size() - number of items in the queue

A FIFO queue (or just a queue) is a collection that is based on the first-in-first-out (FIFO) policy. The policy of doing tasks in the same order that they arrive is one that we encounter frequently in everyday life: from people waiting in line at a theater, to cars waiting in line at a toll booth, to tasks waiting to be serviced by an application on your computer. One bed-rock principle of any service policy is the perception of fairness. The first idea that comes to mind when most people think about fairness is that whoever has been waiting the longest should be served first. That is precisely the FIFO discipline. Queues are a natural model for many everyday phenomena, and they play a central role in numerous applications. When a client iterates through the items in a queue with the foreach construct, the items are processed in the order they were added to the queue. A typical reason to use a queue in an application is to save items in a collection while at the same time preserving their relative order : they come out in the same order in which they were put in. For example, the client below (TODO : implement) is a possible implementation of the readDoubles() static method from our In class. The problem that this method solves for the client is that the client can get numbers from a file into an array without knowing the file size ahead of time. We enqueue the numbers from the file, use the size() method from Queue to find the size needed for the array, create the array, and then dequeue the numbers to move them to the array.

Some uses:

Scheduling access to shared resource (e.g., printer)

Operations Provided	Efficiency
Enqueue item	O(1)
Dequeue item	O(1)

Implemented by LinkedList and ArrayDeque in Java

Pushdown (LIFO) stacks

public class Stack<Item> implements Iterable<Item>
Stack() - create an empty stack
void push(Item item) - add an item
Item pop() - remove the most recently added item
boolean isEmpty() - is the stack empty?
int size() - number of items in the stack

Operations Provided	Efficiency
Push item	O(1)
Pop item	O(1)

Implemented by Stack, LinkedList, and ArrayDeque in Java

The only differences between the APIs for Stack and Queue are their names and the names of the methods. This observation highlights the idea that we cannot easily specify all of the characteristics of a data type in a list of method signatures. In this case, the true specification has to do with the English-language descriptions that specify the rules by which an item is chosen to be removed (or to be processed next in the foreach statement). Differences in these rules are profound, part of the API, and certainly of critical importance in developing client code.

A queue is appropriate because it puts the numbers into the array in the order in which they appear in the file (we might use a Bag if that order is immaterial). This code uses autoboxing and auto-unboxing to convert between the client’s double primitive type and and the queue’s Double wrapper type.

A pushdown stack (or just a stack) is a collection that is based on the last-in-first-out (LIFO) policy. When you keep your mail in a pile on your desk, you are using a stack. You pile pieces of new mail on the top when they arrive and take each piece of mail from the top when you are ready to read it. People do not process as many papers as they did in the past, but the same organizing principle underlies several of the ap- plications that you use regularly on your computer. For example, many people organize their email as a stack they push messages on the top when they are received and pop them from the top when they read them, with most recently received first (last in, first out). The ad- vantage of this strategy is that we see interesting email as soon as possible; the disadvantage is that some old email might never get read if we never empty the stack. You have likely encountered another common example of a stack when surfing the web. When you click a hyperlink, your browser displays the new page (and pushes onto a stack). You can keep clicking on hyperlinks to visit new pages, but you can always revisit the previous page by clicking the back button (popping it from the stack). The LIFO policy offered by a stack provides just the be- havior that you expect. When a client iterates through the items in a stack with the foreach construct, the items are processed in the reverse of the order in which they were added. A typical reason to use a stack iterator in an application is to save items in a collection while at the same time reversing their relative order . For example, the client Reverse at right (TODO : implement) reverses the or- der of the integers on standard input, again without having to know ahead of time how many there are. The importance of stacks in computing is fundamental and profound, as indicated in the detailed example that we consider next.

See FullyParenthesizedArithmeticExpressionEvaluation.java

Trees

Collection of nodes
1. One specialized node is the root.
2. A node has one parent (unless it is the root)
3. A node has zero or more children.
Real-world “trees”:
1. Organizational hierarchies
2. Some family trees
Some uses:
1. Directory structure on a hard drive
2. Sorted collections

Operations Provided	Efficiency (only if the tree is balanced)
Find	O(log n)
Add/remove	O(log n)

Graphs

A collection of nodes and edges
1. Each edge joins two nodes
2. Edges can be directed or undirected
Real-world “graph”:
1. Road map
Some uses:
1. Tracking links between web pages
2. Facebook

Operations Provided	Efficiency
Find	O(n)
Add/remove	O(1) or O(n) or O(n2) Depends on implementation (time/space trade off)

Networks

Graph whose edges have numeric labels
Examples (labels):
1. Road map (mileage)
2. Airline’s flight map (flying time)
3. Plumbing system (gallons per minute)
4. Computer network (bits/second)
Famous problems:
1. Shortest path
2. Maximum flow
3. Minimal spanning tree
4. Traveling salesman
5. Four-coloring problem for planar graphs

Heap

https://en.wikipedia.org/wiki/Heap_(data_structure)

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the “top” of the heap (with no parents) is called the root node.

The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact, priority queues are often referred to as “heaps”, regardless of how they may be implemented. In a heap, the highest (or lowest) priority element is always stored at the root. However, a heap is not a sorted structure; it can be regarded as being partially ordered. A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

           22
         /    \
        19    18
       / \   / \
      15  3 14  4
 /
12

Array         : [22, 19, 18, 15, 3, 14, 4, 12]
Array position: [0,   1,  2,  3, 4, 5,  6,  7]

For the node at array[i] -
left child  = 2i + 1
right child = 2i + 2

parent = floor((i - 1)/2)

Operations on a Heap

The common operations involving heaps are:

Basic

find-max (or find-min): find a maximum item of a max-heap, or a minimum item of a min-heap, respectively (a.k.a. peek)
insert: adding a new key to the heap (a.k.a., push[4])
extract-max (or extract-min): returns the node of maximum value from a max heap [or minimum value from a min heap] after removing it from the heap (a.k.a., pop)
delete-max (or delete-min): removing the root node of a max heap (or min heap), respectively
replace: pop root and push a new key. More efficient than pop followed by push, since only need to balance once, not twice, and appropriate for fixed-size heaps.

Creation

create-heap: create an empty heap
heapify: create a heap out of given array of elements
merge (union): joining two heaps to form a valid new heap containing all the elements of both, preserving the original heaps.
meld: joining two heaps to form a valid new heap containing all the elements of both, destroying the original heaps.

Inspection

size: return the number of items in the heap.
is-empty: return true if the heap is empty, false otherwise.

Internal

increase-key or decrease-key: updating a key within a max- or min-heap, respectively
delete: delete an arbitrary node (followed by moving last node and sifting to maintain heap)
sift-up: move a node up in the tree, as long as needed; used to restore heap condition after insertion. Called “sift” because node moves up the tree until it reaches the correct level, as in a sieve.
sift-down: move a node down in the tree, similar to sift-up; used to restore heap condition after deletion or replacement.

Implementing collections

To address the issue of implementing Bag, Stack and Queue, we begin with a simple classic implementation, then address improvements that lead us to implementations of the APIs articulated above.

See FixedCapacityStackOfStrings.java

Generics:

See FixedCapacityStack.java

Array resizing:

See ArrayResizing.java

Iteration:

See ReverseArrayIterator.java

The implementations of bags, queues, and stacks that support generics and iteration that we considered in this package provide a level of abstraction that allows us to write compact client programs that manipulate collections of objects. Detailed understanding of these ADTs is important as an introduction to the study of algorithms and data structures for three reasons.

First, we use these data types as building blocks in higher-level data structures throughout this book.
Second, they illustrate the interplay between data structures and algorithms and the challenge of simultaneously achieving natural performance goals that may conflict.
Third, the focus of several of our implementations is on ADTs that support more powerful operations on collections of objects, and we use the implementations here as starting points.

We now have two ways to represent collections of objects, arrays and linked lists. Arrays are built in to Java; linked lists are easy to build with standard Java records. These two alternatives, often referred to as sequential allocation and linked allocation, are fundamental. Later in the book, we develop ADT implementations that combine and extend these basic structures in numerous ways. One important extension is to data structures with multiple links. For example, our focus in Sections 3.2 and 3.3 is on data structures known as binary trees that are built from nodes that each have two links. Another important extension is to compose data structures: we can have a bag of stacks, a queue of arrays, and so forth. For example, our focus in Chapter 4 is on graphs, which we represent as arrays of bags. It is very easy to define data structures of arbitrary complexity in this way: one important reason for our focus on abstract data types is an attempt to control such complexity.

Our treatment of BAGS, queues, and STACKS in this section is a prototypical example of the approach that we use throughout this book to describe data structures and algorithms. In approaching a new applications domain, we identify computational challenges and use data abstraction to address them, proceeding as follows:

Specify an API.
Develop client code with reference to specific applications.
Describe a data structure (representation of the set of values) that can serve as
the basis for the instance variables in a class that will implement an ADT that
meets the specification in the API.
Describe algorithms (approaches to implementing the set of operations) that
can serve as the basis for implementing the instance methods in the class.
Analyze the performance characteristics of the algorithms.

Often, one particular ADT and implementation is best for the problem

Which ADT to use?
1. It depends. How do you access your data? By position? By key? Do you need to iterate through it? Do you need the min/max?
Which implementation to use?
1. It also depends. How important is fast access vs fast add/remove? Does the data need to be ordered in any way? How much space do you have?
But real life is often messier…

Abstract data types

What is it?

Examples:

Reading material

Representation Independence

High level overview

What are ADTs?

Why Use Abstract Data Types?

APIs for Abstract Data Types

Bag

FIFO queue

Pushdown (LIFO) stacks

Trees

Graphs

Networks

Heap

Operations on a Heap

Basic

Creation

Inspection

Internal

Implementing collections

Often, one particular ADT and implementation is best for the problem

Tags

Links to this note