Abstract Data Types: Associative Arrays

├─ What is an ASSOCIATIVE ARRAY
├─ What are the operations an ASSOCIATIVE ARRAY requires
└─ What are the data structures that implement ASSOCIATIVE ARRAY

What is an associative array?

An associative-array (or map) is a sequential abstract data type in which entries are key-value pairs (instead of being just values) where following rules have to be obeyed:

any entry value is accessible by its entry key
a single value can exist at multiple keys
a single key can only hold a single value

How values are added by key to associative array (map):

What are the operations an associative array requires?

Although, as an abstract data type, associative array comes with no implementation, it requires from data structures that implement it to provide support for a minimal set of operations:

Operation	Arguments	Returns	Description
clear		void	Clears associative array of all values.
containsKey	KEY	bool	Checks if key exists in associative array.
containsValue	VALUE, COMPARATOR	bool	Checks if value exists in associative array. This potentially requires iterating the whole map and applying comparator to see if value matches. AVOID unless map is small!
get	KEY	VALUE	Gets value by key.
set	KEY, VALUE	void	Sets value by key.
removeKey	KEY	void	Removes element by key.
removeValue	VALUE, COMPARATOR	void	Removes all elements that match value. This potentially requires iterating the whole list, applying comparator to see if value matches and, every time it matches, deleting entry from map. AVOID unless map is very small!
isEmpty		void	Checks if associative array is empty
size		ULONG	Gets number of entries in associative array

Where:

VALUE: value of an associative array entry
KEY: key at which value is found in associative array
COMPARATOR: function with signature compare(source, destination): BOOLEAN able to compare values in associative array

What are the data structures that implement associative array?

Most common implementations of associative array are:

Hash Table
- Strengths:
  - By far the fastest in insertion/lookup/deletion of entries: because hash table only needs to compute a hash of entry key and look up in bucket (List at position @ List) that corresponds to hash to add/delete/retrieve that entry. Because bucket ideally contains one entry only, this operation is considered to occur in constant O(1) time.
  - By far the smallest in memory consumption: only a List of List.
- Weaknesses:
  - Unpredictable iteration, which makes it hard to test.
  - It depends on:
    - a fast hash function that guarantees even distribution throughout buckets, so that the List inside ideally contains only one entry.
    - a reliable comparator function that guarantees there won't be duplicates with same value of key in the map (which would violate above-mentioned map principles)
Linked Hash Table
- Strengths:
  - Almost as fast, just a bit slower (5-10%) than Hash Table on insertion/lookup/deletion of entries: because it needs to maintain the doubly linked list that preserves insertion order as well.
  - Predictable iteration, which makes testing easy
- Weaknesses:
  - Greater memory consumption (33%) than Hash Table, because it needs to store the doubly linked list that maintains insertion order.
  - Same dependency on a good hash/comparator function.
Red Black Tree
- Strengths:
  - Stays ordered by key.
  - Predictable iteration, which makes testing easy
- Weaknesses:
  - Much slower (6-7 times) than Hash Table in insertion/deletion of entries: because the red black tree always has to rebalance itself, which is a VERY costly operation that may involve the whole data set.
  - Greater memory consumption (33%) than Hash Table, because the struct that wraps each value requires more information.
  - Depends on a reliable key comparator function.

This time complexity table using Big O Notation shows advantages and disadvantages of each data structure chosen for each associative array operation:

Operation	Hash Table	Linked Hash Table	Red Black Tree
clear	O(N)	O(N)	O(N)
containsKey	O(1)	O(1)	O(log(N))
containsValue	O(N)	O(N)	O(N)
get	O(1)	O(1)	O(log(N))
set	O(1)	O(1)	O(log(N))
removeKey	O(1)	O(1)	O(log(N))
removeValue	O(N)	O(N)	O(N)
isEmpty	O(1)	O(1)	O(1)
size	O(1)	O(1)	O(1)

Conclusion: Hash Table, with its speed/memory efficiency advantages, should be solution of choice unless one specifically desires structure to be predictably iterated or self-ordered!