In the SubsetSum problem, we are given a list $S = \{x_1, \dots, x_n\}$ of $n$ numbers and a target value $t$. The question is whether there is a subset $A \subseteq S$ with
\begin{align*} \sum_{x \in A} x = t \ . \end{align*}Let $P(S,t)$ be the predicate that is true if such a subset exists, and false otherwise. Here is a recursive way to evaluate $P$:
\begin{align*} P(S,t) & = \begin{cases} \texttt{true} & \textnormal{ if } t=0 \tag{just take $A = \emptyset$} \\ \texttt{false} & \textnormal{ if } t \ne 0 \textnormal{ and } S=\emptyset \\ P(S \setminus \{x\}, t) \vee P(S \setminus \{x\}, t-x) & \textnormal{ else, for some arbitrary $x \in S$. } \end{cases} \end{align*}The third line deserves some comments. If there is a set $A \subseteq S \setminus \{x\}$ with $\sum_{x \in A} = t$ then $P(A,t)$ is true adn $P(S,t)$ is true, too. If $\sum_{x \in A} = t-x$, then we can take the set $A \cup \{x\} \subseteq S$, and the sum of its elements is $t$, too.
The recursive formula for $P$ suggests a recursive brute-force algorithm with worst-case running time $2^n$. Since we allow the same integer to show up multiple times, we have to change terminology slightly: $S$ is a list $[x_1, \dots, x_n]$ of numbers, and we are looking for a set $I \subseteq [n]$ with $\sum_{i \in I} x_i = t$. We still call this problem Subset Sum instead of Sublist Sum, however.
Problem Write a recursive exponential program for subset sum. You can optimize a bit because we guarantee that all numbers be positive integers.
Input: three lines, the first containing a single integer $n \geq 1$; the second containing $n$ positive integers $x_1, x_2, \dots, x_n, t$ separated by spaces; the third containing a single integer $t \geq 0$.
Ouptut: a single line containing the word true or false, depending on whether there is such a sub-list of not.
Sample Input
9 1 2 3 8 9 4 4 5 6 20
Sample Output
true
Sample Input
7 2 4 6 8 10 12 14 9
Sample Output
false
Memoization
You probably wrote some recursive function like boolean subsetSum (int[] array, int target). A first step to improvement is to understand that array is always a prefix of the original array. Thus, we could just as well make array a global variable (in Java: a static class variable) and change the signature of the recursive function to
It does exactly the same as the above, just for originalArray[0:prefixLength] (I'm deliberately mixing Python and Java syntax here). Note that your function subsetSum(prefixLength, target) will still be called an exponential number of times in the worst case; however, how many input pairs (prefixLength, target) are possible? Obviously $0 \leq \texttt{prefixLength} \leq n$, and since we promised that no negative numbers occur, we also have $0 \leq \texttt{target} \leq \texttt{originalTarget}$. Thus, if $t$ is the original target, then there are at most $(n+1) \times (t+1)$ different inputs for subsetSum. The idea is now to create a hash table or any dictionary data structure and store those input pairs; if the current input is already in the table, we do not have to recursve: we can just look it up. If it isn't, we have to call the recursion, but then we store the result in the table.
Problem Write a program for Subset Sum using the concept of memoization.
Exercise Check your implementation against my test inputs, in particular the large ones.
Dynamic Programming
Dynamic programming is a realize that, once we have understood memoization, we can often completely do away with recursion. We store a two-dimensional array
We want to fill up isPossible[length][t] with $P$(array[0:length]$, t)$. Instead of recursing, we compute the table isPossible row by row, relying on the fact that table[length][t] can be computed from smaller values for length and t, and they have already been computed, so we can look them up in the table.
Problem Write a program for Subset Sum using the concept of dynamic programming.
Exercise Check your implementation against my test inputs, in particular the large ones, and compare the running time to the one of memoization.
For my instances, it turns out that memoization is faster. The reason is that of all $(n+1) \times ({\rm target}+1)$ theoretically possible input pairs $(length,t)$, only a fraction occurs in the recursion tree. Dynamic Programming will still compute the value for all pairs, including those not occurring in the recursion tree.
Memoization has the disadvantage that it still uses recursion, and the dictionary data structure causes additional overhead.