15-9 Breaking a string

A certain string-processing language allows a programmer to break a string into two pieces. Because this operation copies the string, it costs \(n\) time units to break a string of \(n\) characters into two pieces. Suppose a programmer wants to break a string into many pieces. The order in which the breaks occur can affect the total amount of time used. For example, suppose that the programmer wants to break a \(20\)-character string after characters \(2\), \(8\), and \(10\) (numbering the characters in ascending order from the left-hand end, starting from \(1\)). If she programs the breaks to occur in left-to-right order, then the first break costs \(20\) time units, the second break costs \(18\) time units (breaking the string from characters \(3\) to \(20\) at character \(8\)), and the third break costs \(12\) time units, totaling \(50\) time units. If she programs the breaks to occur in right-to-left order, however, then the first break costs \(20\) time units, the second break costs \(10\) time units, and the third break costs \(8\) time units, totaling \(38\) time units. In yet another order, she could break first at \(8\) (costing \(20\)), then break the left piece at \(2\) (costing \(8\)), and finally the right piece at \(10\) (costing \(12\)), for a total cost of \(40\).

Design an algorithm that, given the numbers of characters after which to break, determines a least-cost way to sequence those breaks. More formally, given a string \(S\) with \(n\) characters and an array \(L[1..m]\) containing the break points, com- pute the lowest cost for a sequence of breaks, along with a sequence of breaks that achieves this cost.

The subproblems will be indexed by contiguous subarrays of the arrays of cuts needed to be made. We try making each possible cut, and take the one with cheapest cost. Since there are \(m\) to try, and there are at most \(m^2\) possible things to index the subproblems with, we have that the m dependence is that the solution is \(O(m^3)\). Also, since each of the additions is of a number that is \(O(n)\), each of the iterations of the for loop may take time \(O(\lg n + \lg m)\), so, the final runtime is \(O(m^3 \lg n)\). The given algorithm will return \((cost, seq)\) where \(cost\) is the cost of the cheapest sequence, \(and\) seq is the sequence of cuts to make.

C++
1
2
3
4
5
6
7
8
9
CUT-STRING(L, i, j, l, r)
    if l == r
        return (0, [])
    minCost = 
    for k = i to j
        if l + r + CUT-STRING(L, i, k, l, L[k]).cost + CUT-STRING(L, k, j, L[k], j).cost < minCost
            minCost = r - l + CUT-STRING(L, i, k, l, L[k]).cost + CUT-STRING(L, k + 1, j, L[k], j).cost
            minSeq = L[k] + CUT-STRING(L, i, k, l, L[k]) + CUT-STRING(L, i, k + 1, l, L[k])
    return (minCost, minSeq)

Sample call: ``cpp L = [3, 8, 10] S = 20 CUT-STRING(L, 0, len(L), 0, s) ```