Master the Longest Unique Substring Problem

Solving the problem of finding the longest substring with unique characters can be a challenging task. The initial approach might lead you into loops of confusion and inefficiency. Even when seeking solutions online, deciphering the provided solutions can be a daunting task. In this article, we'll break down this problem and introduce a clear and efficient solution to help you understand and master it.

Let's begin by considering an example to understand the problem better. Given the string "abacdba," the longest substring with unique characters could be "bacd," "acdb," or "cdba." The length of our longest substring is 4.

Your initial, perhaps intuitive, approach might involve looping through each character until a duplicate is found. You may record the length of the substring before encountering a duplicate and then move to the next character. This naive approach results in an exponential runtime, O(n^2), which is far from optimal.

To tackle this problem efficiently, we'll employ a sliding window approach. This technique uses two pointers: a starting pointer and a leading pointer. Additionally, we maintain a record of the longest unique substring encountered so far, which initially starts at 0. A lookup table is used to keep track of repeating characters.

The sliding window approach begins with both pointers at the beginning of the string, index 0. We add the current character to the lookup table and move the leading pointer to the next character. At each step, we check if the character has been previously recorded in the lookup table.

If the character is not in the lookup table, we record it and continue. If it already exists, we need to adjust the starting pointer to a position just ahead of the duplicate character. This ensures that the substring remains unique.

Suppose our string starts with 'b' at position 1 and ends with 'b' at position 7. We move the starting pointer to position 2 and maintain the endpoint at position 7. We then compare the current substring's length to our longest recorded substring by calculating the difference between the current and starting positions.

To handle duplicates that might occur behind our starting position, we ensure that the starting position never moves backward. Therefore, we compare the maximum value between the starting position and the duplicate character's index to determine the new starting position.

Here's our solution in Python:

def longest_unique_substring(string: str) -> int:
    start, longest, current = 0, 0, 0
    lookup = {}

    while current < len(string):
        if string[current] in lookup: # check if there is a duplicate character
            position_infront_of_duplicate = lookup[string[current]] + 1
            start = max(start, position_infront_of_duplicate)
        lookup[string[current]] = current # update lookup table
        current_length = current - start + 1
        longest = max(longest, current_length) # update longest value
        current += 1
    return longest

Understanding and mastering the problem of finding the longest substring with unique characters is a great opportunity to understand the sliding window approach which is used to solve many other algorithmic problems.