Essentially, the method is this: keep the variable that the branch condition depends on in the cache.
There are three possible ways to do it:
1. Detect whether there is a loop, and use the equivalent of a lock micro-op to ensure that the variable essential to the branch condition doesn't leave the cache until a context switch occurs or the loop is detected to have completed.
2. Rewrite the loop in micro-ops so that the branch condition uses a smaller variable, and end the loop once that variable shows it has completed.
3. Convert the branch prediction unit into another set of registers and have it operate as a cache for the variables that branches depend on.
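
To make the idea more concrete, here is a rough software sketch of options 1 and 3 combined: a tiny register-like table, keyed by branch address, that holds the branch variable and can pin it until the loop finishes or a context switch happens. This is only a toy model under my own assumptions, not how any real CPU works; the names BranchVariableCache, run_counted_loop, the pinned flag, and the capacity of 4 are all hypothetical.

```python
from collections import OrderedDict


class BranchVariableCache:
    """Toy model of a small register file that caches branch variables.

    Hypothetical design: entries are keyed by the branch's address and
    can be pinned so they are not evicted while a loop is running.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # branch_pc -> [value, pinned]

    def store(self, branch_pc, value, pinned=False):
        if branch_pc in self.entries:
            # Updating an entry keeps an existing pin in place.
            pinned = pinned or self.entries[branch_pc][1]
        elif len(self.entries) >= self.capacity:
            self._evict()
        self.entries[branch_pc] = [value, pinned]
        self.entries.move_to_end(branch_pc)

    def _evict(self):
        # Evict the least recently stored entry that is not pinned.
        victim = next(
            (pc for pc, (_, pinned) in self.entries.items() if not pinned),
            None,
        )
        if victim is None:
            raise RuntimeError("all entries are pinned")
        del self.entries[victim]

    def lookup(self, branch_pc):
        entry = self.entries.get(branch_pc)
        return None if entry is None else entry[0]

    def release(self, branch_pc):
        # Loop detected as complete: the entry may be evicted again.
        if branch_pc in self.entries:
            self.entries[branch_pc][1] = False

    def context_switch(self):
        # A context switch drops everything, pinned or not.
        self.entries.clear()


def run_counted_loop(cache, branch_pc, trip_count):
    """Simulate a counted loop whose exit test reads the cached counter."""
    cache.store(branch_pc, trip_count, pinned=True)  # pin on loop detection
    iterations = 0
    while cache.lookup(branch_pc) > 0:  # branch resolved from the cache
        iterations += 1
        cache.store(branch_pc, cache.lookup(branch_pc) - 1)
    cache.release(branch_pc)  # loop finished, unpin
    return iterations
```

Calling run_counted_loop(BranchVariableCache(), 0x400, 5) runs the loop body five times without the counter ever leaving the table. The pin is what models the "lock micro-op" in option 1, and the table itself stands in for the extra registers in option 3.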