
693	    if (dynxdfs[1].size() >= window_size) {
694	        do_diff();
(gdb)
695	        co_yield get_diff_for(l1, l2);
696	        // if l1 or l2 overflows this likely means that file 1 was exhausted while there were still matching values in file 2 for some reason. maybe they weren't passed through nreff?
697	        while (l1) {
698	            dynxdfs[0].consume();
699	            -- l1;
700	        }
so i'm in the first consume call, and the first thing i'm noticing is that, because only one line has been processed, the current document diff has deletions (changes, it has a nice general representation where deletions and additions are the same), marked after the first line

(gdb) p xdf->rchg[0]
$7 = 0 '\000'
(gdb) p xdf->rchg[1]
$8 = 1 '\001'
(gdb) p xdf->rchg[2]
$9 = 1 '\001'

this is the same as the first incorrect output that was generated, 1 equality and 2 deletions. lemme look at the other document

(gdb) up
#1  0x0000555555611ee0 in AsymmetricStreamingXDiff::diff(_ZN24AsymmetricStreamingXDiff4diffEN4zinc9generatorISt17basic_string_viewIcSt11char_traitsIcEES5_NS0_17use_allocator_argEEE.Frame *) (frame_ptr=0x5120000001c0) at diff_xdiff.cpp:698
698	            dynxdfs[0].consume();
(gdb) p dynxdfs[1].xdf->nrec
$11 = 1
(gdb) p dynxdfs[1].rchg[0]
$12 = (char &) @0x502000000290: 0 '\000'

the second document has only one line and it's equal (although i'd have to check nreff and rindex and such to know for sure, it's likely)

(gdb) p l1
$13 = 1
(gdb) p l2
$14 = 1

and only 1 line has been processed from each document so i kind of expect this to go well. i'm curious what variables like these look like before the next do_diff. i'll continue to do_diff's prolog.

(gdb) break do_diff
Note: breakpoints 1 and 8 also set at pc 0x55555563b3c3.
Breakpoint 9 at 0x55555563b3c3: file diff_xdiff.cpp, line 841.
(gdb) cont
Continuing.

Breakpoint 3, DynamicXDFile::consume (this=0x7ffff4e00c80, lines=1) at diff_xdiff.cpp:457
457	    xdf->dstart += lines;
(gdb) cont
Continuing.
consuming rec ptr 0x502000000250
WINDOW RESIZE2=>4

Breakpoint 1, AsymmetricStreamingXDiff::do_diff (this=0x7ffff4e00a30) at diff_xdiff.cpp:841
841	    auto xe = &this->xe;

ok ummm check variables

i tried to call trace_state but it didn't work:

(gdb) p dynxdfs[0].trace_state("0"), dynxdfs[1].trace_state("1")
Cannot resolve method DynamicXDFile::trace_state to any overloaded instance
(gdb) p dynxdfs[0].trace_state(std::string_view("0",1)), dynxdfs[1].trace_state(std::string_view("1",1))
A syntax error in expression, near `"0",1)), dynxdfs[1].trace_state(std::string_view("1",1))'.
(gdb) p dynxdfs[0].trace_state(string_view("0",1)), dynxdfs[1].trace_state(string_view("1",1))
A syntax error in expression, near `("0",1)), dynxdfs[1].trace_state(string_view("1",1))'.

i found the bug though below :D

(gdb) p xe->xdf1.nreff
$15 = 2
(gdb) p xe->xdf2.nreff
$16 = 1
(gdb) p xe->xdf1.nrec
$17 = 2
(gdb) p xe->xdf2.nrec
$18 = 1
(gdb) p xe->xdf1.rec[0].ptr
There is no member named rec.
(gdb) p xe->xdf1.recs[0].ptr
$19 = 0x7ffff4e00a02 "b\nc\n"
(gdb) p xe->xdf2.recs[0].ptr
$20 = 0x502000000312 "b\n"
(gdb) p xe->xdf1.recs[0].ha
$21 = 1
(gdb) p xe->xdf1.recs[1].ha
$22 = 2

oooooooops nope i didn't at all! i thought i was checking xdf2 but instead i checked line 2 of xdf1 >(

(gdb) p xe->xdf2.recs[0].ha
$23 = 1

so line 1 of both files have the same hash index. and all the lines are in nreff ...

(gdb) p xe->xdf1.rindex[0]
$24 = 0
(gdb) p xe->xdf2.rindex[0]
$25 = 0

a new diff should count them as equal, unchanged >( shouldn't it??

(gdb) p xe->xdf1.rchg[0]
$26 = 1 '\001'
(gdb) p xe->xdf2.rchg[0]
$27 = 0 '\000'

it is notable that file 1 still has a change marked for the line, from before it was in the window. this algorithm, from xdiff and git, wasn't designed to be used with a window -- usually when rchg is set it is because of a preprocessing phase that eliminates lines that can be processed quickly. however, which ones are removed are tracked with rindex and nreff and ha ... hey!
i should check the ha array

(gdb) p xe->xdf1.ha[0]
$28 = 1
(gdb) p xe->xdf2.ha[0]
$29 = 1

ok, so the lines are tracked as having the same hash index in the preprocessed ha lists too. so i dunno why this is happening, and i'm wondering if it could be because of some check in the diff implementation that skips setting rchg if it already is. this codebase has had many people working on it over the years and different approaches to things have accumulated. but now i get to step into the git xdiff code and see !!
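
to make the stale-flag hypothesis concrete, here is a toy model of it. it is not the xdiff code and says nothing definite about what xdiff does: `ToyFile`, `consume_front`, `toy_mark_changes`, `clear_marks` and `replay_session` are all made-up names, the first line "a" is invented, and the "diff" just compares lines position by position. the one assumption that matters is that the diff only ever sets rchg flags and never clears them. under that assumption a flag left over from before the window moved survives the next diff, which would look like $26 above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// hypothetical stand-in for one side of the diff; NOT the real xdiff file struct
struct ToyFile {
    std::vector<std::string> lines;
    std::vector<char> rchg;  // 1 = changed, 0 = unchanged, one flag per line
};

// drop the first n lines from the window; the remaining flags are kept as-is
void consume_front(ToyFile& f, std::size_t n) {
    n = std::min(n, f.lines.size());
    const auto count = static_cast<std::ptrdiff_t>(n);
    f.lines.erase(f.lines.begin(), f.lines.begin() + count);
    f.rchg.erase(f.rchg.begin(), f.rchg.begin() + count);
}

// ASSUMED behaviour: the diff only ever sets flags, it never resets one to 0.
// lines are compared position by position, far simpler than a real diff.
void toy_mark_changes(ToyFile& a, ToyFile& b) {
    const std::size_t common = std::min(a.lines.size(), b.lines.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (a.lines[i] != b.lines[i]) {
            a.rchg[i] = 1;
            b.rchg[i] = 1;
        }
    }
    // anything past the shorter file has no partner, so it counts as changed
    for (std::size_t i = common; i < a.lines.size(); ++i) a.rchg[i] = 1;
    for (std::size_t i = common; i < b.lines.size(); ++i) b.rchg[i] = 1;
}

// reset every flag in the current window
void clear_marks(ToyFile& f) {
    std::fill(f.rchg.begin(), f.rchg.end(), '\0');
}

// replays the shape of the session: flags 0,1,1 on file 1, then one line consumed
void replay_session() {
    ToyFile f1{{"a", "b", "c"}, {0, 1, 1}};
    ToyFile f2{{"b"}, {0}};
    consume_front(f1, 1);  // file 1's window is now "b", "c" with flags 1, 1

    toy_mark_changes(f1, f2);
    assert(f1.rchg[0] == 1);  // "b" == "b", but the old flag is still set
    assert(f2.rchg[0] == 0);

    clear_marks(f1);
    clear_marks(f2);
    toy_mark_changes(f1, f2);
    assert(f1.rchg[0] == 0);  // with a reset first, the equal line is clean
    assert(f1.rchg[1] == 1);  // "c" has no partner in file 2
}
```

if stepping into xdiff shows something like this, clearing rchg for the lines still in the window before each do_diff would be one thing to try. whether that is enough in the real code is still open.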