When I tell WebChanges to give me more results by clicking 100, 200, 500 etc., it gives me different results. This seems to happen with SEARCH generally, at least when using [...]. The simplest way to describe it is that increasing the limit increases the number of things it finds.
TWiki web with fewer than 300 topics. A team made possibly hundreds of edits to many dozens of topics in the space of a day, 18 Jun 2008. Going into WebChanges, they expected to see basically all displayed changes as being from 18 Jun 2008, but there were a few from today (19 Jun 2008), then several from 18 Jun 2008, and the rest were from before that.
So I've done some testing. I searched (CTL+F) for the string " -18 Jun 2008 - " on the WebChanges result (this matched the rev line for each result uniquely). Here's what I got:
- If there were 66 hits in reverse chronological order in a listing of 500, they should also show up in both the 100- and 200-limit searches. (When I did this testing, there were 17 results from today, the 19th.)
- In every case, results from before 18 Jun 2008 were displayed at the bottom, after the 18 June 2008 hits - even on the 50-limit search, which should have at least tailed out still displaying hits from this date.
- Also, the order changes. E.g., the first two topics displayed on the 50-limit search are in reverse order on the 500-limit search.
I've marked this "urgent" because it seems to indicate that the validity of search output is somewhat compromised.
If I say "give me the most recent 50 changes", I expect it to look at all the topics and give me the 50 most recently changed. Instead, it seems to be getting the 50 topics that it can grab without having to get out of its chair, and then listing them in reverse chronological order.
- 19 Jun 2008
mmm, I think this is due to the non-obvious way that the search options are applied - I think limit happens first, then the sort. There was once an attempt to change that, but iirc there were some TWiki apps that didn't like the result.
I'm about to make some changes to the way that information is passed to the search and query backends - perhaps this will be helped by it.
damned good find!
- 19 Jun 2008
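To make the suspected ordering problem concrete, here is a minimal sketch in Python (not TWiki code). The topic names, dates and both function names are made up for illustration; it only shows that "limit first, then sort" and "sort first, then limit" can return different topics from the same data.

```python
# Hypothetical data: (topic, rev_date) pairs in whatever order the store
# happens to return them. ISO dates compare correctly as strings.
topics = [
    ("TopicA", "2008-06-10"),
    ("TopicB", "2008-06-18"),
    ("TopicC", "2008-06-01"),
    ("TopicD", "2008-06-19"),
    ("TopicE", "2008-06-18"),
]


def limit_then_sort(items, limit):
    """Take the first `limit` items as found, then sort only those."""
    return sorted(items[:limit], key=lambda t: t[1], reverse=True)


def sort_then_limit(items, limit):
    """Sort everything by revision date, then keep the newest `limit`."""
    return sorted(items, key=lambda t: t[1], reverse=True)[:limit]


limited_first = [name for name, _ in limit_then_sort(topics, 3)]
sorted_first = [name for name, _ in sort_then_limit(topics, 3)]
print(limited_first)  # TopicD (the newest change) is missing here
print(sorted_first)
```

With limit-first, raising the limit changes which topics are even considered, which would fit the report that larger limits surface more hits from the same day.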
ah yes :/
# For performance:
# * sort by approx time (to get a rough list)
# * shorten list to the limit + some slack
# * sort by rev date on shortened list to get the accurate list
# SMELL: Ciaro had efficient two stage handling of modified sort.
# SMELL: In Dakar this seems to be pointless since latest rev
- 19 Jun 2008
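The three steps in the quoted comment can be sketched roughly as below. This is a Python illustration under assumptions, not the actual TWiki implementation: the `Topic` fields, the `recent_changes` function, the `slack` value and the sample numbers are all hypothetical. It shows how the shortlist can drop genuinely recent topics when the cheap timestamp and the real revision date disagree.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    file_mtime: int  # cheap, approximate: filesystem timestamp
    rev_date: int    # accurate: date recorded for the latest revision


def recent_changes(topics, limit, slack=1):
    # Step 1: rough ordering by the cheap timestamp.
    rough = sorted(topics, key=lambda t: t.file_mtime, reverse=True)
    # Step 2: shorten the list to the limit plus some slack.
    shortlist = rough[:limit + slack]
    # Step 3: accurate sort by revision date, but only over the shortlist.
    exact = sorted(shortlist, key=lambda t: t.rev_date, reverse=True)
    return [t.name for t in exact[:limit]]


# Hypothetical web: four old topics whose files were touched after two
# genuine edits were made, plus one change from today.
topics = [
    Topic("Edited18a", file_mtime=100, rev_date=180),
    Topic("Edited18b", file_mtime=101, rev_date=181),
    Topic("Old1", file_mtime=200, rev_date=10),
    Topic("Old2", file_mtime=201, rev_date=11),
    Topic("Old3", file_mtime=202, rev_date=12),
    Topic("Old4", file_mtime=203, rev_date=13),
    Topic("Today", file_mtime=300, rev_date=300),
]

small = recent_changes(topics, limit=2)
large = recent_changes(topics, limit=6)
print(small)  # an old topic outranks the genuine edits
print(large)  # the genuine edits only appear once the limit is large enough
```

In this made-up data the small limit never sees the two genuine edits, while the large limit does, so the smaller result is not a prefix of the larger one.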
This is a known problem all the way back to Cairo.
The problem manifests itself very badly if you have manipulated (edited or copied) topics in and out of a web via the file system.
I almost want to make a bet that this is what happened that day. I would not be surprised if the 100s of topics were manipulated directly on the file system or copied from another web.
TWiki generates the list of last modified by doing a pre-sort using the file time stamp instead of the timestamp in the topic. This gives a huge performance advantage when you have many topics in a web.
It works pretty well, and trying it here on Bugs I get valid results no matter what the limit. But we have seen this problem when someone has hacked the files directly here.
In a future release we will have to revisit this. With an improved storage we can improve this a lot by having a simple DB (could be as simple as an additional file of topics and their time stamps in date order) that the modified criteria is taken from.
Marcus - put this back to urgent if I am wrong that the files were manipulated outside TWiki editing.
- 27 Jun 2008
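The "additional file of topics and their time stamps" idea could look something like the following sketch. It is written in Python purely for illustration; the `ChangeIndex` class, its method names and the tab-separated line format are assumptions, not an existing TWiki feature. The point is that the most recent changes can be answered from the index alone, without opening any topic file.

```python
class ChangeIndex:
    """Hypothetical per-web index: one entry per topic, newest first."""

    def __init__(self):
        self._latest = {}  # topic name -> timestamp of its latest save

    def record_save(self, topic, timestamp):
        # Called on every save; keeps only the newest timestamp per topic.
        if timestamp >= self._latest.get(topic, timestamp):
            self._latest[topic] = timestamp

    def most_recent(self, limit):
        # Sort by timestamp, newest first, then cut to the limit.
        ordered = sorted(self._latest.items(), key=lambda kv: kv[1], reverse=True)
        return ordered[:limit]

    def dump(self):
        # Serialise in date order, one "timestamp<TAB>topic" line per topic.
        rows = self.most_recent(len(self._latest))
        return "\n".join(f"{ts}\t{name}" for name, ts in rows)


index = ChangeIndex()
index.record_save("TopicA", 100)
index.record_save("TopicB", 200)
index.record_save("TopicA", 300)  # a later edit moves TopicA to the top
print(index.most_recent(1))
print(index.dump())
```

Because sorting happens over the whole index before the limit is applied, the result for a small limit is always a prefix of the result for a larger one.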
They were edited in the normal way. As I mentioned above, the team using the web made possibly hundreds of edits to many dozens of topics in the space of a day, so it wasn't done at the filesystem level, it was done by the normal edit process through browsers.
I do sometimes do mass operations in the web at the filesystem level, but this problem was flagged by their team leader when they noticed that their edits weren't showing up as expected. I'm assuming that, even if the files had been tinkered with directly, any subsequent "normal" TWiki editing would have "put things right"...?
I'm just thinking about why the system-level edits would make a difference... Even if they were edited at the system level (I tend to use [...] to do the bulk stuff), the timestamps should still line up the same way. Let's say I do a bulk edit on 300 topics to change permissions in [...], and all the files get a timestamp within a second or two of each other. If humans come along the next day and do a mass of edits, all those edits they made are still more recent than the system edits. Seems to me that the edit, whether done by a TWiki Perl script or by me with [...], is still an edit that changes the file at the given time. So, if TWiki uses the file timestamp instead of reading the serial date inside the file, it should be more accurate in terms of showing what got edited last; any sort of edit should be faithfully noted. Seems like there's something fishy going on here... Sven's comment above does seem to explain it.
Anyway, I'll mark it back to urgent as you advise. Cheers,
- 05 Jul 2008 (PS: Just noticed the signature block now leaves out the TWiki.org interwiki link - is that right?)
I still cannot reproduce this, Marcus.
There is something you do not tell us here. There is an important detail that is left out. Can you create an example and attach the web here in a tar.gz?
Did you modify the WebChanges topic?
- 13 Jul 2008
I tried hard to reproduce the issue in my development environment. Questions for you, TWiki:Main.MarcusLeonard:
1. Did you modify the WebChanges topic? If so, could you add its content to this page?
2. Could you attach a .tar.gz of the web that shows this issue?
Please let me know if anyone is able to reproduce this issue.
Decided to downgrade this to normal, as no one can reproduce it and the reporter has not given the info needed to reproduce it.
- 22 Jul 2008
Thanks for sharing the details...
In fact, I am able to recreate the bug on my sample instances...
My instance shows the following - I have tested this on TWiki
|| Number of topics*
I am working on this.
- 01 Aug 2008
Would love a patch for this if you come up with one!
- 02 Aug 2008
First limiting and then sorting sacrifices correctness for speed. This is a known trade-off in TWiki, and one that was made quite consciously. If you'd like correct sorting and limiting, have a look at DBCachePlugin, where this trade-off isn't needed anymore, and rewrite WebChanges using an appropriate DBQUERY statement. That's the only solution we can offer right now without changing SEARCH in the core.
- 03 Aug 2008
Yikes. Okay... I'll look at some interim solutions, I guess. But can you tell me what the plan for this problem is in the longer term? It seems pretty serious to me; basically, SEARCH doesn't do what it says it does, or what people will think it does. Limiting-then-searching seems like a "lucky dip", because you won't necessarily know how the limiting will be applied. It reduces confidence in TWiki, which is not a good situation for something that's supposed to be a premier information management tool.
- 04 Aug 2008