`$module` Internals Description (document $Revision: 1.7 $)

This document describes anything noteworthy about the internal workings of the system. See also the system's master QA document.

Initial Data

This section defines the data that is assumed to be in the database, in addition to the data in the System tables and the settings table.

This data can be modified and new records can be added, but doing so will have implications for site layout.

Section Table

Read
Study
Research
Shop
Search

Board Table

Comments - General comments/expressions of joy about Bibliomania
Classics - Tell fellow Bibliomaniacs your favourite classic texts
Shop - Discuss what authors/books you would like to see in the shop
Teachers - Share experiences/resources in teaching English, literature and e-education
Press - Recent press coverage and articles
General Revision/Help - General revision/help questions (author/text-specific questions should be posted to the author/text boards)
Technical - Discussion of the technical side of Bibliomania

URL rewriting

Type	Pattern	Rewrite	Alias
Read	^(/0/)	/g/$1	none
Research	^(/2/)	/g/$1	none
Study	^(/1/)	/g/$1	none
Read/ Research/ Study Sections	^(/[012]/[0-9]/)	/s/$1	none
Read/ Research/ Study Authors	^(/[012]/[0-9]+/[0-9]+/)	/a/$1	none
Read/ Research/ Study Books	^(/[012]/[0-9]+/[0-9]+/[0-9+]/)	/a/$1	none
Read/ Research chapter pages	^(/[012]/[0-9]+/[0-9]+/[0-9+]/[0-9]+/[0-9]+.html)	/a/$1	none
Study chapter pages	^(/1/[0-9]+/[0-9]+/[0-9+]/[0-9]+/[0-9]+.html)	/Login?continuationUrl=$1	/b/Login?continuationUrl=$1
Search	^(/4/([0-9-]+/)+)	/Search/$1	/b/Search$1
Shop chapter pages	^(/3/([0-9-]+/)([0-9-]+/)([0-9-]+/)([0-9-]+/)+)	/Shop/$1	/b/Shop$1
Shop books authors	^(/3/([0-9-]+/)+)	/Shop/$1	/b/Shop$1
Trolley	^trolley(/method)	/trolley/bibliomania/$1	/b/Trolley/bibliomania/$1
Boards	^/board/([0-9]+/method)	/board/$1	/b/Board/$1
Messages	^/webmacro/MessagePage?db=paneris&id=([0-9]+/method)	/message$1	/b/Message$1
Old URLs	^/(Fiction etc)	/OldUrlRedirect$1	/b/OldUrlRedirect$1
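
The table does not say how the patterns are applied. The Python sketch below assumes that each pattern must match the whole request path, that rules are tried in table order with the first match winning, and that `$1` is replaced literally by the first capture group (so the leading slash captured in `$1` is kept). Only a subset of the rules is included, the Alias column is ignored, and `rewrite` is a hypothetical helper, not part of the server.

```python
import re
from typing import List, Optional, Tuple

# A subset of the rewrite table: (type, pattern, rewrite template).
RULES: List[Tuple[str, str, str]] = [
    ("Read", r"^(/0/)", "/g/$1"),
    ("Research", r"^(/2/)", "/g/$1"),
    ("Study", r"^(/1/)", "/g/$1"),
    ("Read/Research/Study sections", r"^(/[012]/[0-9]/)", "/s/$1"),
    ("Read/Research/Study authors", r"^(/[012]/[0-9]+/[0-9]+/)", "/a/$1"),
    ("Search", r"^(/4/([0-9-]+/)+)", "/Search/$1"),
]

COMPILED = [(kind, re.compile(pattern), template) for kind, pattern, template in RULES]


def rewrite(path: str) -> Optional[Tuple[str, str]]:
    """Return (rule type, rewritten path) for the first rule whose pattern
    matches the whole path, or None if no rule applies.

    Assumption: whole-path matching and first-match-wins ordering.
    """
    for kind, regex, template in COMPILED:
        m = regex.fullmatch(path)
        if m:
            # Literal $1 substitution, exactly as written in the table.
            return kind, template.replace("$1", m.group(1))
    return None
```

Without whole-path matching, `^(/0/)` would also match every deeper Read URL under first-match ordering, so anchoring and rule order matter; the real server configuration should be checked before relying on either assumption.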

Access Control

Access control is achieved using a cookie authentication scheme. To read the protected content you have to know a certain random number (like 27835628), which your browser sends with every request; you can only acquire the number by

logging in correctly
hacking into the server
getting it off an existing user by extracting it from their (running) browser
snooping it off the public network, over which it is sent in plain text
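
The scheme above amounts to a single shared bearer token. The following Python sketch is illustrative only: the cookie name `bibliomania_auth` and the class `SharedSecretGate` are hypothetical, and the real login check is reduced to a boolean.

```python
import hmac
import secrets
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "bibliomania_auth"  # hypothetical cookie name


class SharedSecretGate:
    """One random number per server process, shared by every logged-in user."""

    def __init__(self) -> None:
        # Chosen once at start-up, so it only changes when the server restarts.
        self.secret = str(secrets.randbits(64))

    def login(self, credentials_ok: bool) -> Optional[str]:
        """Return a Set-Cookie value carrying the number, or None on failure."""
        if not credentials_ok:
            return None
        cookie = SimpleCookie()
        cookie[COOKIE_NAME] = self.secret
        return cookie[COOKIE_NAME].OutputString()

    def may_read_protected(self, cookie_header: str) -> bool:
        """Check the Cookie header that the browser sends with every request."""
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        morsel = cookie.get(COOKIE_NAME)
        if morsel is None:
            return False
        # Whoever presents the right number gets in, however they obtained it.
        return hmac.compare_digest(morsel.value, self.secret)
```

Because the number is not tied to any particular user, a value copied from a running browser or snooped off the network is as good as a login, which is why the third and fourth routes in the list work.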

At present the number isn't changed very often (in fact, only when the server is restarted), so the third route is theoretically feasible: a number extracted from someone's browser stays valid until the next restart. We could make it harder by expiring the number after a reasonable interval.
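
One way to expire the number is sketched below in Python. `RotatingSecret` and its `max_age` parameter are hypothetical; the clock is injectable only so that the behaviour can be tested without waiting.

```python
import hmac
import secrets
import time
from typing import Callable


class RotatingSecret:
    """Shared access number that is replaced once it is older than max_age seconds."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age
        self.clock = clock
        self._rotate()

    def _rotate(self) -> None:
        self.secret = str(secrets.randbits(64))
        self.issued_at = self.clock()

    def current(self) -> str:
        """Value to hand out at login; rotates first if the old one has expired."""
        if self.clock() - self.issued_at >= self.max_age:
            self._rotate()
        return self.secret

    def is_valid(self, presented: str) -> bool:
        # current() rotates an expired number, so a stale copy stops working.
        return hmac.compare_digest(presented, self.current())
```

Rotating a single shared number also logs out every legitimate user at the moment it changes; accepting the previous value for a short grace period would soften that.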

The second route is outside the content-protection system's control. We need to be sure that the server is reasonably secure; if it isn't, no web-level access control is going to help.

The fourth route is something we just have to live with unless we move to a secure (SSL) server. For credit card numbers that is obviously necessary; for our application it is overkill.

What you _can't_ do is look at the site in isolation and work out how to access the protected content. And it is much easier just to register normally than to hack into the server, copy the magic number or snoop it, so in practice no one will bother with those routes.

Hints

Nofrontpage vs Nonstandard
The two flags are not interchangeable. The effect of nofrontpage is to turn all URLs referring to the book into URLs for chapter 1, page 1. The effect of author.nonstandard is to stop the encache from using the usual Author.wm template and fall back on an index.wm which you provide in the source directory.
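
To make the difference concrete, here is a hedged Python sketch. The functions `book_link` and `author_template` are hypothetical, and the URL layout (book URLs like /0/12/7/3/, chapter pages like /0/12/7/3/<chapter>/<page>.html) is inferred from the URL rewriting table rather than taken from the code.

```python
from pathlib import PurePosixPath


def book_link(book_url: str, nofrontpage: bool) -> str:
    """Hypothetical: where a link to a book should point.

    Assumes book URLs look like /0/12/7/3/ and chapter pages like
    /0/12/7/3/<chapter>/<page>.html, as the URL rewriting table suggests.
    """
    if not nofrontpage:
        return book_url
    # nofrontpage: every reference to the book goes to chapter 1, page 1.
    return book_url.rstrip("/") + "/1/1.html"


def author_template(nonstandard: bool, source_dir: str) -> str:
    """Hypothetical: which template the encache renders for an author."""
    if nonstandard:
        # author.nonstandard: skip Author.wm, use the index.wm in the source dir.
        return str(PurePosixPath(source_dir) / "index.wm")
    return "Author.wm"
```

The first flag changes where links point; the second only changes how the author page is generated.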

System Design Assumptions and Parameters

A complex application inevitably has to make trade-offs as it reaches the performance limits of its sub-components. These trade-offs are listed explicitly here.

Search

Message 37898

 > 
 > Should we allow unlimited results?  What if someone searches the whole site 
 > and puts in 'the' .  IS there potential to clog up the system?

There is a silent, hard limit of 50 chapters per search for
essentially that reason.  Recall that we return the chapters hit in
"score" order (basically it likes more word/phrase occurrences rather
than fewer per chapter, and it likes them to be clustered together).
That means that we must in principle look at _every_ valid "hit", even
if the user only wants to see the "first" (most relevant) one.  As a
simple and guaranteed effective way of avoiding overload when all the
search terms entered are very common, we just stop scoring after we've
found 50 chapters.  If people see "at least 50 hits" and not the one
they want, they should know to start putting more discriminating
keywords in rather than laboriously paging through to the end.

The 10 and 5 shown on the search page itself are quite different
numbers.  They simply control how many of the occurrence contexts
within each chapter are displayed.  This actually makes little
difference to the load on the server.
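
The capped scoring described in this message can be sketched in Python as follows. The limit of 50 comes from the message; the `score` function is a hypothetical stand-in for the real ranking (one point per occurrence plus a bonus for occurrences that lie close together), and the hit stream is assumed to yield each chapter once.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

CHAPTER_LIMIT = 50  # the silent hard limit described in the message


@dataclass
class SearchResult:
    ranked: List[Tuple[str, float]]  # (chapter id, score), best first
    truncated: bool                  # True => show "at least 50 hits"


def score(positions: List[int]) -> float:
    """Hypothetical score: one point per occurrence, plus a bonus that
    grows as neighbouring occurrences get closer together."""
    positions = sorted(positions)
    clustering = sum(1.0 / (b - a) for a, b in zip(positions, positions[1:]) if b > a)
    return len(positions) + clustering


def capped_search(hits: Iterable[Tuple[str, List[int]]],
                  limit: int = CHAPTER_LIMIT) -> SearchResult:
    """hits yields (chapter id, word positions) for each matching chapter."""
    scored: Dict[str, float] = {}
    truncated = False
    for chapter, positions in hits:
        if len(scored) == limit:
            # Stop scoring: very common terms would otherwise force us
            # to score every chapter on the site.
            truncated = True
            break
        scored[chapter] = score(positions)
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return SearchResult(ranked, truncated)
```

Ranking has to see every scored chapter before it can pick the best one, which is why the cap is applied to scoring rather than to the number of results displayed.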

Message 40180

tim@hoop.co.uk writes:
 > so we should do something in Author.delete(), so that we reindex before
 > deleting?
 > 
 > otherwise, we are going to have to periodically reindex?

Theoretically the current scheme does mean that when authors---in fact
any Chapters at all---are deleted, you get "orphaned" search hits in
the fti database which don't correspond to anything in the
Postgres/POEM database.  In practice, this effect is irrelevant
since there are so few deletions, the phantom hits are silently
ignored (or are, now that you have fixed the Author case), and the
unindex/reindex cycle happens anyway when the textids in question get
reused.  (That is to say, when a new text with the same author id
number, book-of-author sequence number, and chapter-of-book number
gets imported.)
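
The following Python sketch models the behaviour described in this message with a toy in-memory index. `TextId`, `FtiIndex` and the `live` set of chapters still present in the Postgres/POEM database are hypothetical stand-ins for the real fti database, chosen only to show why orphans are harmless: lookups filter them out, and re-importing a text under a reused textid unindexes the old entries first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class TextId(NamedTuple):
    author: int   # author id number
    book: int     # book-of-author sequence number
    chapter: int  # chapter-of-book number


class FtiIndex:
    """Toy stand-in for the full-text index: word -> set of TextIds."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[TextId]] = defaultdict(set)

    def unindex(self, tid: TextId) -> None:
        for ids in self.postings.values():
            ids.discard(tid)

    def index(self, tid: TextId, words: Iterable[str]) -> None:
        # Importing a text under a reused TextId unindexes the old entries first.
        self.unindex(tid)
        for w in words:
            self.postings[w.lower()].add(tid)

    def lookup(self, word: str, live: Set[TextId]) -> List[TextId]:
        # Orphaned hits (chapters deleted from the database but still in the
        # index) are silently dropped by checking against the live set.
        return sorted(t for t in self.postings.get(word.lower(), set()) if t in live)
```

Deleting a chapter leaves its postings in place until the same textid is imported again, matching the "orphaned but ignored" behaviour described above.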
