$module Internals Description (document $Revision: 1.7 $)

This document describes anything noteworthy in the workings of the system. See also the system's master QA document.

Initial Data

This section defines the data assumed to be present in the database, in addition to the data in the System tables and the settings table.

This data can be modified and new records can be added, but doing so will have implications for site layout.

Section Table

Board Table

URL rewriting

Type                 Pattern                                               Rewrite                      Alias
Read                 ^(/0/)                                                /g/$1                        none
Research             ^(/2/)                                                /g/$1                        none
Study                ^(/1/)                                                /g/$1                        none
Study sections       ^(/[012]/[0-9]/)                                      /s/$1                        none
Study authors        ^(/[012]/[0-9]+/[0-9]+/)                              /a/$1                        none
Study books          ^(/[012]/[0-9]+/[0-9]+/[0-9+]/)                       /a/$1                        none
Chapter pages        ^(/[012]/[0-9]+/[0-9]+/[0-9+]/[0-9]+/[0-9]+.html)     /a/$1                        none
Chapter pages        ^(/1/[0-9]+/[0-9]+/[0-9+]/[0-9]+/[0-9]+.html)         /Login?continuationUrl=$1    /b/Login?continuationUrl=$1
Search               ^(/4/([0-9-]+/)+)                                     /Search/$1                   /b/Search$1
Shop chapter pages   ^(/3/([0-9-]+/)([0-9-]+/)([0-9-]+/)([0-9-]+/)+)       /Shop/$1                     /b/Shop$1
Shop books authors   ^(/3/([0-9-]+/)+)                                     /Shop/$1                     /b/Shop$1
Trolley              ^trolley(/method)                                     /trolley/bibliomania/$1      /b/Trolley/bibliomania/$1
Boards               ^/board/([0-9]+/method)                               /board/$1                    /b/Board/$1
Messages             ^/webmacro/MessagePage?db=paneris&id=([0-9]+/method)  /message$1                   /b/Message$1
Old URLs             ^/(Fiction etc)                                       /OldUrlRedirect$1            /b/OldUrlRedirect$1
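The rewrite pass can be sketched as follows. This is a minimal sketch using a few rules from the table above; the rule list, its ordering, and the function name are illustrative assumptions, not the servlet container's actual API. Note that the more specific patterns must be tried before the catch-all /0/, /1/ and /2/ rules.

```python
import re

# Rules are tried in order; the first matching pattern wins.  Patterns and
# rewrite targets are taken from the table above ($1 becomes \1 in Python).
RULES = [
    (r"^(/[012]/[0-9]+/[0-9]+/)", r"/a/\1"),  # Study authors
    (r"^(/[012]/[0-9]/)", r"/s/\1"),          # Study sections
    (r"^(/0/)", r"/g/\1"),                    # Read
    (r"^(/2/)", r"/g/\1"),                    # Research
    (r"^(/1/)", r"/g/\1"),                    # Study
]

def rewrite(url: str) -> str:
    """Apply the first matching rule; leave unmatched URLs alone."""
    for pattern, target in RULES:
        if re.match(pattern, url):
            return re.sub(pattern, target, url, count=1)
    return url
```

Since the captured group includes its surrounding slashes, a rewritten URL such as /g//0/ carries a double slash, exactly as /g/$1 in the table implies.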

Access Control

Access control is achieved using a cookie authentication scheme. To read the protected content you have to know a certain random number (like 27835628), which is sent in by your browser with every request; you can only acquire the number by

1. registering (or logging in) normally;
2. breaking into the server and reading it there;
3. copying it from someone who already has it;
4. snooping it off the wire as it passes between a legitimate user's browser and the server.

At present the number isn't changed very often (in fact, only when the server is restarted), so the third option is theoretically feasible. We could make it harder by cancelling the number after a decent interval.

The second option is something out of the content-protection system's control. We need to be sure that the server is reasonably secure; if it isn't, no web-level access control is going to help.

The fourth access route is something that we just have to live with unless we go for a secure server. With credit card numbers you obviously have to use one; for our application it is overkill.

What you _can't_ do is look at the site in a vacuum and figure out how to access the protected content. And it is much, much easier just to register normally than to hack into the server, copy the magic number, or snoop it, so in practice no one will bother with the latter routes.
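The scheme above can be sketched as follows. This is a minimal sketch under stated assumptions: the function and cookie names are hypothetical (the real system is a Java servlet application), and one server-wide magic number stands in for whatever the server actually issues.

```python
import secrets

# One magic number per server process: it changes only on restart, which is
# why a copied number stays usable "for a decent interval".
MAGIC = str(secrets.randbelow(100_000_000))

def login(username, password, credentials_ok):
    """On successful registration/login, return the cookie to set."""
    if not credentials_ok(username, password):
        raise PermissionError("bad credentials")
    return {"auth": MAGIC}

def may_read_protected(cookies):
    """Every request for protected content must carry the magic number."""
    return cookies.get("auth") == MAGIC
```

Cancelling the number after an interval, as suggested above, would amount to regenerating MAGIC periodically rather than only at restart.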


System Design Assumptions and Parameters

A complex application inevitably makes trade-offs as it runs up against the performance limits of its sub-components. Those trade-offs are listed explicitly here.


Message 37898
 > Should we allow unlimited results?  What if someone searches the whole site 
 > and puts in 'the' .  IS there potential to clog up the system?

There is a silent, hard limit of 50 chapters per search for
essentially that reason.  Recall that we return the chapters hit in
"score" order (basically it likes more word/phrase occurrences rather
than fewer per chapter, and it likes them to be clustered together).
That means that we must in principle look at _every_ valid "hit", even
if the user only wants to see the "first" (most relevant) one.  As a
simple and guaranteed effective way of avoiding overload when all the
search terms entered are very common, we just stop scoring after we've
found 50 chapters.  If people see "at least 50 hits" and not the one
they want, they should know to start putting more discriminating
keywords in rather than laboriously paging through to the end.
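The cut-off can be sketched as follows. This is an illustrative sketch only: the real index is the fti engine, and the scoring model here is a stand-in for it.

```python
# Hits are accumulated per chapter, in the order the index yields them, but
# scoring stops silently once 50 distinct chapters have been seen.
MAX_CHAPTERS = 50

def score_search(hits):
    """hits: iterable of (chapter_id, score) pairs from the index.
    Returns chapter ids best-first, at most MAX_CHAPTERS of them."""
    scores = {}
    for chapter_id, score in hits:
        if chapter_id not in scores and len(scores) >= MAX_CHAPTERS:
            break  # "at least 50 hits": stop scoring further chapters
        scores[chapter_id] = scores.get(chapter_id, 0) + score
    return sorted(scores, key=scores.get, reverse=True)
```

The saving is in the early break: for very common search terms the engine never scores the long tail of matching chapters at all.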

The 10 and 5 shown on the search page itself are quite different
numbers.  They simply control how many of the occurrence contexts
within each chapter are displayed.  This actually makes little
difference to the load on the server.

Message 40180
tim@hoop.co.uk writes:
 > so we should do something in Author.delete(), so that we reindex before
 > deleting?
 > otherwise, we are going to have to periodically reindex?

Theoretically the current scheme does mean that when authors---in fact
any Chapters at all---are deleted, you get "orphaned" search hits in
the fti database which don't correspond to anything in the
Postgres/POEM database.  In practice, this effect is irrelevant
since there are so few deletions, the phantom hits are silently
ignored (or they are now that you have fixed the Author case), and the
unindex/reindex cycle happens anyway when the textids in question get
reused.  (That is to say, when a new text with the same author id
number, book-of-author sequence number, and chapter-of-book number
gets imported.)
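The silent-ignore behaviour amounts to the following. This is a sketch with hypothetical names: a textid returned by the fti index that no longer exists in the Postgres/POEM database is simply dropped rather than reported to the user.

```python
# index_textids: textids the fti index returned for a query;
# chapters: a mapping from textid to the live chapter row in Postgres/POEM.
def live_hits(index_textids, chapters):
    """Drop orphaned hits: keep only textids that still exist."""
    return [tid for tid in index_textids if tid in chapters]
```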