Our scraping experience suggests we should distribute the function across all servers and move slowly enough that site owners can direct the spider's progress.
Search operates in two phases: accumulating information by scraping, then delivering it by query. We will consider both in turn.
# Page Scrape
A page viewed scrapes its neighbors. We examine the sites mentioned in any page and go fetch their sitemaps. This is the scrape step. It need not be directed by readers.
We suggest that every server should scrape the neighborhood of the pages it serves before they are viewed by readers.
A server could scrape further, to the neighborhood of its neighbors or beyond. For a small server with few sites, and few pages within those sites, a deeper search might make sense, though even there the benefit is uncertain.
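The scrape and its depth limit can be sketched as a breadth-first walk. This is a minimal sketch, not the federation's implementation: it assumes sites publish a sitemap at `/system/sitemap.json` and pages as JSON whose story items may carry a `site` field naming a foreign site; the URL templates and field names here follow federated wiki convention but should be checked against a real server.

```python
import json
from urllib.request import urlopen

# Assumed federated wiki conventions (verify against a real server).
SITEMAP = "http://{site}/system/sitemap.json"
PAGE = "http://{site}/{slug}.json"

def sites_mentioned(page):
    """Collect the foreign sites referenced by a page's story items.
    Assumes reference-like items carry a 'site' field."""
    return {item["site"] for item in page.get("story", []) if "site" in item}

def scrape(seed_site, depth=1):
    """Breadth-first scrape of a server's neighborhood, limited to `depth` hops.
    depth=1 scrapes the sites mentioned on the seed's own pages;
    depth=2 reaches the neighborhood of its neighbors."""
    seen, frontier = set(), {seed_site}
    pages = {}  # (site, slug) -> page json
    for _ in range(depth + 1):
        next_frontier = set()
        for site in frontier - seen:
            seen.add(site)
            try:
                sitemap = json.load(urlopen(SITEMAP.format(site=site)))
            except OSError:
                continue  # unreachable neighbors are simply skipped
            for entry in sitemap:
                try:
                    page = json.load(urlopen(PAGE.format(site=site, slug=entry["slug"])))
                except OSError:
                    continue
                pages[(site, entry["slug"])] = page
                next_frontier |= sites_mentioned(page)
        frontier = next_frontier
    return pages
```

The depth parameter is the whole story: the bound that keeps the spider polite is the same bound that later protects against follower spam.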
We've seen the utility of searching scraped sites for many properties, from words present to plugins in use. But we find queries for items of common origin to be the most interesting and also among the easiest to pose.
Show me pages that share this page's history. Show me pages with ids I have here. Show me where this came from and where it is going. Show me more.
A page could be adorned with a 'more' query in the space remaining after we add license, JSON and site of residence. A link to 'more' would ask the server hosting the page in question to show us more based on page ids.
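Such a 'more' query can be sketched as matching story-item ids across the scraped corpus. This is a hypothetical sketch, assuming (as in federated wiki) that story items carry an `id` that survives forking, so shared ids indicate common origin; the function names are illustrative, not an existing API.

```python
def item_ids(page):
    """Item ids from a page's story. We assume these ids survive forks,
    so two pages sharing an id share some history."""
    return {item["id"] for item in page.get("story", []) if "id" in item}

def more_like(this_page, scraped_pages):
    """Answer a 'more' query: rank scraped pages by how many
    story-item ids they share with this page.

    scraped_pages maps (site, slug) -> page json, e.g. as
    accumulated by a server's neighborhood scrape."""
    mine = item_ids(this_page)
    matches = []
    for key, page in scraped_pages.items():
        shared = mine & item_ids(page)
        if shared:
            matches.append((len(shared), key))
    return [key for count, key in sorted(matches, reverse=True)]
```

Because the match is on opaque ids rather than words, the server answering 'more' need not understand the pages at all; it only needs to have scraped them.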
We suggested that the scrape of any given server may not need to go beyond the sites found on its own pages. We suggest that is true because the curious reader can extend this frontier at will by saving even one page from the beyond. The server's next scrape, which could be only minutes away, will then include pages from the site newly brought within the server's natural neighborhood.
When we exploit unbounded search to provide symmetrical links between pages we expose ourselves to "follower spam" of the kind one now finds on Facebook and Twitter. The unbounded link store provided by unrestricted search exhibits this vulnerability. Our practical desire to limit the depth of automatic search is exactly the protection we need, so long as there is a human-mediated mechanism for extending these limits.
Should we make farm servers responsible for exploring the server-visible neighborhoods described here, then we will find that we have placed additional trust in those writing within the farm. Farms must then protect themselves against the bad actor who would trick the server's search into extending into undesirable neighborhoods.
A site operator will need the ability to expel any site that fails to operate within the best interests of the others and to thereby expunge bad neighbors from its search. Site operators will then become the judges who in their small realm must distinguish the progressive from the subversive, the griefers from the good.
The intellectual health of the federation and the culture it supports then depends on careful admission of new authors into farms hosted and paid for with a purpose. Should these become a network of ingroups then we will face an arms race between progressives and griefers which I expect the progressives will win.
Search Thoughts describes two search cases that intersect in curious ways. Both could be improved by the locality suggested here.
# See also