
Current group: comp.theory

Is straight indexing taking up too much space?

From: jblanch
Subject: Is straight indexing taking up too much space?
Date: 23 Jan 2005 10:56:32 -0800
I've been wondering how Google and other search engines store their
indexes. I know Google held a programming contest in which one entrant
tried storing entries with similar topics together and then
indexing them.

Of course the internet is growing day by day, and the number of people
who have their own websites with "unique" information is amazingly
high. Google's index is already above 8 billion pages, so what if there
were a way to cut that down even more? Who knows, the next technological
advancement could lead to there being 20 billion indexable pages, which
might put a lot of possibly unnecessary work on Google's computers.

Does anyone have any other ideas for storing and indexing pages? Maybe
grouping by topic? By subject? Or just mass indexing only XML feeds?
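
To make the grouping idea a bit more concrete, here is a rough toy sketch of
why storing similar pages next to each other might shrink an index. It
assumes the index keeps, for each word, a sorted list of page IDs stored as
gaps between consecutive IDs, with each gap written in Elias gamma code (small
gaps cost fewer bits). That is a common textbook scheme, not necessarily what
Google or any real engine does, and the names `gamma_bits`, `index_bits`, and
the made-up pages are purely illustrative.

```python
from collections import defaultdict


def gamma_bits(n):
    """Bits needed to write a positive integer n in Elias gamma code."""
    return 2 * n.bit_length() - 1


def index_bits(pages):
    """Total bits for gap-encoded posting lists; page ID = position in list."""
    postings = defaultdict(list)
    for page_id, text in enumerate(pages):
        for word in set(text.split()):
            postings[word].append(page_id)

    total = 0
    for ids in postings.values():
        prev = -1  # so the first gap is page_id + 1, which is always >= 1
        for page_id in ids:
            total += gamma_bits(page_id - prev)
            prev = page_id
    return total


# Hypothetical toy corpus: two topics, 300 pages each.
topic_a = ["the cats dogs pets"] * 300
topic_b = ["the stocks bonds funds"] * 300

# Same pages, two different ID assignments.
interleaved = [page for pair in zip(topic_a, topic_b) for page in pair]
grouped = topic_a + topic_b

print("interleaved:", index_bits(interleaved), "bits")
print("grouped:    ", index_bits(grouped), "bits")
```

In this toy case the grouped ordering should come out smaller, because pages
that share words get neighbouring IDs and the gaps between them stay small.
Real pages are nowhere near this tidy, so treat it as an illustration of the
idea rather than an estimate of actual savings.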

I think it would be cool to see somebody come out with a new breakthrough
in the indexing method.
   
