|
|
 | | From: | Chris McDonald | | Subject: | Choice of data structure for large event queues? | | Date: | Thu, 23 Dec 2004 11:02:27 -0600 |
|
|
 | Hello, I have been implementing and measuring calendar-queues to manage time-ordered event queues in a simulation.
The basic functions of enqueuing and dequeuing work very well (very fast) for simulations involving thousands, even hundreds of thousands, of pending events.
However, my simulation also needs to occassionally, maybe 2% of the time, remove a "random" event from the queue - keeping the the queue balanced/ordered correctly for the more frequent enqueuing and dequeuing. Of course, with calendar-queues, this degenerates to linear behaviour - which is woeful.
Does anyone have suggestions for other data structures that I should examine? Thanks in advance,
______________________________________________________________________________ Dr Chris McDonald E: chris@csse.uwa.edu.au Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris The University of Western Australia, M002 T: +618 6488 2533 Crawley, Western Australia, 6009 F: +618 6488 1089
|
|
 | | From: | Thad Smith | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Thu, 23 Dec 2004 22:56:19 -0600 |
|
|
 | Chris McDonald wrote: > > Hello, I have been implementing and measuring calendar-queues to manage > time-ordered event queues in a simulation. > > The basic functions of enqueuing and dequeuing work very well (very fast) > for simulations involving thousands, even hundreds of thousands, > of pending events. > > However, my simulation also needs to occassionally, maybe 2% of the time, > remove a "random" event from the queue - keeping the the queue > balanced/ordered correctly for the more frequent enqueuing and dequeuing. > Of course, with calendar-queues, this degenerates to linear behaviour - > which is woeful.
I haven't used calendar queues, but did some reading from the net. How many buckets do you use for the events and what are the average number of elements per bucket? The cost of deleting an element should be basically the cost of finding the element to be deleted. If you have a pointer to the item to be deleted, then the cost is very small. If you have to search for it, then, to my understanding, finding the element is proportional to the number of items in the bucket, which is the same cost as choosing the minimum if the bucket is unsorted or doing an insertion if the bucket is sorted. Either way, the cost should be similar to insertion or minimum item removal.
Thad
|
|
 | | From: | Chris McDonald | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Sat, 25 Dec 2004 18:35:24 -0600 |
|
|
 | Thad Smith writes:
>I haven't used calendar queues, but did some reading from the net. >How many buckets do you use for the events and what are the average >number of elements per bucket? The cost of deleting an element should >be basically the cost of finding the element to be deleted. If you >have a pointer to the item to be deleted, then the cost is very >small. If you have to search for it, then, to my understanding, >finding the element is proportional to the number of items in the >bucket, which is the same cost as choosing the minimum if the bucket >is unsorted or doing an insertion if the bucket is sorted. Either >way, the cost should be similar to insertion or minimum item removal.
Thanks for the reply Thad. Here's my reply to another response to another reply, which may better explain my problem.
------- However, I probably didn't explain my problem well enough. I have hundreds or thousands of events to be maintained in time-sorted ordered (often termed a simulation's pending event set). Each event has a time value, used to (keep it) sorted in the queue, and one or more data fields, for example an integer part-number.
My problem is not in enqueuing or dequeuing, as that runs in O(1) time with the calendar queue. Similarly deleting a found event is easy - either mark it as a zombie and leave it in the queue, or adjust each bucket's queue and resize the whole calendar if necessary.
My problem is first finding an event with a given, say, part-number. I do not have the time of the event, and so must locate the required part-number with a linear search. Woeful when the queue holds tens of thousands of events, and a few thousand are to be processed per second.
Ideally I'd like to keep the calendar queue for the O(1) enqueuing and dequeuing, but will also need to "merge" it with another of data structure used to record the additional keys (of, of say, the part-numbers). I can anticipate with other fields are required for the deltions. As the random deltions (based on part-numbers) are far less frequent, something O(log n) or O(sqrt n) would be fine.
For many simulations I think this would be a relatively common requirement, so I wonder what others may use. Off to investigate threaded hash-tables for the secondary keys....
______________________________________________________________________________ Dr Chris McDonald E: chris@csse.uwa.edu.au Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris The University of Western Australia, M002 T: +618 6488 2533 Crawley, Western Australia, 6009 F: +618 6488 1089
|
|
 | | From: | Thad Smith | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Sun, 26 Dec 2004 18:23:05 -0600 |
|
|
 | Chris McDonald wrote:
> Thad Smith writes: > >>I haven't used calendar queues, but did some reading from the net. >>How many buckets do you use for the events and what are the average >>number of elements per bucket? The cost of deleting an element should >>be basically the cost of finding the element to be deleted. > > My problem is not in enqueuing or dequeuing, as that runs in O(1) time > with the calendar queue. Similarly deleting a found event is easy - > either mark it as a zombie and leave it in the queue, or adjust each > bucket's queue and resize the whole calendar if necessary. > > My problem is first finding an event with a given, say, part-number. I > do not have the time of the event, and so must locate the required > part-number with a linear search. Woeful when the queue holds tens of > thousands of events, and a few thousand are to be processed per second.
My recommendation would be to either 1) build an index of part numbers in the queue. The index would point to the queue entry. A linked hash table may work well. The queue entry should point to the hash entry for fast removal. I think that is what you suggested.
2) maintain a set of items to be deleted. As each item is removed from the queue, it is checked against the set. A tree, skip list, or linked hash table would probably work OK. If a hash table, make it larger than the expected number of deleted entries in queue to make searching fast.
In the second case the set is much smaller, since it contains only queued but deleted items, and would probably be faster if the deleted item check is fast.
BTW, this is an area outside my normal work. Can you explain how O(1) enqueuing and dequeuing is accomplished?
Thad
|
|
 | | From: | Martijn | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Tue, 28 Dec 2004 09:28:56 -0600 |
|
|
 | > 2) maintain a set of items to be deleted. As each item is removed > from the queue, it is checked against the set. A tree, skip list, or > linked hash table would probably work OK. If a hash table, make it > larger than the expected number of deleted entries in queue to make > searching fast.
This is what I would do, I guess. But 2% of a large number is still a pretty large number. Would binary trees not be an applicable solution for fast searching?
On antother node, if I understand correctly, each event "points" to a part. If a "random" event needs to be deleted, because something happened to that part, maybe it is best to store that information in the part itself. So mark a part as "zombie" or whatever and if necessary mark at what time this happened.
This would be harder to clean up, probably, but that can be solved by using yet another memory structure which contains the zombie'd part along with the time of the last event that is in the chain (that way you are sure that all pending events for that part are caught). There are different options for this data structure as well, but I would use something like a combination of a circular and expandable array (an array consiting of multiple smaller - but still rather large - arrays in memory, and making this construction circular - faster possibilities may exists which I have not experienced with yet or simply more suitable for your exact problem). This chain is automatically sorted by time due to its nature, so adding and removing items should only take O(1).
But I think it is almost inevitable that you are going to sacrifice memory over speed increase (given an "optimal" implementation of a certain algorithm).
Just my 2 cents...
-- Martijn http://www.sereneconcepts.nl
|
|
 | | From: | Martijn | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Wed, 29 Dec 2004 02:15:33 -0600 |
|
|
 | Sorry 'bout my post last night, I must have been so completely wasted my software even forgot to put in the signature! :P
-- Martijn http://www.sereneconcepts.nl
|
|
 | | From: | Nicolaas Vroom | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Sat, 08 Jan 2005 06:58:03 -0600 |
|
|
 | Hello Chris
Reading your signature file I would expect you are an expert in this subject and as such I have great doubts If I can be of any help, but still let me give you a try.
If I understand you correct your general problem is how to retrieve a particular record out of a collection of for example 100 thousand records in one random access file.
First let me describe the fields of each record in file 1: record #, record file 1 #, date, time, Juliandate, part number, description
Let me assume that new records are stored at the end of the file not sorted i.e. in random order as created. record # and record file 1 # are the same. Julian date is a real number, combination of date and time See for Details: Astronomy with your Personal computer by Peter Duffett Smith or Astronomical Algorithms by jean Meeus
Suppose you want to find a record with a particular Juliandate what is your best strategy IMO you should create the following sorted file: record #, record file 1 #, Juliandate, next record # This sorted file contains the Juliandates in ascending order. Initially next record # = record # +1 Suppose this file contains 100 sorted records and you want to add record 101. Record 101 should have been stored at record position 57. What do you do? Record 101 is stored at position 101. next record # of record 56 is changed from 57 to 101. next record # of record 101 is changed from 102 to 57
You can do the same for new records 102, 103, 104 etc The problem is this file contains still 100 sorted items but the tail of the file becomes messy and finding files in the tail becomes time consuming. The only solution is to do a clean up of this "sorted" file, (with a type of copy new and delete old operation) remove all holes and put the records in the tail in the correct order.
This clean up does not effect random access file 1
Suppose you also want to find a record with a particular part number what is your best strategy ? IMO you should than create the following sorted file: record #, record file 1 #, partnumber, next record # Adding a record is the same as with Juliandate except that the next record numbers calculated will be different because the ordered position will be different.
To find a particular Juliandate in the ordered Juliandate file is rather straight forward, of course I hope that those dates are equal distributed and are not clustered around certain specific values. (If requested I will give more details) The same for partnumbers or any key you want.
What is the language you are using ? Visual Basic ?
Hope this helps
Nicolaas Vroom http://users.pandora.be/nicvroom/
"Chris McDonald" schreef in bericht news:cql0tf$r7k$2@enyo.uwa.edu.au... > Thad Smith writes: > > >I haven't used calendar queues, but did some reading from the net. > >How many buckets do you use for the events and what are the average > >number of elements per bucket? The cost of deleting an element should > >be basically the cost of finding the element to be deleted. If you > >have a pointer to the item to be deleted, then the cost is very > >small. If you have to search for it, then, to my understanding, > >finding the element is proportional to the number of items in the > >bucket, which is the same cost as choosing the minimum if the bucket > >is unsorted or doing an insertion if the bucket is sorted. Either > >way, the cost should be similar to insertion or minimum item removal. > > Thanks for the reply Thad. > Here's my reply to another response to another reply, which may better > explain my problem. > > ------- > However, I probably didn't explain my problem well enough. I have > hundreds or thousands of events to be maintained in time-sorted ordered > (often termed a simulation's pending event set). > Each event has a time value, used to (keep it) sorted in the queue, and > one or more data fields, for example an integer part-number. > > My problem is not in enqueuing or dequeuing, as that runs in O(1) time > with the calendar queue. Similarly deleting a found event is easy - > either mark it as a zombie and leave it in the queue, or adjust each > bucket's queue and resize the whole calendar if necessary. > > My problem is first finding an event with a given, say, part-number. I > do not have the time of the event, and so must locate the required > part-number with a linear search. Woeful when the queue holds tens of > thousands of events, and a few thousand are to be processed per second. > > Ideally I'd like to keep the calendar queue for the O(1) enqueuing and > dequeuing, but will also need to "merge" it with another of data structure > used to record the additional keys (of, of say, the part-numbers). > I can anticipate with other fields are required for the deltions. > As the random deltions (based on part-numbers) are far less frequent, > something O(log n) or O(sqrt n) would be fine. > > For many simulations I think this would be a relatively common > requirement, so I wonder what others may use. Off to investigate threaded > hash-tables for the secondary keys.... > > ____________________________________________________________________________ __ > Dr Chris McDonald E: chris@csse.uwa.edu.au > Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris > The University of Western Australia, M002 T: +618 6488 2533 > Crawley, Western Australia, 6009 F: +618 6488 1089 >
|
|
 | | From: | moi | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Sun, 26 Dec 2004 17:31:52 -0600 |
|
|
 | Chris McDonald wrote:
> My problem is first finding an event with a given, say, part-number,I > do not have the time of the event, and so must locate the required > part-number with a linear search. Woeful when the queue holds tens of > thousands of events, and a few thousand are to be processed per second. >
So basically the event queue and the part-thingies are stored in different data structures, because they live in different worlds. - queue: sorted on time, contains enough 'key'-data to find the corresponding thingies.
- thingies: identifies by 'part-number'. they contain no foreign key.
My solution is simple: add a foreign-key field to the thingies. either (a pointer to) the queue-node or the 'time-scheduled' .
BTW if the 'part-number' does not change during flight, you could also choose some lineair adressing scheme.
Finally: My gutfeeling is that your queue is heterogenous, which could make a smart-union like solution possible.
> For many simulations I think this would be a relatively common > requirement, so I wonder what others may use. Off to investigate threaded > hash-tables for the secondary keys....
(in my view, the event-queue is the secondary key. The first beeing the'part-number' )
That would make the O a lot bigger. In a previous reply (which got lost), I suggested a _separate_ hash table for the events to drop on retrieval. Would be a lot leaner. (but kind of ugly ;-)
> ______________________________________________________________________________ > Dr Chris McDonald E: chris@csse.uwa.edu.au > Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris > The University of Western Australia, M002 T: +618 6488 2533 > Crawley, Western Australia, 6009 F: +618 6488 1089 >
HTH, AvK
|
|
 | | From: | moi | | Subject: | Re: Choice of data structure for large event queues? | | Date: | Sun, 26 Dec 2004 17:31:46 -0600 |
|
|
 | Thad Smith wrote:
> I haven't used calendar queues, but did some reading from the net. > How many buckets do you use for the events and what are the average > number of elements per bucket? The cost of deleting an element should > be basically the cost of finding the element to be deleted. If you > have a pointer to the item to be deleted, then the cost is very > small. If you have to search for it, then, to my understanding, > finding the element is proportional to the number of items in the > bucket, which is the same cost as choosing the minimum if the bucket > is unsorted or doing an insertion if the bucket is sorted. Either > way, the cost should be similar to insertion or minimum item removal. > > Thad >
I agree.
Some more suggestions: 1) create an 'invalid' field inside the node, test it on retrieving. this assumes you have a pointer to the node (or can find it), so it will perform the same as thad's suggestion. 2) add a prev-pointer to the node, or preferably a handle-field (pointer to the parent pointer). this will make insertions and retrievals costlyer.
3) make a separate "ignore" table (typically a hash). Instead of deleting the actual item, you remember to forget it. You do need to have good "keyfields", and retrieval will be a bit more costly.
NB I 'd never heard of the calendar queue before. At first sight it seemed to me thar the 'resize' operation is very costly. Could a more efficient 'rehash' be invented ?
HTH, AvK
|
|
|