Science Fair Projects Ideas - External sorting

External sorting

External sorting is a generic term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted does not fit into the main memory of a computing device (usually RAM) and must instead reside in a slower kind of memory (usually a hard drive).

One example of external sorting is the external mergesort algorithm. For the sake of concreteness, let us assume that 900 megabytes of data need to be sorted using only 100 megabytes of RAM.

  1. Read 100MB of the data into main memory and sort it by some conventional method (usually quicksort).
  2. Write the sorted data to disk.
  3. Repeat steps 1 and 2 until all of the data has been sorted into 100MB chunks (9 chunks in this example). These chunks must now be merged into a single sorted output file.
  4. Read the first 10MB of each sorted chunk into an input buffer in main memory (90MB total) and allocate the remaining 10MB for an output buffer.
  5. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer is full, write it to the final sorted file and empty it. Whenever one of the 9 input buffers becomes empty, refill it with the next 10MB of its associated 100MB sorted chunk; if the chunk has no more data, mark it as exhausted and do not use it for further merging. When all chunks are exhausted, write whatever remains in the output buffer to the final sorted file.
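The steps above can be sketched in Python. This is only a minimal illustration under several assumptions that are not part of the original description: the data is a text file with one integer per line, memory is measured in items rather than megabytes (the hypothetical parameter `max_items`), each chunk is sorted with Python's built-in sort rather than quicksort, and the input and output buffers are left to Python's ordinary file buffering instead of being managed explicitly. The function name `external_sort` is made up for this example.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(in_path, out_path, max_items):
    """Sort a file of one integer per line; return the number of sorted chunks."""
    run_paths = []
    with open(in_path) as src:
        while True:
            # Steps 1-3: read one memory-sized chunk, sort it, write it to disk.
            chunk = [int(line) for line in itertools.islice(src, max_items)]
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(path)

    # Steps 4-5: K-way merge of the sorted chunks. Each open file reads ahead
    # through its own buffer, and the output file buffers writes, which stands
    # in for the explicit input and output buffers of the description.
    runs = [open(path) for path in run_paths]
    try:
        streams = [(int(line) for line in run) for run in runs]
        with open(out_path, "w") as out:
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)
    return len(run_paths)
```

Called on a file of 900 numbers with `max_items=100`, this sketch creates 9 sorted chunks and merges them in one pass, mirroring the 900MB and 100MB figures of the example at a smaller scale.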

The algorithm described above can be generalized by assuming that the amount of data to be sorted exceeds the available memory by a factor of K. Then K chunks of data need to be sorted and a K-way merge has to be completed. If X is the amount of main memory available, there will be K input buffers and 1 output buffer, each of size X/(K+1). Depending on various factors (such as the speed of the hard drive and the value of K), better performance can be achieved if the output buffer is made larger, for example twice as large as one input buffer.
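The buffer arithmetic can be checked with a few lines of Python. The function `merge_plan` and its `output_weight` parameter are hypothetical names introduced for this sketch; `output_weight` expresses how many input-buffer-sized units the output buffer receives, so a weight of 1 gives the X/(K+1) split and a weight of 2 gives an output buffer twice as large as one input buffer.

```python
import math


def merge_plan(data_size, memory_size, output_weight=1):
    """Return (K, input_buffer_size, output_buffer_size) for a K-way merge."""
    # K is the number of memory-sized chunks the data splits into.
    k = math.ceil(data_size / memory_size)
    # Memory is divided into K input units plus output_weight output units.
    unit = memory_size / (k + output_weight)
    return k, unit, output_weight * unit


k, input_buffer, output_buffer = merge_plan(900, 100)
print(k, input_buffer, output_buffer)  # 9 10.0 10.0
```

With `output_weight=2` and the same sizes, each input buffer shrinks to 100/11 (about 9.1MB) and the output buffer grows to 200/11 (about 18.2MB), still totalling 100MB.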

In the example, a single-pass merge was used. If the ratio of data to available main memory is particularly large, a multi-pass merge is preferable. For example, merge only the first half of the sorted chunks, then the other half; the problem is then reduced to merging just two sorted chunks. The exact number of passes depends on the above-mentioned ratio, as well as on the physical characteristics of the hard drive (transfer rate and seek time). As a rule of thumb, do not merge more than 20 to 30 chunks in a single pass.
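A multi-pass merge can be sketched as follows. To keep the example short, in-memory Python lists stand in for the sorted chunks on disk, which is an assumption of this sketch and not how a real external sort would hold its data. The names `multipass_merge` and `max_fan_in` are hypothetical; `max_fan_in` is the largest number of chunks merged at once.

```python
import heapq


def multipass_merge(runs, max_fan_in):
    """Merge sorted runs, at most max_fan_in at a time; return (result, passes)."""
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    passes = 0
    while len(runs) > 1:
        # One pass: merge consecutive groups of up to max_fan_in runs,
        # producing fewer, longer runs for the next pass.
        runs = [
            list(heapq.merge(*runs[i:i + max_fan_in]))
            for i in range(0, len(runs), max_fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

With 8 sorted chunks and `max_fan_in=4`, the first pass merges each half into one chunk and the second pass merges the two results, which is the two-pass scheme described above.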

If carefully implemented, external sorting can be done in-place (with no additional disk space required).


09-23-2007 01:00:40
The content of this article is licensed from www.wikipedia.org under the GNU Free Documentation License.