Library (computer science)
In computer science, a library is a collection of subprograms used to develop software. Libraries are distinguished from executables in that they are not independent programs; rather, they are "helper" code that provides services to some other independent program. Today the vast majority of the code that executes in a typical application is located in the libraries it uses.
Well-known libraries include:
- LINPACK for numerical linear algebra
- C standard library, whose implementations include glibc
- The Standard Template Library for C++
- Graphic libraries such as DirectX and OpenGL
The process of making resources in a library available to other programs is called exporting. The most common forms of exports include procedures (functions, routines, subroutines), variables, and some sorts of static data, e.g. icons. To allow access to them, the resources are given names, which are recorded in a table along with their offsets inside the file. These names (and sometimes, by analogy, the resources they represent) are called symbols; similarly, the table is called a symbol table. Exported procedures are also called entry points, because invoking them is akin to "entering" the library. Executables are less likely to have a symbol table: it is not mandatory for them, and is usually stripped to save space. In most other respects, the difference between libraries and executables in modern operating systems is limited.
Library linking describes the inclusion of one or more of these software libraries into a new program. There are multiple types of linking: static linking and dynamic linking. There are also different forms of dynamic libraries, shared or unshared. These are described below.
Static linking embeds a library directly into the program executable at build time by a linker. A linker is a separate utility which takes one or more libraries and object files (previously generated by a compiler or an assembler) and produces an actual executable file. Internally, each reference to code located in the library is replaced with a fixed offset to that code; since the library code cannot move independently of the executable, these offsets never change.
One of the biggest disadvantages of static linking is that each executable ends up containing its own copy of the library. When many statically linked programs using the same library are simultaneously executed on the same machine, a great deal of memory can be wasted, as each execution loads its own copy of the library's data into memory.
Another disadvantage is that newer versions of the library need to be re-compiled into the executable. A newer version of the library included in program X doesn't help program Y; both must be upgraded separately.
Examples of libraries which are traditionally designed to be statically linked include the ANSI C standard library and the ALIB assembler library. Statically linked libraries predate Fortran; Fortran's I/O was designed to use a preexisting package of I/O routines.
Dynamic linking systems place the majority of the linker code in the underlying operating system, in which case it is known as a loader. At compile time the linker only writes down what libraries the executable needs and checks to make sure they are being called properly. When that program is then executed, the loader finds these libraries and links them at that point, either at load time or at runtime when the library is actually referenced. The result is called a dynamically linked library, often referred to as a DLL because dynamic libraries on Microsoft Windows use the filename extension .dll.
In a dynamic library the location of the actual code is unknown until after it has been loaded into memory. This means that storing the location of the code in the executable itself is impossible. It would be possible to examine the program at load time and replace all references with pointers once the location is known, but this would be a time-consuming process. Instead, most dynamic library systems include a table of the code being called that is linked into the program at compile time. This table, the import directory, is in a known location that the executable code is linked to. At load time the table is modified with the location of the library code by the loader/linker.
The library itself contains a table of all the methods within it, known as entry points. Calls into the library "jump through" this table, looking up the location of the code in memory, then calling it. This introduces a small overhead in call time, but one small enough to be negligible.
The exact time that the library is loaded into memory varies from system to system. In some, including Windows and Linux, all linking takes place when the executable is first loaded. This type of dynamic linking is called loadtime linking.
Other operating systems resolve dependencies at runtime. In these systems the executable calls an operating system API, passing it the name of a library file, a function number within the library and the function's parameters. The operating system resolves the import immediately and calls the appropriate function on behalf of the application. This type of dynamic linking is called runtime linking. Because of the overhead added to each call, runtime linking is slow and negatively affects an executable's performance. As a result, runtime linking is rarely used by modern operating systems.
Dynamic linkers/loaders vary widely in functionality. Some record explicit paths to the libraries, based on some "known" location for library storage; any change to the library naming or layout of the filesystem will cause these systems to fail. More commonly, only the name of the library itself is stored, and the operating system supplies a method to find the library on disk. Unix-based systems use a PATH-like list of "places to look" (such as the LD_LIBRARY_PATH environment variable), which tends to be robust because the list rarely changes. On the downside, this forces developers to place their libraries in one of several "known" locations, which tend to fill up and make management more complex. On Microsoft systems the PATH variable is also used, but only after checking the current working directory, the directory set by SetDllDirectory(), and the system32, system and windows directories. Often, however, libraries are stored outside of these locations, so the registry is used to determine the correct location. OpenStep used a more flexible system, collecting a list of libraries from a number of known locations (similar to the PATH concept) when the system first starts. Moving libraries around causes no problems at all, although there is a time cost when first starting the system.
One of the largest disadvantages of dynamic linking is that the executables depend on the separately stored libraries in order to function properly. If the library is deleted, moved, renamed or replaced with an incompatible version, the executable could malfunction. On Windows this is commonly known as DLL hell.
Dynamic linking libraries date back to at least MTS (the Michigan Terminal System), built in the late 60s. ("A History of MTS", Information Technology Digest, Vol. 5, No. 5)
Additionally, a library may be loaded dynamically during the execution of a program, as opposed to when the program itself is loaded or started. The loading of the library is thus delayed until it is needed; if it is never needed, it is never loaded. Such a library is referred to as a dynamically loaded library (also abbreviated DLL, not to be confused with the Windows DLL file format). This form of library is typically used for plug-in modules and for interpreters that need to load certain functionality on demand.
Most systems supporting dynamic libraries also support dynamic loading via an API in the underlying operating system. Under some systems the programmer must be careful to ensure the library is loaded before calling it, while others automate this process as well. Internally the differences are invisible; libraries loaded either way are handled identically.
Another solution to the library issue is to use completely separate executables (often in some lightweight form) and call them using a remote procedure call (RPC). This approach maximizes operating-system re-use: the code needed to support the library is the same code being used to provide application support and security for every other program. Additionally, such systems do not require the library to exist on the same machine, but can forward the requests over the network.
The downside to such an approach is that every library call requires a considerable amount of overhead. RPC calls are generally very expensive, and are avoided where possible. Nevertheless, this approach has become popular in a number of domain-specific areas, notably client-server systems and application servers such as Enterprise JavaBeans.
In addition to being loaded statically or dynamically, libraries are also often classified according to how they are shared among programs. Dynamic libraries almost always offer some form of sharing, allowing the same library to be used by multiple programs at the same time. Static libraries, by definition, cannot be shared; they are linked into each program.
The term shared library is slightly ambiguous, because it covers at least two different concepts. First, there is the sharing of code located on disk by unrelated programs. Second, there is the sharing of code in memory, when programs execute the same physical page of RAM, mapped into different address spaces. It would seem that the latter would be preferable, and indeed it has a number of advantages. For instance, on the OpenStep system, applications were often only a few hundred kilobytes in size and loaded almost instantly; the vast majority of their code was located in libraries that had already been loaded for other purposes by the operating system. There is a cost, however: shared code must be specifically written to run in a multitasking environment, and this has effects on performance.
RAM sharing can be accomplished by using position-independent code, as in Unix, which leads to a complex but flexible architecture, or by using normal (i.e. not position-independent) code, as in Windows and OS/2. These systems make sure, by various tricks such as pre-mapping the address space and reserving slots for each DLL, that code has a high probability of being shared. Windows DLLs are not shared libraries in the Unix sense. The rest of this article concentrates on aspects common to both variants.
In most modern operating systems, shared libraries can have the same format as "regular" executables. This has two main advantages: first, only one loader is required for both, rather than two; the added complexity of the single loader is considered well worth the cost. Secondly, it allows executables to also be used as DLLs, if they have a symbol table. Typical executable/DLL formats are ELF (Unix) and PE (Windows). In Windows, the concept was taken one step further, with even system resources such as fonts being bundled in the DLL file format. The same is true under OpenStep, where the universal "bundle" format is used for almost all system resources.
The term DLL is mostly used on Windows and OS/2 products. On Unix platforms, the term shared library is more commonly used. This is technically justified in view of the different semantics; more explanation is available in the position independent code article.
In some cases, an operating system can become overloaded with different versions of DLLs, which impedes its performance and stability. Such a scenario is known as DLL hell.
Dynamic linking developed during the late 1980s and was generally available in some form in most operating systems by the early 1990s. It was during the same period that object-oriented programming (OOP) was first making its way into the programming market. OOP requires additional information that traditional libraries don't supply; in addition to the names and entry points of the code located within, they also require a list of the objects they depend on. This is a side-effect of one of OOP's main advantages, inheritance, which means that the complete definition of any method may be defined in a number of places. This is more than simply listing that one library requires the services of another: in a true OOP system, the libraries themselves may not be known at compile time, and vary from system to system.
At the same time another common area for development was the idea of multi-tier programs, in which a "display" running on a desktop computer would use the services of a mainframe or minicomputer for data storage or processing. For instance, a program on a GUI-based computer would send messages to a minicomputer to return small samples of a huge dataset for display. Remote procedure calls already handled these tasks, but there was no standard RPC system.
It was not long before the majority of the mini/mainframe vendors were working on projects to combine the two, producing an OOP library format that could be used anywhere. Such systems were known as object libraries, or distributed objects if they supported remote access (not all did). Microsoft's COM is an example of such a system for local use; DCOM is a modified version that supports remote access.
For some time object libraries were the "next big thing" in the programming world. There were a number of efforts to create systems that would run across platforms, and companies competed to try to get developers locked into their own system. Examples include IBM's System Object Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object Model (COM/DCOM), and any number of CORBA-based systems.
In the end, it turned out that OOP libraries were not the next big thing. With the exception of Microsoft's COM and NeXT's (now Apple Computer's) PDO, all of these efforts have since ended.
Libraries follow platform-specific naming conventions:
- GNU/Linux, Solaris and BSD variants: libfoo.so files placed in folders like /usr/local/lib are dynamically linked libraries. The filenames always start with lib and end with .a (archive, static library) or .so (shared object, dynamically linked library), with an optional interface number. For example, libfoo.so.2 is the second major interface revision of the dynamically linked library libfoo. Old Unix versions would use major and minor library revision numbers (libfoo.so.1.2), while contemporary Unixes use only major revision numbers (libfoo.so.1). Dynamically loaded libraries are placed in /usr/libexec and similar directories.
- Mac OS X and upwards: libraries are named libfoo.dylib, with an optional interface number, such as libfoo.2.dylib. These are often stored inside a wrapper directory.
- Microsoft Windows: *.LIB files are statically linkable libraries and *.DLL files are dynamically linkable libraries. The interface revisions are encoded in the files, or abstracted away using COM-object interfaces.
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.