@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2025 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer to the routine from the
relevant BFD back end, which reads and converts the table into
canonical form.  The linker then operates upon the canonical form.  When
the link is finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
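
The dispatch described above can be sketched in C.  This is only an
illustrative model, not BFD's actual interface: the types
@code{canon_symbol}, @code{backend} and @code{object_file}, the two toy
formats, and every function name are hypothetical.  It shows two back
ends converting different on-disk symbol layouts into one canonical
form, reached through a pointer stored in the per-file descriptor.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical canonical symbol shared by every back end.  */
struct canon_symbol {
  const char *name;
  unsigned long value;
};

/* Hypothetical back end: a table of routines for one file format.  */
struct backend {
  const char *format_name;
  /* Fill OUT with at most MAX canonical symbols; return the count.  */
  size_t (*read_symbols) (const void *raw, struct canon_symbol *out,
                          size_t max);
};

/* Hypothetical descriptor built when a file is opened.  */
struct object_file {
  const struct backend *be;
  const void *raw;              /* format-specific symbol table */
};

/* Two toy on-disk layouts; each table ends with an empty entry.  */
struct x_sym { const char *name; unsigned long value; };
struct y_sym { unsigned short value; char name[6]; };

static size_t
x_read_symbols (const void *raw, struct canon_symbol *out, size_t max)
{
  const struct x_sym *s = raw;
  size_t n = 0;
  for (; s->name != NULL && n < max; s++, n++)
    {
      out[n].name = s->name;
      out[n].value = s->value;
    }
  return n;
}

static size_t
y_read_symbols (const void *raw, struct canon_symbol *out, size_t max)
{
  const struct y_sym *s = raw;
  size_t n = 0;
  for (; s->name[0] != '\0' && n < max; s++, n++)
    {
      out[n].name = s->name;
      out[n].value = s->value;
    }
  return n;
}

static const struct backend x_backend = { "toy-x", x_read_symbols };
static const struct backend y_backend = { "toy-y", y_read_symbols };

/* Choose a back end from a magic string (purely illustrative).  */
const struct backend *
detect_backend (const char *magic)
{
  if (strcmp (magic, "X1") == 0)
    return &x_backend;
  if (strcmp (magic, "Y1") == 0)
    return &y_backend;
  return NULL;
}

/* The application's entry point: it never names the input format.  */
size_t
get_symbols (const struct object_file *f, struct canon_symbol *out,
             size_t max)
{
  assert (f != NULL && f->be != NULL);
  return f->be->read_symbols (f->raw, out, max);
}
```
@end verbatim

A caller would pick a back end with @code{detect_backend}, store it in
an @code{object_file}, and then call @code{get_symbols}; the same call
works whichever toy format the file uses.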

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.}  The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.}  The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
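
The round trip described above can be modelled with a short C sketch.
It is an assumption-laden illustration, not BFD code: the
@code{format} enumeration, the @code{private_data} field, the
storage-class detail, and the writer routine are all hypothetical.
The point it shows is that a writer can recover format-specific detail
only when the symbol originated in the writer's own format.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy formats.  */
enum format { FMT_COFF_LIKE, FMT_AOUT_LIKE };

/* Hypothetical canonical symbol.  PRIVATE_DATA is opaque to
   everything except the back end for ORIGIN.  */
struct canon_symbol {
  const char *name;
  unsigned long value;
  enum format origin;
  const void *private_data;
};

/* Detail that only the COFF-like toy format can express.  */
struct coff_like_extra { int storage_class; };

/* Hypothetical output record of the COFF-like toy format.  */
struct coff_like_record {
  const char *name;
  unsigned long value;
  int storage_class;
};

#define DEFAULT_STORAGE_CLASS 0

/* Write one symbol.  The saved private data is used only when the
   symbol was read from the same format; otherwise the extra detail
   is gone and a default has to be substituted.  */
struct coff_like_record
coff_like_write_symbol (const struct canon_symbol *sym)
{
  struct coff_like_record rec;
  const struct coff_like_extra *extra;

  assert (sym != NULL);
  rec.name = sym->name;
  rec.value = sym->value;
  if (sym->origin == FMT_COFF_LIKE && sym->private_data != NULL)
    {
      extra = sym->private_data;
      rec.storage_class = extra->storage_class;
    }
  else
    rec.storage_class = DEFAULT_STORAGE_CLASS;
  return rec;
}
```
@end verbatim

In this model the name and value always survive, because they live in
the canonical part of the symbol; only the detail held in
@code{private_data} depends on the source and destination formats
matching.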

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
For each section in the input file, BFD stores the name of the section,
the section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line numbers are being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats.
@end table
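
As a rough illustration of the symbol and relocation entries in the
table above, the following C sketch models section-relative symbols and
a relocation record that reaches its routine through a type
descriptor.  All names here (@code{section}, @code{canon_symbol},
@code{reloc_type}, @code{reloc}, @code{byte_reloc} and the functions)
are hypothetical and are not BFD's real declarations; the byte
relocation is assumed to store the low eight bits of the symbol's
address.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section record.  */
struct section {
  const char *name;
  unsigned long output_address;
};

/* Canonical symbol: VALUE is relative to the base of SECTION.  */
struct canon_symbol {
  const char *name;
  const struct section *section;
  unsigned long value;
};

struct reloc;

/* Relocation type descriptor: carries the routine that knows how to
   patch the data for this kind of relocation.  */
struct reloc_type {
  const char *name;
  void (*apply) (const struct reloc *r, unsigned char *data);
};

/* Canonical relocation record.  */
struct reloc {
  const struct canon_symbol *symbol;   /* symbol to relocate to */
  const struct section *data_section;  /* section the data is in */
  size_t offset;                       /* offset of the data */
  const struct reloc_type *type;       /* how to relocate */
};

/* Final address of a section-relative symbol.  */
unsigned long
symbol_address (const struct canon_symbol *sym)
{
  return sym->section->output_address + sym->value;
}

/* Assumed byte relocation: store the low 8 bits of the address.  */
static void
apply_byte (const struct reloc *r, unsigned char *data)
{
  data[r->offset] = (unsigned char) (symbol_address (r->symbol) & 0xff);
}

const struct reloc_type byte_reloc = { "byte", apply_byte };

/* Format-independent step: the writer does not need to know which
   input format the relocation kind came from.  */
void
perform_reloc (const struct reloc *r, unsigned char *data)
{
  assert (r != NULL && r->type != NULL && r->type->apply != NULL);
  r->type->apply (r, data);
}
```
@end verbatim

Because @code{perform_reloc} only follows the pointer in the record, an
output writer for a format with no byte relocation of its own could
still apply one in this model.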