bfdsumm.texi revision 1.1.1.2
@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright 2012
@c Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a function pointer to the routine from the
relevant BFD back end, which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
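
The dispatch described above can be sketched in C.  This is a minimal
illustration only, not BFD's actual interface: the two toy formats and
every name in it (@code{struct backend}, @code{struct descriptor},
@code{toy_open}, @code{toy_symbols}) are hypothetical and were invented
for this sketch.  Each back end supplies a routine that recognizes its
format and a routine that converts its symbol table to one canonical
form; opening a file selects the back end, and later requests go through
the stored pointers.

```c
#include <stddef.h>
#include <string.h>

/* Canonical, format-independent symbol (hypothetical, far simpler than BFD's). */
struct canon_symbol {
    const char *name;
    unsigned long value;
};

/* Hypothetical back-end vector: one table of routines per object format. */
struct backend {
    const char *format_name;
    int (*recognize)(const unsigned char *buf, size_t len);
    size_t (*read_symbols)(const unsigned char *buf, size_t len,
                           struct canon_symbol *out, size_t max);
};

/* Descriptor built when a file is opened. */
struct descriptor {
    const struct backend *be;
    const unsigned char *data;
    size_t len;
};

static int a_recognize(const unsigned char *buf, size_t len)
{
    return len >= 1 && buf[0] == 'A';
}

/* Toy format A: magic byte, then records of <value byte><NUL-terminated name>. */
static size_t a_read_symbols(const unsigned char *buf, size_t len,
                             struct canon_symbol *out, size_t max)
{
    size_t pos = 1, n = 0;
    while (pos < len && n < max) {
        const unsigned char *nul;
        unsigned long value = buf[pos++];
        if (pos >= len)
            break;                      /* truncated record */
        nul = memchr(buf + pos, 0, len - pos);
        if (nul == NULL)
            break;                      /* name is not terminated */
        out[n].name = (const char *)(buf + pos);
        out[n].value = value;
        n++;
        pos = (size_t)(nul - buf) + 1;
    }
    return n;
}

static int b_recognize(const unsigned char *buf, size_t len)
{
    return len >= 1 && buf[0] == 'B';
}

/* Toy format B: magic byte, then records of <NUL-terminated name><value byte>. */
static size_t b_read_symbols(const unsigned char *buf, size_t len,
                             struct canon_symbol *out, size_t max)
{
    size_t pos = 1, n = 0;
    while (pos < len && n < max) {
        const unsigned char *nul = memchr(buf + pos, 0, len - pos);
        size_t value_pos;
        if (nul == NULL)
            break;                      /* name is not terminated */
        value_pos = (size_t)(nul - buf) + 1;
        if (value_pos >= len)
            break;                      /* truncated record */
        out[n].name = (const char *)(buf + pos);
        out[n].value = buf[value_pos];
        n++;
        pos = value_pos + 1;
    }
    return n;
}

static const struct backend backends[] = {
    { "toy-a", a_recognize, a_read_symbols },
    { "toy-b", b_recognize, b_read_symbols },
};

/* Determine the format and build the descriptor; returns 0 if unrecognized. */
int toy_open(struct descriptor *d, const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (backends[i].recognize(buf, len)) {
            d->be = &backends[i];
            d->data = buf;
            d->len = len;
            return 1;
        }
    }
    return 0;
}

/* The caller never names a format; it calls through the descriptor. */
size_t toy_symbols(const struct descriptor *d,
                   struct canon_symbol *out, size_t max)
{
    return d->be->read_symbols(d->data, d->len, out, max);
}
```

A caller would pass a buffer to @code{toy_open} and then call
@code{toy_symbols}; the same calling code works for either toy format
because only the descriptor knows which back end is in use.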

@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.
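
The way a back end saves information alongside the canonical form can be
sketched as follows.  This is a hedged illustration, not BFD code: the
formats @code{rich} and @code{plain}, the @code{vendor_flags} field, and
all structure and function names are hypothetical.  The canonical record
carries an opaque pointer that only the originating back end interprets,
so a round trip through the same format keeps the extra data, while
writing a different format drops it.

```c
#include <stddef.h>

/* Hypothetical back-end identity. */
struct format {
    const char *name;
};

static const struct format fmt_rich = { "rich" };
static const struct format fmt_plain = { "plain" };

/* Canonical section record shared by the core and all back ends. */
struct canon_section {
    const char *name;
    unsigned long size;
    const struct format *origin;  /* back end that read the section */
    const void *private_data;     /* opaque to the core */
};

/* Data that only the "rich" format can store. */
struct rich_private {
    unsigned vendor_flags;
};

/* External record of the "rich" format. */
struct rich_record {
    unsigned long size;
    unsigned vendor_flags;
};

/* Reading "rich": fill in the canonical fields and save the rest away. */
void rich_read(const struct rich_record *in, struct rich_private *save,
               struct canon_section *out)
{
    save->vendor_flags = in->vendor_flags;
    out->name = ".text";
    out->size = in->size;
    out->origin = &fmt_rich;
    out->private_data = save;
}

/* Reading "plain": nothing exists beyond the canonical fields. */
void plain_read(unsigned long size, struct canon_section *out)
{
    out->name = ".text";
    out->size = size;
    out->origin = &fmt_plain;
    out->private_data = NULL;
}

/* Writing "rich": use the saved data only if this back end produced it. */
void rich_write(const struct canon_section *s, struct rich_record *out)
{
    out->size = s->size;
    if (s->origin == &fmt_rich && s->private_data != NULL) {
        const struct rich_private *p = s->private_data;
        out->vendor_flags = p->vendor_flags;
    } else {
        out->vendor_flags = 0;    /* the source format could not supply it */
    }
}

/* Writing "plain": only the canonical size survives. */
unsigned long plain_write(const struct canon_section *s)
{
    return s->size;
}
```

Reading a @code{rich} record and writing it back with @code{rich_write}
reproduces @code{vendor_flags}; passing the same canonical record to
@code{plain_write} keeps only the size.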

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
For each section in the input file, the canonical form records the name
of the section, the section's original address in the object file, size
and alignment information, various flags, and pointers into other BFD
data structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
therefore operate on a collection of symbols of wildly different formats
without problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line numbers are being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
from which this information can be simply derived (for example, COFF,
IEEE, and Oasys) can pass it successfully to another format.
@end table
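
The relocation record described in the table can be sketched in C.
This is an illustration under stated assumptions, not the BFD
definitions: all structure and function names are hypothetical, and a
big-endian 32-bit word relocation stands in for a type native to the
output format.  Symbols hold section-relative values, and each
relocation record points to a type descriptor whose routine performs the
relocation, so a byte relocation can be applied to output data even if
the output format has no such type.

```c
#include <stddef.h>
#include <stdint.h>

struct section {
    const char *name;
    uint32_t output_address;      /* where the section lands in the output */
};

struct symbol {
    const char *name;
    const struct section *section;
    uint32_t value;               /* relative to the base of its section */
};

/* Relocation type descriptor: carries the routine that does the work. */
struct reloc_type {
    const char *name;
    void (*apply)(uint8_t *where, uint32_t symbol_address);
};

/* Canonical relocation record (hypothetical layout). */
struct reloc {
    const struct symbol *sym;         /* symbol to relocate to */
    size_t offset;                    /* offset of the data to relocate */
    const struct section *section;    /* section the data is in */
    const struct reloc_type *type;    /* how to perform the relocation */
};

/* Final address of a symbol: section base plus section-relative value. */
uint32_t symbol_address(const struct symbol *s)
{
    return s->section->output_address + s->value;
}

/* Byte relocation: add the low 8 bits of the address to one byte. */
static void apply_byte(uint8_t *where, uint32_t addr)
{
    *where = (uint8_t)(*where + (addr & 0xffu));
}

/* 32-bit big-endian relocation: add the address to the stored addend. */
static void apply_word32_be(uint8_t *where, uint32_t addr)
{
    uint32_t v = ((uint32_t)where[0] << 24) | ((uint32_t)where[1] << 16)
               | ((uint32_t)where[2] << 8) | (uint32_t)where[3];
    v += addr;
    where[0] = (uint8_t)(v >> 24);
    where[1] = (uint8_t)(v >> 16);
    where[2] = (uint8_t)(v >> 8);
    where[3] = (uint8_t)v;
}

const struct reloc_type reloc_byte = { "byte", apply_byte };
const struct reloc_type reloc_word32 = { "word32", apply_word32_be };

/* The caller does not inspect the type; it calls through the descriptor. */
void perform_reloc(const struct reloc *r, uint8_t *contents)
{
    r->type->apply(contents + r->offset, symbol_address(r->sym));
}
```

With a section placed at @code{0x1000} and a symbol at section-relative
value @code{0x34}, @code{perform_reloc} resolves both record types
against the address @code{0x1034} without the caller knowing which
routine runs.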