@section coff backends
BFD supports a number of different flavours of coff format.
The major differences between formats are the sizes and
alignments of fields in structures on disk, and the occasional
extra field.

Coff in all its varieties is implemented with a few common
files and a number of implementation-specific files.  For
example, the i386 coff format is implemented in the file
@file{coff-i386.c}.  This file @code{#include}s
@file{coff/i386.h}, which defines the external structure of the
coff format for the i386, and @file{coff/internal.h}, which
defines the internal structure.  @file{coff-i386.c} also
defines the relocations used by the i386 coff format
(@pxref{Relocations}).

@subsection Porting to a new version of coff
The recommended method is to select from the existing
implementations the version of coff which is most like the one
you want to use.  For example, suppose that i386 coff is
the one you select, and that your coff flavour is called foo.
Copy @file{coff-i386.c} to @file{coff-foo.c}, copy
@file{../include/coff/i386.h} to @file{../include/coff/foo.h},
and add the corresponding lines to @file{targets.c} and
@file{Makefile.in} so that your new back end is used.  Alter the
shapes of the structures in @file{../include/coff/foo.h} so that
they match what you need.  You will probably also have to add
@code{#ifdef}s to the code in @file{coff/internal.h} and
@file{coffcode.h} if your version of coff is too wild.

You can verify that your new BFD backend works quite simply by
building @file{objdump} from the @file{binutils} directory
and making sure that its view of the file matches your host
system's (assuming the host has the fairly standard coff dump
utility, usually called @code{att-dump} or just @code{dump}).
Then clean up your code and send what you've done to Cygnus.
Your changes will then be in the next release, and you won't
have to keep integrating them.

@subsection How the coff backend works

@subsubsection File layout
The Coff backend is split into generic routines that are
applicable to any Coff target and routines that are specific
to a particular target.  The target-specific routines are
further split into ones which are basically the same for all
Coff targets, differing only in the external symbol format or
in the values of certain constants, and ones which are truly
specific to a single target.

The generic routines are in @file{coffgen.c}.  These routines
work for any Coff target.  They use some hooks into the
target-specific code; the hooks are in a
@code{bfd_coff_backend_data} structure, one of which exists for
each target.

The essentially similar target-specific routines are in
@file{coffcode.h}.  This header file includes executable C code.
The various Coff targets first include the appropriate Coff
header file, make any special defines that are needed, and
then include @file{coffcode.h}.

Some of the Coff targets also have additional routines in
the target source file itself.

@subsubsection Coff long section names
In the standard Coff object format, section names are limited to
the eight bytes available in the @code{s_name} field of the
@code{SCNHDR} section header structure.  The format requires the
field to be NUL-padded, but not necessarily NUL-terminated, so
the longest section names permitted are a full eight characters.

The Microsoft PE variants of the Coff object file format add
an extension to support the use of long section names.  This
extension is defined in section 4 of the Microsoft PE/COFF
specification (rev 8.1).  If a section name is too long to fit
into the section header's @code{s_name} field, it is instead
placed into the string table, and the @code{s_name} field is
filled with a slash (@samp{/}) followed by the ASCII decimal
representation of the offset of the full name relative to the
string table base.
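
As an illustration only, the C sketch below encodes and decodes the
@code{s_name} field in the way just described.  The names
@code{example_encode_long_name}, @code{example_decode_long_name} and
@code{EXAMPLE_SCNNMLEN} are hypothetical and are not part of BFD; the
sketch assumes the caller already knows the name's offset in the string
table, and it simply rejects offsets too large to fit in the seven
characters after the slash.

@verbatim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_SCNNMLEN 8 /* hypothetical: size of s_name in bytes */

/* Hypothetical helper.  Fill S_NAME for a section called NAME.  Names
   of up to eight characters are stored inline, NUL-padded but not
   necessarily NUL-terminated.  Longer names are assumed to live at
   STRTAB_OFFSET in the string table and are encoded as "/" followed by
   that offset in decimal.  Returns 0 on success, -1 if the offset does
   not fit in the field.  */
int
example_encode_long_name (char s_name[EXAMPLE_SCNNMLEN], const char *name,
                          unsigned long strtab_offset)
{
  char buf[EXAMPLE_SCNNMLEN + 1]; /* one extra byte for snprintf's NUL */
  size_t len = strlen (name);
  int n;

  memset (s_name, 0, EXAMPLE_SCNNMLEN);
  if (len <= EXAMPLE_SCNNMLEN)
    {
      memcpy (s_name, name, len);
      return 0;
    }
  n = snprintf (buf, sizeof buf, "/%lu", strtab_offset);
  if (n < 0 || n > EXAMPLE_SCNNMLEN)
    return -1;
  memcpy (s_name, buf, (size_t) n);
  return 0;
}

/* Hypothetical helper.  Return the full name held in S_NAME.  A "/NNN"
   field is looked up at offset NNN from STRTAB (the string table base);
   an inline name is copied into OUT, which must hold at least nine
   bytes, so that the result is always NUL-terminated.  */
const char *
example_decode_long_name (const char s_name[EXAMPLE_SCNNMLEN],
                          const char *strtab, char *out)
{
  if (s_name[0] == '/')
    {
      char digits[EXAMPLE_SCNNMLEN];

      memcpy (digits, s_name + 1, EXAMPLE_SCNNMLEN - 1);
      digits[EXAMPLE_SCNNMLEN - 1] = '\0';
      return strtab + strtoul (digits, NULL, 10);
    }
  memcpy (out, s_name, EXAMPLE_SCNNMLEN);
  out[EXAMPLE_SCNNMLEN] = '\0';
  return out;
}
@end verbatim

For instance, a name such as @samp{.debug_info} stored at offset 4 in
the string table would be encoded as the characters @samp{/4} followed
by NUL padding.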

Note that this implies that the extension can only be used in object
files, as executables do not contain a string table.  The standard
specifies that long section names from objects emitted into executable
images are to be truncated.

However, as a GNU extension, BFD can generate executable images
that contain a string table and long section names.  This
appears to be technically valid, as the standard only says
that Coff debugging information is deprecated, not forbidden.
In practice it works, although some tools that parse PE files
and expect the MS standard format may become confused;
@file{PEview} is one known example.

The functionality is supported in BFD by code implemented under
the control of the macro @code{COFF_LONG_SECTION_NAMES}.  If it is
not defined, the format does not support long section names in any
way.  If it is defined, it is used to initialise a flag,
@code{_bfd_coff_long_section_names}, and a hook function pointer,
@code{_bfd_coff_set_long_section_names}, in the Coff backend data
structure.  The flag controls the generation of long section names
in output BFDs at runtime: if it is false, as it will be by default
when generating an executable image, long section names are truncated;
if it is true, the long section names extension is employed.  The hook
points to a function that allows the value of a copy of the flag
in the coff object's tdata to be altered at runtime on formats that
support long section names at all; on other formats it points
to a stub that returns an error indication.
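
The relationship between the per-target default, the per-BFD copy of
the flag in tdata, and the hook (or its error stub) can be pictured with
the simplified C sketch below.  All of the types and functions in it
(@code{example_bfd}, @code{example_backend}, @code{example_set_long_names}
and so on) are hypothetical stand-ins invented for this illustration; in
BFD itself the hook is the @code{_bfd_coff_set_long_section_names} member
of @code{bfd_coff_backend_data}.

@verbatim
#include <stdbool.h>

struct example_backend;

/* Hypothetical, heavily simplified stand-ins for a BFD and the coff
   object tdata that holds the per-BFD copy of the flag.  */
struct example_tdata
{
  bool long_section_names;
};

struct example_bfd
{
  const struct example_backend *backend;
  struct example_tdata tdata;
};

/* Hypothetical per-target data: the default value of the flag and the
   hook that changes the per-BFD copy.  */
struct example_backend
{
  bool default_long_section_names;
  bool (*set_long_section_names) (struct example_bfd *, int);
};

/* Hook for formats that support long section names: record the new
   setting in the BFD's tdata.  */
static bool
example_set_long_names (struct example_bfd *abfd, int enable)
{
  abfd->tdata.long_section_names = enable != 0;
  return true;
}

/* Stub for formats that do not support them at all: always fail.  */
static bool
example_set_long_names_stub (struct example_bfd *abfd, int enable)
{
  (void) abfd;
  (void) enable;
  return false;
}

const struct example_backend example_long_names_backend =
  { false, example_set_long_names };
const struct example_backend example_short_names_backend =
  { false, example_set_long_names_stub };

/* A completely new BFD starts with the target's default.  */
void
example_new_bfd (struct example_bfd *abfd, const struct example_backend *be)
{
  abfd->backend = be;
  abfd->tdata.long_section_names = be->default_long_section_names;
}
@end verbatim

Callers always go through the hook, so a format without support reports
an error instead of silently ignoring the request.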

With input BFDs, the flag is set according to whether any long section
names are detected while reading the section headers.  For a completely
new BFD, the flag is set to the default for the target format.  This
information can be used by a client of the BFD library when deciding
what output format to generate, and means that a BFD that is opened
for read and subsequently converted to a writeable BFD and modified
in-place will retain whatever format it had on input.

If @code{COFF_LONG_SECTION_NAMES} is simply defined (blank), or is
defined to the value @samp{1}, then long section names are enabled by
default; if it is defined to the value @samp{0}, they are disabled by
default (but still accepted in input BFDs).  The header @file{coffcode.h}
defines a macro, @code{COFF_DEFAULT_LONG_SECTION_NAMES}, which is
used in the backends to initialise the backend data structure fields
appropriately; see the comments in that file for further detail.
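
One way to turn a macro that may be defined blank, as @samp{1}, or as
@samp{0} into a numeric default is token pasting, sketched below.  This
illustrates the technique under hypothetical macro names; it is not a
copy of the definitions in @file{coffcode.h}.  Pasting @samp{1} in front
of the value yields @samp{1}, @samp{11} or @samp{10}, whose low bit is
the default, and the extra level of macro indirection makes sure the
argument is expanded before it is pasted.

@verbatim
/* Hypothetical macros.  EXAMPLE_CAT_HELPER pastes its operands;
   EXAMPLE_BLANK_OR_1_TO_ODD gives 1 (blank), 11 (from 1) or 10
   (from 0); EXAMPLE_DEFAULT_FROM keeps only the low bit.  */
#define EXAMPLE_CAT_HELPER(x, y) x ## y
#define EXAMPLE_BLANK_OR_1_TO_ODD(y) EXAMPLE_CAT_HELPER (1, y)
#define EXAMPLE_DEFAULT_FROM(v) (EXAMPLE_BLANK_OR_1_TO_ODD (v) & 1)

/* Three ways a target might define its configuration macro.  */
#define EXAMPLE_NAMES_BLANK
#define EXAMPLE_NAMES_ONE 1
#define EXAMPLE_NAMES_ZERO 0

enum
{
  example_default_blank = EXAMPLE_DEFAULT_FROM (EXAMPLE_NAMES_BLANK), /* 1 */
  example_default_one = EXAMPLE_DEFAULT_FROM (EXAMPLE_NAMES_ONE),     /* 1 */
  example_default_zero = EXAMPLE_DEFAULT_FROM (EXAMPLE_NAMES_ZERO)    /* 0 */
};
@end verbatim

The blank case relies on the C99 rules for empty macro arguments, so a
compiler in strict C89 mode may complain about it.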

@subsubsection Bit twiddling
Each flavour of coff supported in BFD has its own header file
describing the external layout of the structures.  There is also
an internal description of the coff layout, in
@file{coff/internal.h}.  A major function of the
coff backend is swapping the bytes and twiddling the bits to
translate the external form of the structures into the normal
internal form.  This is all performed in the
@code{coff_swap}_@i{thing}_@i{direction} routines.

Some elements are different sizes in different versions of
coff; it is the duty of the coff version-specific include file
to override the definitions of various packing routines in
@file{coffcode.h}.  For example, the line number field of a coff
line number entry is sometimes 16 bits and sometimes 32 bits;
@code{#define}ing @code{PUT_LNSZ_LNNO} and @code{GET_LNSZ_LNNO}
will select the correct one.  No doubt, some day someone will find
a version of coff which has a varying field size not catered to
at the moment.  To port BFD, that person will have to add more
@code{#define}s.

Three of the bit twiddling routines are exported to
@code{gdb}: @code{coff_swap_aux_in}, @code{coff_swap_sym_in}
and @code{coff_swap_lineno_in}.  @code{gdb} reads the symbol
table on its own, but uses BFD to fix things up.  More of the
bit twiddlers are exported for @code{gas}:
@code{coff_swap_aux_out}, @code{coff_swap_sym_out},
@code{coff_swap_lineno_out}, @code{coff_swap_reloc_out},
@code{coff_swap_filehdr_out}, @code{coff_swap_aouthdr_out} and
@code{coff_swap_scnhdr_out}.  @code{gas} currently keeps track
of all the symbol table and reloc drudgery itself, thereby
saving the internal BFD overhead, but uses BFD to swap things
on the way out, making cross ports much safer.  Doing so also
allows BFD (and thus the linker) to use the same header files
as @code{gas}, which makes one avenue to disaster disappear.
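
To make the swapping idea concrete, here is a sketch of a line number
swap-in routine in the spirit of @code{coff_swap_lineno_in}.  Everything
in it is hypothetical: the structure layouts, the macro
@code{EXAMPLE_LNNO_SIZE} (standing in for the per-flavour
@code{GET_LNSZ_LNNO} choice), and the assumption of a little-endian
target.  The real routine takes a @code{bfd} and uses that target's byte
order.

@verbatim
#include <stdint.h>

/* Hypothetical per-flavour setting: define as 4 before this point for
   a flavour with 32-bit line numbers; the default here is 2.  */
#ifndef EXAMPLE_LNNO_SIZE
#define EXAMPLE_LNNO_SIZE 2
#endif

/* Hypothetical external (on-disk) form: raw little-endian bytes.  */
struct example_external_lineno
{
  unsigned char l_addr[4];
  unsigned char l_lnno[EXAMPLE_LNNO_SIZE];
};

/* Hypothetical internal (in-memory) form: host integers.  */
struct example_internal_lineno
{
  uint32_t l_addr;
  uint32_t l_lnno;
};

static uint32_t
example_get_le16 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8);
}

static uint32_t
example_get_le32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Analogue of GET_LNSZ_LNNO: pick the reader for the field width.  */
#if EXAMPLE_LNNO_SIZE == 2
#define EXAMPLE_GET_LNSZ_LNNO(p) example_get_le16 (p)
#else
#define EXAMPLE_GET_LNSZ_LNNO(p) example_get_le32 (p)
#endif

/* Swap one line number entry from external to internal form.  */
void
example_swap_lineno_in (const void *ext_v, void *in_v)
{
  const struct example_external_lineno *ext = ext_v;
  struct example_internal_lineno *in = in_v;

  in->l_addr = example_get_le32 (ext->l_addr);
  in->l_lnno = EXAMPLE_GET_LNSZ_LNNO (ext->l_lnno);
}
@end verbatim

Only the macro changes between flavours; the swap routine itself is
shared, which is the point of overriding the packing macros rather than
duplicating the routines.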

@subsubsection Symbol reading
The simple canonical form for symbols used by BFD is not rich
enough to keep all the information available in a coff symbol
table.  The back end gets around this problem by keeping the original
symbol table around, ``behind the scenes''.

When a symbol table is requested (through a call to
@code{bfd_canonicalize_symtab}), the request is passed on to
@code{coff_get_normalized_symtab}.  This reads the symbol table from
the coff file and swaps all the structures inside into the
internal form.  It also fixes up all the pointers in the table
(represented in the file by offsets from the first symbol in
the table) into physical pointers to elements in the new
internal table.  This involves some work since the meanings of
fields change depending upon context: a field that is a
pointer to another structure in the symbol table at one moment
may be the size in bytes of a structure at the next.  Another
pass is made over the table.  All symbols which mark file names
(@code{C_FILE} symbols) are modified so that the internal
string points to the value in the auxent (the real filename)
rather than the normal text associated with the symbol
(@code{".file"}).

At this time the symbol names are moved around.  Coff stores
all symbol names of eight characters or fewer physically
within the symbol table; longer names are kept at the end of
the file in the string table.  This pass moves all strings
into memory and replaces them with pointers to the strings.
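
The split between short and long names can be sketched as follows,
assuming the usual coff convention that a name field whose first four
bytes are zero holds, in its last four bytes, an offset into the string
table.  The structure @code{example_syment_name} and the function
@code{example_symbol_name} are hypothetical, and the sketch assumes the
field has already been swapped into host byte order.

@verbatim
#include <stdint.h>
#include <string.h>

/* Hypothetical, already swapped name field: either eight inline
   characters, or four zero bytes followed by a string table offset.  */
struct example_syment_name
{
  union
  {
    char short_name[8];     /* NUL-padded, maybe not NUL-terminated */
    struct
    {
      uint32_t zeroes;      /* 0 means "look in the string table" */
      uint32_t offset;      /* offset from the string table start */
    } l;
  } n;
};

/* Hypothetical helper returning a NUL-terminated name.  Inline names
   are copied into BUF because they need not be terminated on disk;
   long names point straight into the in-memory string table.  */
const char *
example_symbol_name (const struct example_syment_name *sym,
                     const char *strtab, char buf[9])
{
  if (sym->n.l.zeroes == 0)
    return strtab + sym->n.l.offset;

  memcpy (buf, sym->n.short_name, 8);
  buf[8] = '\0';
  return buf;
}
@end verbatim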

The symbol table is massaged once again, this time to create
the canonical table used by the BFD application.  Each symbol
is inspected in turn, and a decision made (using the
@code{n_sclass} field) about the various flags to set in the
@code{asymbol} (@pxref{Symbols}).  The generated canonical table
shares strings with the hidden internal symbol table.

Any line numbers are also read from the coff file, and attached
to the symbols which own the functions the line numbers belong to.

@subsubsection Symbol writing
Writing a symbol to a coff file which didn't come from a coff
file will lose any debugging information.  The @code{asymbol}
structure remembers the BFD from which the symbol was taken, and on
output the back end makes sure that the source and destination
targets are the same.

When the symbols have come from a coff file, all the
debugging information is preserved.

Symbol tables are provided for writing to the back end as a
vector of pointers to symbols.  This allows applications like
the linker to accumulate and output large symbol tables
without having to do too much byte copying.

@itemize @bullet

@item
@code{coff_renumber_symbols}
@end itemize
This routine runs through the provided symbol table and
patches each symbol marked as a file place holder
(@code{C_FILE}) to point to the next file place holder in the
list.  It also sets the @code{offset} field of each symbol to
the offset of that symbol from the first symbol.

Another function of this routine is to turn the canonical
value form of BFD into the form used by coff.  Internally, BFD
expects symbol values to be offsets from a section base; so a
symbol physically at 0x120, but in a section starting at
0x100, would have the value 0x20.  Coff expects symbols to
contain their final value, so symbols have their values
changed at this point to reflect their sum with their owning
section.  This transformation uses the
@code{output_section} field of the @code{asymbol}'s
@code{asection} (@pxref{Sections}).
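
The renumbering pass described above can be sketched in C as follows.
This is an illustration under simplifying assumptions, not BFD's own
code: the structures @code{example_sym} and @code{example_section} and
the function @code{example_renumber} are hypothetical, every symbol is
assumed to occupy one table slot plus one per auxiliary entry,
@code{EXAMPLE_C_FILE} is assumed to be 103 (the usual coff value of
@code{C_FILE}), and the last file symbol's value is simply left alone.

@verbatim
#include <stddef.h>

#define EXAMPLE_C_FILE 103 /* assumed storage class of file symbols */

/* Hypothetical output section: only its base address matters here.  */
struct example_section
{
  unsigned long vma;
};

/* Hypothetical cut-down symbol.  */
struct example_sym
{
  int sclass;                        /* storage class, e.g. C_FILE */
  unsigned numaux;                   /* auxiliary entries that follow */
  unsigned long value;               /* section-relative on input */
  const struct example_section *sec; /* NULL for absolute symbols */
  unsigned offset;                   /* filled in: index in the table */
};

/* Give each symbol its index in the output table, point each C_FILE
   symbol's value at the next C_FILE symbol, and turn section-relative
   values into final addresses.  Returns the number of table slots.  */
unsigned
example_renumber (struct example_sym **syms, size_t count)
{
  unsigned next_index = 0;
  struct example_sym *last_file = NULL;
  size_t i;

  for (i = 0; i < count; i++)
    {
      struct example_sym *s = syms[i];

      s->offset = next_index;
      next_index += 1 + s->numaux;

      if (s->sclass == EXAMPLE_C_FILE)
        {
          if (last_file != NULL)
            last_file->value = s->offset;
          last_file = s;
        }
      else if (s->sec != NULL)
        s->value += s->sec->vma;
    }
  return next_index;
}
@end verbatim

With a section at 0x100, a symbol whose canonical value is 0x20 comes
out with the value 0x120, matching the example in the text.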

@itemize @bullet

@item
@code{coff_mangle_symbols}
@end itemize
This routine runs through the provided symbol table and uses
the offsets generated by the previous pass and the pointers
generated when the symbol table was read in to create the
structured hierarchy required by coff.  It changes each pointer
to a symbol into the index of that symbol in the symbol table.
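
The pointer-to-index step can be pictured with the cut-down entry below,
modelled loosely on the @code{fix_tag} bit of
@code{combined_entry_type}.  The structure @code{example_entry} and the
function @code{example_mangle} are hypothetical; the sketch assumes a
renumbering pass has already stored each entry's output index in its
@code{offset} field.

@verbatim
#include <stddef.h>

/* Hypothetical cut-down entry.  While fix_tag is set, tag.p points at
   another entry; afterwards tag.index holds that entry's index.  */
struct example_entry
{
  unsigned offset;                /* index assigned by renumbering */
  unsigned fix_tag : 1;           /* set: tag.p is valid */
  union
  {
    struct example_entry *p;      /* in-memory form */
    long index;                   /* on-disk form */
  } tag;
};

/* Replace each tag pointer by the symbol table index of its target,
   and clear fix_tag so the union is known to hold an index.  */
void
example_mangle (struct example_entry *entries, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (entries[i].fix_tag)
      {
        struct example_entry *target = entries[i].tag.p;

        entries[i].tag.index = (long) target->offset;
        entries[i].fix_tag = 0;
      }
}
@end verbatim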

@itemize @bullet

@item
@code{coff_write_symbols}
@end itemize
This routine runs through the symbol table and patches up the
symbols from their internal form into the coff way, calls the
bit twiddlers, and writes out the table to the file.

@findex coff_symbol_type
@subsubsection @code{coff_symbol_type}
The hidden information for an @code{asymbol} is described in a
@code{combined_entry_type}:

@example
typedef struct coff_ptr_struct
@{
  /* Remembers the offset from the first symbol in the file for
     this symbol.  Generated by coff_renumber_symbols.  */
  unsigned int offset;

  /* Selects between the elements of the union below.  */
  unsigned int is_sym : 1;

  /* Selects between the elements of the x_sym.x_tagndx union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_tag : 1;

  /* Selects between the elements of the x_sym.x_fcnary.x_fcn.x_endndx
     union.  If set, p is valid and the field will be renumbered.  */
  unsigned int fix_end : 1;

  /* Selects between the elements of the x_csect.x_scnlen union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_scnlen : 1;

  /* If set, u.syment.n_value contains a pointer to a symbol.  The final
     value will be the offset field.  Used for XCOFF C_BSTAT symbols.  */
  unsigned int fix_value : 1;

  /* If set, u.syment.n_value is an index into the line number entries.
     Used for XCOFF C_BINCL/C_EINCL symbols.  */
  unsigned int fix_line : 1;

  /* The container for the symbol structure as read and translated
     from the file.  */
  union
  @{
    union internal_auxent auxent;
    struct internal_syment syment;
  @} u;

  /* An extra pointer which can be used by formats based on COFF (like
     XCOFF) to provide extra information to their backend.  */
  void *extrap;
@} combined_entry_type;

/* Each canonical asymbol really looks like this: */

typedef struct coff_symbol_struct
@{
  /* The actual symbol which the rest of BFD works with */
  asymbol symbol;

  /* A pointer to the hidden information for this symbol */
  combined_entry_type *native;

  /* A pointer to the linenumber information for this symbol */
  struct lineno_cache_entry *lineno;

  /* Have the line numbers been relocated yet? */
  bool done_lineno;
@} coff_symbol_type;

@end example
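
Because @code{symbol} is the first member of @code{coff_symbol_type}, a
pointer to the generic @code{asymbol} can be converted back into a
pointer to the enclosing coff structure, which lets the back end reach
the hidden @code{native} information from a symbol it handed out.  The
sketch below shows the idea with hypothetical stand-in types
(@code{example_asymbol}, @code{example_coff_symbol}) and a hypothetical
helper @code{example_coffsymbol}.

@verbatim
#include <stddef.h>

/* Hypothetical stand-ins for asymbol and the hidden native entry.  */
struct example_asymbol
{
  const char *name;
  unsigned long value;
};

struct example_native
{
  unsigned offset;
};

/* Hypothetical coff wrapper mirroring coff_symbol_type: the generic
   symbol must stay the FIRST member for the cast below to be valid.  */
struct example_coff_symbol
{
  struct example_asymbol symbol;
  struct example_native *native;
};

/* A pointer to the first member also points to the start of the
   enclosing structure, so converting it back is well defined.  */
struct example_coff_symbol *
example_coffsymbol (struct example_asymbol *sym)
{
  return (struct example_coff_symbol *) sym;
}
@end verbatim

The conversion is only valid for symbols that were actually allocated
as part of the larger coff structure.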
@findex bfd_coff_backend_data
@subsubsection @code{bfd_coff_backend_data}

@example
typedef struct
@{
  void (*_bfd_coff_swap_aux_in)
    (bfd *, void *, int, int, int, int, void *);

  void (*_bfd_coff_swap_sym_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_lineno_in)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aux_out)
    (bfd *, void *, int, int, int, int, void *);

  unsigned int (*_bfd_coff_swap_sym_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_lineno_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_reloc_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_filehdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aouthdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_scnhdr_out)
    (bfd *, void *, void *);

  unsigned int _bfd_filhsz;
  unsigned int _bfd_aoutsz;
  unsigned int _bfd_scnhsz;
  unsigned int _bfd_symesz;
  unsigned int _bfd_auxesz;
  unsigned int _bfd_relsz;
  unsigned int _bfd_linesz;
  unsigned int _bfd_filnmlen;
  bool _bfd_coff_long_filenames;

  bool _bfd_coff_long_section_names;
  bool (*_bfd_coff_set_long_section_names)
    (bfd *, int);

  unsigned int _bfd_coff_default_section_alignment_power;
  bool _bfd_coff_force_symnames_in_strings;
  unsigned int _bfd_coff_debug_string_prefix_length;
  unsigned int _bfd_coff_max_nscns;

  void (*_bfd_coff_swap_filehdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_aouthdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_scnhdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_reloc_in)
    (bfd *abfd, void *, void *);

  bool (*_bfd_coff_bad_format_hook)
    (bfd *, void *);

  bool (*_bfd_coff_set_arch_mach_hook)
    (bfd *, void *);

  void * (*_bfd_coff_mkobject_hook)
    (bfd *, void *, void *);

  bool (*_bfd_styp_to_sec_flags_hook)
    (bfd *, void *, const char *, asection *, flagword *);

  void (*_bfd_set_alignment_hook)
    (bfd *, asection *, void *);

  bool (*_bfd_coff_slurp_symbol_table)
    (bfd *);

  bool (*_bfd_coff_symname_in_debug)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_pointerize_aux_hook)
    (bfd *, combined_entry_type *, combined_entry_type *,
     unsigned int, combined_entry_type *);

  bool (*_bfd_coff_print_aux)
    (bfd *, FILE *, combined_entry_type *, combined_entry_type *,
     combined_entry_type *, unsigned int);

  bool (*_bfd_coff_reloc16_extra_cases)
    (bfd *, struct bfd_link_info *, struct bfd_link_order *, arelent *,
     bfd_byte *, size_t *, size_t *);

  int (*_bfd_coff_reloc16_estimate)
    (bfd *, asection *, arelent *, unsigned int,
     struct bfd_link_info *);

  enum coff_symbol_classification (*_bfd_coff_classify_symbol)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_compute_section_file_positions)
    (bfd *);

  bool (*_bfd_coff_start_final_link)
    (bfd *, struct bfd_link_info *);

  bool (*_bfd_coff_relocate_section)
    (bfd *, struct bfd_link_info *, bfd *, asection *, bfd_byte *,
     struct internal_reloc *, struct internal_syment *, asection **);

  reloc_howto_type *(*_bfd_coff_rtype_to_howto)
    (bfd *, asection *, struct internal_reloc *,
     struct coff_link_hash_entry *, struct internal_syment *, bfd_vma *);

  bool (*_bfd_coff_adjust_symndx)
    (bfd *, struct bfd_link_info *, bfd *, asection *,
     struct internal_reloc *, bool *);

  bool (*_bfd_coff_link_output_has_begun)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_final_link_postscript)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_print_pdata)
    (bfd *, void *);

@} bfd_coff_backend_data;

@end example
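
The generic routines in @file{coffgen.c} reach target-specific sizes and
routines through this table.  The miniature version below is a
hypothetical sketch of that pattern: @code{example_backend_data}, its
accessor macros, and the toy flavour with 18-byte symbols and 6-byte
line number entries are all invented for illustration and are not BFD
definitions.

@verbatim
#include <stddef.h>

/* Hypothetical miniature backend table: two size fields and one swap
   hook, in the style of bfd_coff_backend_data.  */
struct example_backend_data
{
  unsigned int symesz;  /* like _bfd_symesz */
  unsigned int linesz;  /* like _bfd_linesz */
  void (*swap_lineno_in) (const void *ext, void *in);
};

struct example_target
{
  const struct example_backend_data *backend;
};

/* Hypothetical accessors that generic code would use.  */
#define example_symesz(t) ((t)->backend->symesz)
#define example_linesz(t) ((t)->backend->linesz)

/* Generic: size of the on-disk symbol table, for any flavour.  */
size_t
example_symtab_bytes (const struct example_target *t, size_t nsyms)
{
  return nsyms * example_symesz (t);
}

/* Generic: swap in COUNT line entries through the hook, stepping
   through the external buffer by the flavour's entry size.  */
void
example_swap_all_linenos (const struct example_target *t,
                          const unsigned char *ext, size_t count,
                          void *out, size_t out_elem_size)
{
  size_t i;

  for (i = 0; i < count; i++)
    t->backend->swap_lineno_in (ext + i * example_linesz (t),
                                (unsigned char *) out + i * out_elem_size);
}

/* Toy flavour: the first byte of each external entry is the line.  */
static void
example_toy_swap (const void *ext, void *in)
{
  *(unsigned *) in = ((const unsigned char *) ext)[0];
}

const struct example_backend_data example_toy_backend =
  { 18, 6, example_toy_swap };
@end verbatim

A new flavour only has to supply a different table; the generic loops
stay unchanged.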
@subsubsection Writing relocations
To write relocations, the back end steps through the
canonical relocation table and creates an
@code{internal_reloc} for each entry.  The symbol index to use is
taken from the @code{offset} field in the symbol table supplied.  The
address comes directly from the sum of the section base
address and the relocation offset; the type is taken directly
from the howto field.  Then the @code{internal_reloc} is
swapped into the shape of an @code{external_reloc} and written
out to disk.
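
The steps above might look like this for a single relocation.  The
structures and @code{example_make_internal_reloc} are simplified,
hypothetical stand-ins for @code{arelent}, the howto structure and
@code{internal_reloc}; the final swap into an @code{external_reloc} is
left out.

@verbatim
/* Hypothetical simplified stand-ins.  */
struct example_howto
{
  unsigned type;
};

struct example_symbol
{
  unsigned offset;                       /* set by renumbering */
};

struct example_arelent
{
  struct example_symbol **sym_ptr_ptr;
  unsigned long address;                 /* offset within the section */
  const struct example_howto *howto;
};

struct example_internal_reloc
{
  unsigned long r_vaddr;
  long r_symndx;
  unsigned short r_type;
};

/* Build the internal relocation for one canonical relocation in a
   section whose base address is SECTION_VMA.  */
void
example_make_internal_reloc (const struct example_arelent *rel,
                             unsigned long section_vma,
                             struct example_internal_reloc *out)
{
  out->r_vaddr = section_vma + rel->address;
  out->r_symndx = (long) (*rel->sym_ptr_ptr)->offset;
  out->r_type = (unsigned short) rel->howto->type;
}
@end verbatim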

@subsubsection Reading linenumbers
Creating the linenumber table is done by reading in the entire
coff linenumber table, and creating another table for internal use.

A coff linenumber table is structured so that each function
is marked as having a line number of 0.  Each line within the
function is an offset from the first line in the function.  The
base of the line number information for the table is stored in
the symbol associated with the function.

Note: The PE format uses line number 0 for a flag indicating a
new source file.

The information is copied from the external to the internal
table, and each symbol which marks a function is marked by
pointing its...

How does this work?
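
As a sketch only, the conversion from relative to absolute line numbers
could be done as below.  The structure @code{example_lineno}, the
function @code{example_absolutize} and the @code{base_lines} array
(standing in for the base stored with each function's symbol) are
hypothetical; the sketch assumes that the absolute line is the base plus
the offset, as the description above suggests, and it ignores the PE use
of line number 0 noted above.

@verbatim
#include <stddef.h>

/* Hypothetical internal entry.  When lnno is 0 the entry starts a new
   function and addr_or_symndx is that function's symbol index;
   otherwise it is an address and lnno is relative to the function.  */
struct example_lineno
{
  unsigned long addr_or_symndx;
  unsigned lnno;
};

/* Rewrite relative line numbers as absolute ones.  BASE_LINES is
   indexed by symbol index (assumed stand-in for the base line kept
   with each function's symbol).  Returns the number of functions.  */
size_t
example_absolutize (struct example_lineno *tab, size_t count,
                    const unsigned *base_lines)
{
  size_t functions = 0;
  unsigned base = 0;
  size_t i;

  for (i = 0; i < count; i++)
    {
      if (tab[i].lnno == 0)
        {
          base = base_lines[tab[i].addr_or_symndx];
          functions++;
        }
      else
        tab[i].lnno += base;
    }
  return functions;
}
@end verbatim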

@subsubsection Reading relocations
Coff relocations are easily transformed into the internal BFD form
(@code{arelent}).

Reading a coff relocation table is done in the following stages:

@itemize @bullet

@item
Read the entire coff relocation table into memory.

@item
Process each relocation in turn; first swap it from the
external to the internal form.

@item
Turn the symbol referenced in the relocation's symbol index
into a pointer into the canonical symbol table.
This table is the same as the one returned by a call to
@code{bfd_canonicalize_symtab}.  The back end will call that
routine and save the result if a canonicalization hasn't been done.

@item
The relocation type is turned into a pointer to a howto
structure, in a back end specific way.  For instance, the 386
back end uses the @code{r_type} to directly produce an index
into a howto table vector.

@item
Note that @code{arelent.addend} for COFF is often not what
most people understand as a relocation addend, but rather an
adjustment to the relocation addend stored in section contents
of relocatable object files.  The value found in section
contents may also be confusing: like the field value in a
final-linked object, it can depend on both the symbol value
and the addend.  See @code{CALC_ADDEND}.
@end itemize
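
The symbol and howto steps above can be sketched for one already
swapped-in relocation as follows.  All names are hypothetical stand-ins;
in particular, @code{symbol_map} is assumed to map each native symbol
index to its slot in the canonical table, the howto table is indexed
directly by @code{r_type} as in the 386 example, and the address is
assumed to be stored relative to the section's base address.

@verbatim
#include <stddef.h>

/* Hypothetical simplified types.  */
struct example_howto
{
  unsigned type;
};

struct example_asymbol
{
  const char *name;
};

struct example_internal_reloc
{
  unsigned long r_vaddr;
  long r_symndx;
  unsigned r_type;
};

struct example_arelent
{
  struct example_asymbol **sym_ptr_ptr;
  unsigned long address;
  const struct example_howto *howto;
};

/* Convert one relocation.  SYMBOL_MAP[i] is the canonical table slot
   for native symbol index i, or NULL if there is none (for example an
   aux entry).  Returns 0 on success, -1 on a bad index.  */
int
example_reloc_in (const struct example_internal_reloc *in,
                  struct example_asymbol ***symbol_map, size_t nsyms,
                  const struct example_howto *howtos, size_t nhowtos,
                  unsigned long section_vma, struct example_arelent *out)
{
  if (in->r_symndx < 0 || (size_t) in->r_symndx >= nsyms
      || symbol_map[in->r_symndx] == NULL || in->r_type >= nhowtos)
    return -1;

  out->sym_ptr_ptr = symbol_map[in->r_symndx];
  out->address = in->r_vaddr - section_vma;
  out->howto = &howtos[in->r_type];
  return 0;
}
@end verbatim

The addend is deliberately left out: as noted above, computing it
correctly for coff is target-specific.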
    522