@section coff backends
BFD supports a number of different flavours of coff format.
The major differences between formats are the sizes and
alignments of fields in structures on disk, and the occasional
extra field.

Coff in all its varieties is implemented with a few common
files and a number of implementation specific files. For
example, the i386 coff format is implemented in the file
@file{coff-i386.c}. This file @code{#include}s
@file{coff/i386.h} which defines the external structure of the
coff format for the i386, and @file{coff/internal.h} which
defines the internal structure. @file{coff-i386.c} also
defines the relocations used by the i386 coff format.
@xref{Relocations}.

@subsection Porting to a new version of coff
The recommended method is to select from the existing
implementations the version of coff which is most like the one
you want to use. For example, we'll say that i386 coff is
the one you select, and that your coff flavour is called foo.
Copy @file{coff-i386.c} to @file{coff-foo.c}, copy
@file{../include/coff/i386.h} to @file{../include/coff/foo.h},
and add the necessary lines to @file{targets.c} and @file{Makefile.in}
so that your new back end is used. Alter the shapes of the
structures in @file{../include/coff/foo.h} so that they match
what you need. You will probably also have to add
@code{#ifdef}s to the code in @file{coff/internal.h} and
@file{coffcode.h} if your version of coff is too wild.

You can verify that your new BFD backend works quite simply by
building @file{objdump} from the @file{binutils} directory,
and making sure that its version of what's going on and your
host system's idea (assuming it has the pretty standard coff
dump utility, usually called @code{att-dump} or just
@code{dump}) are the same. Then clean up your code, and send
what you've done to Cygnus.
Then your stuff will be in the
next release, and you won't have to keep integrating it.

@subsection How the coff backend works

@subsubsection File layout
The Coff backend is split into generic routines that are
applicable to any Coff target and routines that are specific
to a particular target. The target-specific routines are
further split into ones which are basically the same for all
Coff targets, except that they use the external symbol format
or use different values for certain constants, and ones which
are peculiar to a single target.

The generic routines are in @file{coffgen.c}. These routines
work for any Coff target. They use some hooks into the target
specific code; the hooks are in a @code{bfd_coff_backend_data}
structure, one of which exists for each target.

The essentially similar target-specific routines are in
@file{coffcode.h}. This header file includes executable C code.
The various Coff targets first include the appropriate Coff
header file, make any special defines that are needed, and
then include @file{coffcode.h}.

Some of the Coff targets then also have additional routines in
the target source file itself.

@subsubsection Coff long section names
In the standard Coff object format, section names are limited to
the eight bytes available in the @code{s_name} field of the
@code{SCNHDR} section header structure. The format requires the
field to be NUL-padded, but not necessarily NUL-terminated, so
the longest section names permitted are a full eight characters.

The Microsoft PE variants of the Coff object file format add
an extension to support the use of long section names. This
extension is defined in section 4 of the Microsoft PE/COFF
specification (rev 8.1).
If a section name is too long to fit
into the section header's @code{s_name} field, it is instead
placed into the string table, and the @code{s_name} field is
filled with a slash ("/") followed by the ASCII decimal
representation of the offset of the full name relative to the
string table base.

Note that this implies that the extension can only be used in object
files, as executables do not contain a string table. The standard
specifies that long section names from objects emitted into executable
images are to be truncated.

However, as a GNU extension, BFD can generate executable images
that contain a string table and long section names. This
would appear to be technically valid, as the standard only says
that Coff debugging information is deprecated, not forbidden,
and in practice it works, although some tools that parse PE files
expecting the MS standard format may become confused; @file{PEview} is
one known example.

The functionality is supported in BFD by code implemented under
the control of the macro @code{COFF_LONG_SECTION_NAMES}. If not
defined, the format does not support long section names in any way.
If defined, it is used to initialise a flag,
@code{_bfd_coff_long_section_names}, and a hook function pointer,
@code{_bfd_coff_set_long_section_names}, in the Coff backend data
structure. The flag controls the generation of long section names
in output BFDs at runtime; if it is false, as it will be by default
when generating an executable image, long section names are truncated;
if true, the long section names extension is employed. The hook
points to a function that allows the value of a copy of the flag
in coff object tdata to be altered at runtime, on formats that
support long section names at all; on other formats it points
to a stub that returns an error indication.
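
As an illustration of the encoding described above, here is a minimal
sketch of how an eight-byte @code{s_name} field might be filled in.
It is not BFD code: the helper @code{encode_section_name} and the
constant @code{SCNNMLEN} as defined here are hypothetical names chosen
for the example, and the caller is assumed to have already placed the
full name in the string table at @code{strtab_offset}.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SCNNMLEN 8   /* Size of the s_name field (assumed for this sketch).  */

/* Hypothetical helper, not part of BFD.  Fill the s_name field for NAME.
   Names of up to eight characters are copied and NUL-padded, so a full
   eight-character name is not NUL-terminated.  Longer names are replaced
   by "/" followed by the decimal string table offset.
   Returns 0 if the name fitted, 1 if the long form was used, and -1 if
   the offset needs more digits than the field can hold.  */
static int
encode_section_name (char s_name[SCNNMLEN], const char *name,
                     unsigned long strtab_offset)
{
  size_t len;
  char buf[32];
  int n;

  assert (s_name != NULL && name != NULL);
  len = strlen (name);

  memset (s_name, 0, SCNNMLEN);
  if (len <= SCNNMLEN)
    {
      memcpy (s_name, name, len);
      return 0;
    }

  n = snprintf (buf, sizeof buf, "/%lu", strtab_offset);
  if (n < 0 || n > SCNNMLEN)
    return -1;
  memcpy (s_name, buf, (size_t) n);
  return 1;
}
```

With this sketch, a name such as @code{.debug_info} stored at string
table offset 4 would yield the field contents @code{/4} followed by
six NUL bytes, while @code{.text} is stored directly.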

With input BFDs, the flag is set according to whether any long section
names are detected while reading the section headers. For a completely
new BFD, the flag is set to the default for the target format. This
information can be used by a client of the BFD library when deciding
what output format to generate, and means that a BFD that is opened
for read and subsequently converted to a writeable BFD and modified
in-place will retain whatever format it had on input.

If @code{COFF_LONG_SECTION_NAMES} is simply defined (blank), or is
defined to the value "1", then long section names are enabled by
default; if it is defined to the value zero, they are disabled by
default (but still accepted in input BFDs). The header @file{coffcode.h}
defines a macro, @code{COFF_DEFAULT_LONG_SECTION_NAMES}, which is
used in the backends to initialise the backend data structure fields
appropriately; see the comments for further detail.

@subsubsection Bit twiddling
Each flavour of coff supported in BFD has its own header file
describing the external layout of the structures. There is also
an internal description of the coff layout, in
@file{coff/internal.h}. A major function of the
coff backend is swapping the bytes and twiddling the bits to
translate the external form of the structures into the normal
internal form. This is all performed in the
@code{bfd_swap}_@i{thing}_@i{direction} routines. Some
elements are different sizes between different versions of
coff; it is the duty of the coff version specific include file
to override the definitions of various packing routines in
@file{coffcode.h}. E.g., the size of a line number entry in coff is
sometimes 16 bits, and sometimes 32 bits. @code{#define}ing
@code{PUT_LNSZ_LNNO} and @code{GET_LNSZ_LNNO} will select the
correct one.
No doubt, some day someone will find a version of
coff which has a varying field size not catered to at the
moment. To port BFD, that person will have to add more @code{#define}s.
Three of the bit twiddling routines are exported to
@code{gdb}: @code{coff_swap_aux_in}, @code{coff_swap_sym_in}
and @code{coff_swap_lineno_in}. @code{gdb} reads the symbol
table on its own, but uses BFD to fix things up. More of the
bit twiddlers are exported for @code{gas}:
@code{coff_swap_aux_out}, @code{coff_swap_sym_out},
@code{coff_swap_lineno_out}, @code{coff_swap_reloc_out},
@code{coff_swap_filehdr_out}, @code{coff_swap_aouthdr_out} and
@code{coff_swap_scnhdr_out}. @code{gas} currently keeps track
of all the symbol table and reloc drudgery itself, thereby
saving the internal BFD overhead, but uses BFD to swap things
on the way out, making cross ports much safer. Doing so also
allows BFD (and thus the linker) to use the same header files
as @code{gas}, which makes one avenue to disaster disappear.

@subsubsection Symbol reading
The simple canonical form for symbols used by BFD is not rich
enough to keep all the information available in a coff symbol
table. The back end gets around this problem by keeping the original
symbol table around, "behind the scenes".

When a symbol table is requested (through a call to
@code{bfd_canonicalize_symtab}), a request gets through to
@code{coff_get_normalized_symtab}. This reads the symbol table from
the coff file and swaps all the structures inside into the
internal form. It also fixes up all the pointers in the table
(represented in the file by offsets from the first symbol in
the table) into physical pointers to elements in the new
internal table.
This involves some work since the meanings of
fields change depending upon context: a field that is a
pointer to another structure in the symbol table at one moment
may be the size in bytes of a structure at the next. Another
pass is made over the table. All symbols which mark file names
(@code{C_FILE} symbols) are modified so that the internal
string points to the value in the auxent (the real filename)
rather than the normal text associated with the symbol
(@code{".file"}).

At this time the symbol names are moved around. Coff stores
all symbol names shorter than nine characters physically
within the symbol table; longer strings are kept at the end of
the file in the string table. This pass moves all strings
into memory and replaces them with pointers to the strings.

The symbol table is massaged once again, this time to create
the canonical table used by the BFD application. Each symbol
is inspected in turn, and a decision is made (using the
@code{sclass} field) about the various flags to set in the
@code{asymbol}. @xref{Symbols}. The generated canonical table
shares strings with the hidden internal symbol table.

Any line numbers are read from the coff file too, and attached
to the symbols which own the functions the line numbers belong to.

@subsubsection Symbol writing
Writing a symbol to a coff file which didn't come from a coff
file will lose any debugging information. The @code{asymbol}
structure remembers the BFD from which the symbol was taken, and on
output the back end makes sure that the same destination target as
source target is present.

When the symbols have come from a coff file then all the
debugging information is preserved.

Symbol tables are provided for writing to the back end in a
vector of pointers to pointers.
This allows applications like
the linker to accumulate and output large symbol tables
without having to do too much byte copying.

@itemize @bullet

@item
@code{coff_renumber_symbols}
@end itemize
This function runs through the provided symbol table and
patches each symbol marked as a file place holder
(@code{C_FILE}) to point to the next file place holder in the
list. It also marks each @code{offset} field in the list with
the offset of the current symbol from the first symbol.

Another function of this procedure is to turn the canonical
value form of BFD into the form used by coff. Internally, BFD
expects symbol values to be offsets from a section base; so a
symbol physically at 0x120, but in a section starting at
0x100, would have the value 0x20. Coff expects symbols to
contain their final value, so symbols have their values
changed at this point to reflect their sum with their owning
section. This transformation uses the
@code{output_section} field of the @code{asymbol}'s
@code{asection}. @xref{Sections}.

@itemize @bullet

@item
@code{coff_mangle_symbols}
@end itemize
This routine runs through the provided symbol table and uses
the offsets generated by the previous pass and the pointers
generated when the symbol table was read in to create the
structured hierarchy required by coff. It changes each pointer
to a symbol into the index into the symbol table of the asymbol.

@itemize @bullet

@item
@code{coff_write_symbols}
@end itemize
This routine runs through the symbol table and patches up the
symbols from their internal form into the coff way, calls the
bit twiddlers, and writes out the table to the file.
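
The value conversion and renumbering described above can be sketched
roughly as follows. This is only an illustration under simplifying
assumptions: @code{toy_section}, @code{toy_symbol},
@code{toy_coff_value} and @code{toy_renumber} are hypothetical
stand-ins, not BFD types or functions, and the sketch assumes that each
auxiliary entry occupies one slot in the symbol table after its symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-ins for asection and asymbol.  */
struct toy_section
{
  unsigned long vma;               /* Base address of the output section.  */
};

struct toy_symbol
{
  unsigned long value;             /* Section-relative value, as BFD keeps it.  */
  const struct toy_section *sec;   /* Owning output section.  */
  unsigned int numaux;             /* Auxiliary entries following the symbol.  */
  unsigned int offset;             /* Index from the first symbol, set below.  */
};

/* Coff wants the final value: the section base plus the relative value.  */
static unsigned long
toy_coff_value (const struct toy_symbol *sym)
{
  return sym->sec->vma + sym->value;
}

/* Walk a vector of pointers to symbols and record, in each symbol, its
   offset from the first symbol.  Returns the total number of slots.  */
static unsigned int
toy_renumber (struct toy_symbol **syms, size_t count)
{
  unsigned int index = 0;
  size_t i;

  assert (syms != NULL || count == 0);
  for (i = 0; i < count; i++)
    {
      syms[i]->offset = index;
      index += 1 + syms[i]->numaux;
    }
  return index;
}
```

For the example in the text, a symbol with the value 0x20 in a section
starting at 0x100 is given the coff value 0x120 by
@code{toy_coff_value}.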

@findex coff_symbol_type
@subsubsection @code{coff_symbol_type}
The hidden information for an @code{asymbol} is described in a
@code{combined_entry_type}:

@example
typedef struct coff_ptr_struct
@{
  /* Remembers the offset from the first symbol in the file for
     this symbol.  Generated by coff_renumber_symbols.  */
  unsigned int offset;

  /* Selects between the elements of the union below.  */
  unsigned int is_sym : 1;

  /* Selects between the elements of the x_sym.x_tagndx union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_tag : 1;

  /* Selects between the elements of the x_sym.x_fcnary.x_fcn.x_endndx
     union.  If set, p is valid and the field will be renumbered.  */
  unsigned int fix_end : 1;

  /* Selects between the elements of the x_csect.x_scnlen union.  If set,
     p is valid and the field will be renumbered.  */
  unsigned int fix_scnlen : 1;

  /* If set, u.syment.n_value contains a pointer to a symbol.  The final
     value will be the offset field.  Used for XCOFF C_BSTAT symbols.  */
  unsigned int fix_value : 1;

  /* If set, u.syment.n_value is an index into the line number entries.
     Used for XCOFF C_BINCL/C_EINCL symbols.  */
  unsigned int fix_line : 1;

  /* The container for the symbol structure as read and translated
     from the file.  */
  union
  @{
    union internal_auxent auxent;
    struct internal_syment syment;
  @} u;

  /* An extra pointer which can be used by formats based on COFF (like
     XCOFF) to provide extra information to their backend.  */
  void *extrap;
@} combined_entry_type;

/* Each canonical asymbol really looks like this: */

typedef struct coff_symbol_struct
@{
  /* The actual symbol which the rest of BFD works with */
  asymbol symbol;

  /* A pointer to the hidden information for this symbol */
  combined_entry_type *native;

  /* A pointer to the linenumber information for this symbol */
  struct lineno_cache_entry *lineno;

  /* Have the line numbers been relocated yet?  */
  bool done_lineno;
@} coff_symbol_type;

@end example
@findex bfd_coff_backend_data
@subsubsection @code{bfd_coff_backend_data}

@example
typedef struct
@{
  void (*_bfd_coff_swap_aux_in)
    (bfd *, void *, int, int, int, int, void *);

  void (*_bfd_coff_swap_sym_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_lineno_in)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aux_out)
    (bfd *, void *, int, int, int, int, void *);

  unsigned int (*_bfd_coff_swap_sym_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_lineno_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_reloc_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_filehdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_aouthdr_out)
    (bfd *, void *, void *);

  unsigned int (*_bfd_coff_swap_scnhdr_out)
    (bfd *, void *, void *);

  unsigned int _bfd_filhsz;
  unsigned int _bfd_aoutsz;
  unsigned int _bfd_scnhsz;
  unsigned int _bfd_symesz;
  unsigned int _bfd_auxesz;
  unsigned int _bfd_relsz;
  unsigned int _bfd_linesz;
  unsigned int _bfd_filnmlen;
  bool _bfd_coff_long_filenames;

  bool _bfd_coff_long_section_names;
  bool (*_bfd_coff_set_long_section_names)
    (bfd *, int);

  unsigned int _bfd_coff_default_section_alignment_power;
  bool _bfd_coff_force_symnames_in_strings;
  unsigned int _bfd_coff_debug_string_prefix_length;
  unsigned int _bfd_coff_max_nscns;

  void (*_bfd_coff_swap_filehdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_aouthdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_scnhdr_in)
    (bfd *, void *, void *);

  void (*_bfd_coff_swap_reloc_in)
    (bfd *abfd, void *, void *);

  bool (*_bfd_coff_bad_format_hook)
    (bfd *, void *);

  bool (*_bfd_coff_set_arch_mach_hook)
    (bfd *, void *);

  void * (*_bfd_coff_mkobject_hook)
    (bfd *, void *, void *);

  bool (*_bfd_styp_to_sec_flags_hook)
    (bfd *, void *, const char *, asection *, flagword *);

  void (*_bfd_set_alignment_hook)
    (bfd *, asection *, void *);

  bool (*_bfd_coff_slurp_symbol_table)
    (bfd *);

  bool (*_bfd_coff_symname_in_debug)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_pointerize_aux_hook)
    (bfd *, combined_entry_type *, combined_entry_type *,
     unsigned int, combined_entry_type *);

  bool (*_bfd_coff_print_aux)
    (bfd *, FILE *, combined_entry_type *, combined_entry_type *,
     combined_entry_type *, unsigned int);

  bool (*_bfd_coff_reloc16_extra_cases)
    (bfd *, struct bfd_link_info *, struct bfd_link_order *, arelent *,
     bfd_byte *, size_t *, size_t *);

  int (*_bfd_coff_reloc16_estimate)
    (bfd *, asection *, arelent *, unsigned int,
     struct bfd_link_info *);

  enum coff_symbol_classification (*_bfd_coff_classify_symbol)
    (bfd *, struct internal_syment *);

  bool (*_bfd_coff_compute_section_file_positions)
    (bfd *);

  bool (*_bfd_coff_start_final_link)
    (bfd *, struct bfd_link_info *);

  bool (*_bfd_coff_relocate_section)
    (bfd *, struct bfd_link_info *, bfd *, asection *, bfd_byte *,
     struct internal_reloc *, struct internal_syment *, asection **);

  reloc_howto_type *(*_bfd_coff_rtype_to_howto)
    (bfd *, asection *, struct internal_reloc *,
     struct coff_link_hash_entry *, struct internal_syment *, bfd_vma *);

  bool (*_bfd_coff_adjust_symndx)
    (bfd *, struct bfd_link_info *, bfd *, asection *,
     struct internal_reloc *, bool *);

  bool (*_bfd_coff_link_output_has_begun)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_final_link_postscript)
    (bfd *, struct coff_final_link_info *);

  bool (*_bfd_coff_print_pdata)
    (bfd *, void *);

@} bfd_coff_backend_data;

@end example
@subsubsection Writing relocations
To write relocations, the back end steps through the
canonical relocation table and creates an
@code{internal_reloc} for each entry. The symbol index to use is
taken from the @code{offset} field in the symbol table supplied. The
address comes directly from the sum of the section base
address and the relocation offset; the type is dug directly
from the howto field. Then the @code{internal_reloc} is
swapped into the shape of an @code{external_reloc} and written
out to disk.

@subsubsection Reading linenumbers
Creating the linenumber table is done by reading in the entire
coff linenumber table, and creating another table for internal use.

A coff linenumber table is structured so that each function
is marked as having a line number of 0. Each line within the
function is an offset from the first line in the function. The
base of the line number information for the table is stored in
the symbol associated with the function.

Note: The PE format uses line number 0 for a flag indicating a
new source file.

The information is copied from the external to the internal
table, and each symbol which marks a function is marked by
pointing its...

How does this work?

@subsubsection Reading relocations
Coff relocations are easily transformed into the internal BFD form
(@code{arelent}).

Reading a coff relocation table is done in the following stages:

@itemize @bullet

@item
Read the entire coff relocation table into memory.

@item
Process each relocation in turn; first swap it from the
external to the internal form.

@item
Turn the symbol referenced in the relocation's symbol index
into a pointer into the canonical symbol table.
This table is the same as the one returned by a call to
@code{bfd_canonicalize_symtab}. The back end will call that
routine and save the result if a canonicalization hasn't been done.

@item
The relocation type is turned into a pointer to a howto
structure, in a back end specific way. For instance, the 386
uses the @code{r_type} to directly produce an index
into a howto table vector.

@item
Note that @code{arelent.addend} for COFF is often not what
most people understand as a relocation addend, but rather an
adjustment to the relocation addend stored in section contents
of relocatable object files. The value found in section
contents may also be confusing, depending on both symbol value
and addend, somewhat similar to the field value for a
final-linked object. See @code{CALC_ADDEND}.
@end itemize
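
The stages listed above might be sketched like this for a single
relocation. This is a hedged illustration, not the BFD implementation:
every @code{toy_} name is hypothetical, the ten-byte little-endian
external layout is an assumption made for the example, the symbol
index is used directly as an index into the canonical table (the real
back end has more work to do), and the address is computed as the
inverse of the sum described under Writing relocations.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed external (on-disk) layout, little-endian, for illustration.  */
struct toy_external_reloc
{
  unsigned char r_vaddr[4];
  unsigned char r_symndx[4];
  unsigned char r_type[2];
};

struct toy_howto { unsigned int type; const char *name; };
struct toy_symbol { const char *name; };

/* Simplified stand-in for arelent.  */
struct toy_arelent
{
  unsigned long address;
  const struct toy_symbol *sym;
  const struct toy_howto *howto;
};

/* Read an N-byte little-endian value starting at P.  */
static unsigned long
toy_get_le (const unsigned char *p, size_t n)
{
  unsigned long v = 0;

  while (n-- > 0)
    v = (v << 8) | p[n];
  return v;
}

/* Swap one relocation in, then resolve its symbol and howto.
   Returns 0 on success, -1 if an index is out of range.  */
static int
toy_read_reloc (const struct toy_external_reloc *ext,
                const struct toy_symbol *symtab, size_t nsyms,
                const struct toy_howto *howtos, size_t nhowtos,
                unsigned long section_vma, struct toy_arelent *out)
{
  /* External to internal form.  */
  unsigned long vaddr = toy_get_le (ext->r_vaddr, 4);
  unsigned long symndx = toy_get_le (ext->r_symndx, 4);
  unsigned long type = toy_get_le (ext->r_type, 2);

  assert (out != NULL);
  if (symndx >= nsyms || type >= nhowtos)
    return -1;

  out->address = vaddr - section_vma;  /* Assumed inverse of the write path.  */
  out->sym = &symtab[symndx];          /* Index becomes a pointer.  */
  out->howto = &howtos[type];          /* r_type indexes the howto table.  */
  return 0;
}
```

Bounds checking both indices before forming pointers keeps the sketch
safe against malformed input; how a real back end reports such errors
is outside what this example shows.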