//===- SampleProfReader.h - Read LLVM sample profile data -------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains definitions needed for reading sample profiles.
//
// NOTE: If you are making changes to this file format, please remember
//       to document them in the Clang documentation at
//       tools/clang/docs/UsersManual.rst.
//
// Text format
// -----------
//
// Sample profiles are written as ASCII text. The file is divided into
// sections, which correspond to each of the functions executed at runtime.
// Each section has the following format
//
//     function1:total_samples:total_head_samples
//      offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
//      offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
//      ...
//      offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
//      offsetA[.discriminator]: fnA:num_of_total_samples
//       offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
//       ...
//      !CFGChecksum: num
//      !Attribute: flags
//
// This is a nested tree in which the indentation represents the nesting level
// of the inline stack. There are no blank lines in the file, and the spacing
// within a single line is fixed. Additional spaces will result in an error
// while reading the file.
//
// Any line starting with the '#' character is completely ignored.
//
// Inlined calls are represented with indentation. The inline stack is a
// stack of source locations in which the top of the stack represents the
// leaf function, and the bottom of the stack represents the actual
// symbol to which the instruction belongs.
//
// Function names must be mangled in order for the profile loader to
// match them in the current translation unit. The two numbers in the
// function header specify how many total samples were accumulated in the
// function (first number), and the total number of samples accumulated
// in the prologue of the function (second number). This head sample
// count provides an indicator of how frequently the function is invoked.
//
// There are three types of lines in the function body.
//
// * A sampled line represents the profile information of a source location.
// * A callsite line represents the profile information of a callsite.
// * A metadata line represents extra metadata of the function.
//
// Each sampled line may contain several items. Some are optional (marked
// below):
//
// a. Source line offset. This number represents the line number
//    in the function where the sample was collected. The line number is
//    always relative to the line where the symbol of the function is
//    defined. So, if the function has its header at line 280, the offset
//    13 is at line 293 in the file.
//
//    Note that this offset should never be a negative number. A negative
//    offset could otherwise arise in cases like macros. The debug machinery
//    will register the line number at the point of macro expansion. So, if
//    the macro was expanded in a line before the start of the function, the
//    profile converter should emit a 0 as the offset (this means that the
//    optimizers will not be able to associate a meaningful weight to the
//    instructions in the macro).
//
// b. [OPTIONAL] Discriminator. This is used if the sampled program
//    was compiled with DWARF discriminator support
//    (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators).
//    DWARF discriminators are unsigned integer values that allow the
//    compiler to distinguish between multiple execution paths on the
//    same source line location.
//
//    For example, consider the line of code ``if (cond) foo(); else bar();``.
//    If the predicate ``cond`` is true 80% of the time, then the edge
//    into function ``foo`` should be considered to be taken most of the
//    time. But both calls to ``foo`` and ``bar`` are at the same source
//    line, so a sample count at that line is not sufficient. The
//    compiler needs to know which part of that line is taken more
//    frequently.
//
//    This is what discriminators provide. In this case, the calls to
//    ``foo`` and ``bar`` will be at the same line, but will have
//    different discriminator values. This allows the compiler to correctly
//    set edge weights into ``foo`` and ``bar``.
//
// c. Number of samples. This is an integer quantity representing the
//    number of samples collected by the profiler at this source
//    location.
//
// d. [OPTIONAL] Potential call targets and samples. If present, this
//    line contains a call instruction. This models both direct and
//    indirect calls. Each called target is listed together with the
//    number of samples. For example,
//
//      130: 7  foo:3  bar:2  baz:7
//
//    The above means that at relative line offset 130 there is a call
//    instruction that calls one of ``foo()``, ``bar()`` and ``baz()``,
//    with ``baz()`` being the relatively more frequently called target.
//
// Each callsite line may contain several items. Some are optional.
//
// a. Source line offset. This number represents the line number of the
//    callsite that is inlined in the profiled binary.
//
// b. [OPTIONAL] Discriminator. Same as the discriminator for sampled line.
//
// c. Number of samples. This is an integer quantity representing the
//    total number of samples collected for the inlined instance at this
//    callsite.
//
// A metadata line can occur only at one level of indentation and carries
// extra information for the top-level function. Furthermore, metadata can
// only occur after all the body samples and callsite samples.
// Each metadata line contains one particular type of metadata, identified by
// its leading keyword, which is prefixed with !. We process each metadata
// line independently, hence each metadata line has to form an independent
// piece of information that does not require cross-line reference.
// We support the following types of metadata:
//
// a. CFG Checksum (a.k.a. function hash):
//   !CFGChecksum: 12345
// b. Attribute (see ContextAttributeMask):
//   !Attribute: 1
//
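// As an illustration only (the function names and counts below are made up
// and do not come from a real profile), a small profile that follows the
// rules above might look like this:
//
//     main:7223:0
//      4: 534
//      4.2: 534
//      5: 1075
//      6: 2080 _Z3foov:2080
//      7: _Z3barv:3000
//       1: 1000
//       2: 2000
//
// In this sketch, ``main`` has 7223 total samples and 0 head samples. Offsets
// 4 and 4.2 refer to the same source line but carry different discriminators.
// Offset 6 is a sampled line with one call target, and offset 7 is a callsite
// line for an inlined instance of ``_Z3barv``, whose own samples are nested
// one indentation level deeper.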
//
// Binary format
// -------------
//
// This is a more compact encoding. Numbers are encoded as ULEB128 values
// and all strings are encoded in a name table. The file is organized in
// the following sections:
//
// MAGIC (uint64_t)
//    File identifier computed by function SPMagic() (0x5350524f463432ff)
//
// VERSION (uint32_t)
//    File format version number computed by SPVersion()
//
// SUMMARY
//    TOTAL_COUNT (uint64_t)
//        Total number of samples in the profile.
//    MAX_COUNT (uint64_t)
//        Maximum value of samples on a line.
//    MAX_FUNCTION_COUNT (uint64_t)
//        Maximum number of samples at function entry (head samples).
//    NUM_COUNTS (uint64_t)
//        Number of lines with samples.
//    NUM_FUNCTIONS (uint64_t)
//        Number of functions with samples.
//    NUM_DETAILED_SUMMARY_ENTRIES (size_t)
//        Number of entries in the detailed summary.
//    DETAILED_SUMMARY
//        A list of detailed summary entries. Each entry consists of
//        CUTOFF (uint32_t)
//            Required percentile of total sample count expressed as a fraction
//            multiplied by 1000000.
//        MIN_COUNT (uint64_t)
//            The minimum number of samples required to reach the target
//            CUTOFF.
//        NUM_COUNTS (uint64_t)
//            Number of samples to get to the desired percentile.
//
// NAME TABLE
//    SIZE (uint32_t)
//        Number of entries in the name table.
//    NAMES
//        A NUL-separated list of SIZE strings.
//
// FUNCTION BODY (one for each uninlined function body present in the profile)
//    HEAD_SAMPLES (uint64_t) [only for top-level functions]
//        Total number of samples collected at the head (prologue) of the
//        function.
//        NOTE: This field should only be present for top-level functions
//              (i.e., not inlined into any caller). Inlined function calls
//              have no prologue, so they don't need this.
//    NAME_IDX (uint32_t)
//        Index into the name table indicating the function name.
//    SAMPLES (uint64_t)
//        Total number of samples collected in this function.
//    NRECS (uint32_t)
//        Total number of sampling records in this function's profile.
//    BODY RECORDS
//        A list of NRECS entries. Each entry contains:
//          OFFSET (uint32_t)
//            Line offset from the start of the function.
//          DISCRIMINATOR (uint32_t)
//            Discriminator value (see description of discriminators
//            in the text format documentation above).
//          SAMPLES (uint64_t)
//            Number of samples collected at this location.
//          NUM_CALLS (uint32_t)
//            Number of non-inlined function calls made at this location. In the
//            case of direct calls, this number will always be 1. For indirect
//            calls (virtual functions and function pointers) this will
//            represent all the actual functions called at runtime.
//          CALL_TARGETS
//            A list of NUM_CALLS entries for each called function:
//               NAME_IDX (uint32_t)
//                  Index into the name table with the callee name.
//               SAMPLES (uint64_t)
//                  Number of samples collected at the call site.
//    NUM_INLINED_FUNCTIONS (uint32_t)
//      Number of callees inlined into this function.
//    INLINED FUNCTION RECORDS
//      A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
//      callees.
//        OFFSET (uint32_t)
//          Line offset from the start of the function.
//        DISCRIMINATOR (uint32_t)
//          Discriminator value (see description of discriminators
//          in the text format documentation above).
//        FUNCTION BODY
//          A FUNCTION BODY entry describing the inlined function.
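//
// As a worked illustration of the ULEB128 encoding mentioned at the start of
// this section (the value is chosen arbitrarily): each byte carries 7 bits of
// the number, least significant group first, and the high bit is set on every
// byte except the last. The value 300 (binary 1 0010 1100) is therefore
// stored as the two bytes 0xac 0x02. The first byte, 0xac = 0x80 | 0x2c,
// holds the low seven bits with the continuation bit set, and the second
// byte, 0x02, holds the remaining bits (2 * 128 + 44 = 300).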
//===----------------------------------------------------------------------===//

#ifndef LLVM_PROFILEDATA_SAMPLEPROFREADER_H
#define LLVM_PROFILEDATA_SAMPLEPROFREADER_H

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/ProfileSummary.h"
#include "llvm/ProfileData/GCOV.h"
#include "llvm/ProfileData/SampleProf.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SymbolRemappingReader.h"
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <system_error>
#include <vector>

namespace llvm {

class raw_ostream;
class Twine;

namespace sampleprof {

class SampleProfileReader;

/// SampleProfileReaderItaniumRemapper remaps the profile data from a
/// sample profile data reader, by applying a provided set of equivalences
/// between components of the symbol names in the profile.
class SampleProfileReaderItaniumRemapper {
public:
  SampleProfileReaderItaniumRemapper(std::unique_ptr<MemoryBuffer> B,
                                     std::unique_ptr<SymbolRemappingReader> SRR,
                                     SampleProfileReader &R)
      : Buffer(std::move(B)), Remappings(std::move(SRR)), Reader(R) {
    assert(Remappings && "Remappings cannot be nullptr");
  }

  /// Create a remapper from the given remapping file. The remapper will
  /// be used for the profile read in by Reader.
  static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
  create(const std::string Filename, SampleProfileReader &Reader,
         LLVMContext &C);

  /// Create a remapper from the given Buffer. The remapper will
  /// be used for the profile read in by Reader.
  static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
  create(std::unique_ptr<MemoryBuffer> &B, SampleProfileReader &Reader,
         LLVMContext &C);

  /// Apply remappings to the profile read by Reader.
  void applyRemapping(LLVMContext &Ctx);

  /// Return whether applyRemapping has been called.
  bool hasApplied() { return RemappingApplied; }

  /// Insert function name into remapper.
  void insert(StringRef FunctionName) { Remappings->insert(FunctionName); }

  /// Query whether there is an equivalent in the remapper which has been
  /// inserted.
  bool exist(StringRef FunctionName) {
    return Remappings->lookup(FunctionName);
  }

  /// Return the equivalent name in the profile for \p FunctionName if
  /// it exists.
  Optional<StringRef> lookUpNameInProfile(StringRef FunctionName);

private:
  // The buffer holding the content read from remapping file.
  std::unique_ptr<MemoryBuffer> Buffer;
  std::unique_ptr<SymbolRemappingReader> Remappings;
  // Map remapping key to the name in the profile. By looking up the
  // key in the remapper, a given new name can be mapped to the
  // canonical name using the NameMap.
  DenseMap<SymbolRemappingReader::Key, StringRef> NameMap;
  // The Reader the remapper is servicing.
  SampleProfileReader &Reader;
  // Indicate whether remapping has been applied to the profile read
  // by Reader -- by calling applyRemapping.
  bool RemappingApplied = false;
};

/// Sample-based profile reader.
///
/// Each profile contains sample counts for all the functions
/// executed. Inside each function, statements are annotated with the
/// collected samples on all the instructions associated with that
/// statement.
///
/// For this to produce meaningful data, the program needs to be
/// compiled with some debug information (at minimum, line numbers:
/// -gline-tables-only). Otherwise, it will be impossible to match IR
/// instructions to the line numbers collected by the profiler.
///
/// From the profile file, we are interested in collecting the
/// following information:
///
/// * A list of functions included in the profile (mangled names).
///
/// * For each function F:
///   1. The total number of samples collected in F.
///
///   2. The samples collected at each line in F. To provide some
///      protection against source code shuffling, line numbers should
///      be relative to the start of the function.
///
/// The reader supports two file formats: text and binary. The text format
/// is useful for debugging and testing, while the binary format is more
/// compact and I/O efficient. They can both be used interchangeably.
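///
/// A minimal usage sketch, using only the members declared below. The file
/// name "foo.prof", the function name "main", and the surrounding function
/// (assumed to return std::error_code) are placeholders for illustration:
///
/// \code
///   LLVMContext Ctx;
///   auto ReaderOrErr = SampleProfileReader::create("foo.prof", Ctx);
///   if (std::error_code EC = ReaderOrErr.getError())
///     return EC;
///   std::unique_ptr<SampleProfileReader> Reader =
///       std::move(ReaderOrErr.get());
///   // Load every function profile from the file.
///   if (std::error_code EC = Reader->read())
///     return EC;
///   // getSamplesFor returns nullptr when the profile has no such function.
///   if (FunctionSamples *FS = Reader->getSamplesFor("main"))
///     Reader->dumpFunctionProfile("main");
/// \endcode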
class SampleProfileReader {
public:
  SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                      SampleProfileFormat Format = SPF_None)
      : Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}

  virtual ~SampleProfileReader() = default;

  /// Read and validate the file header.
  virtual std::error_code readHeader() = 0;

  /// The interface to read sample profiles from the associated file.
  std::error_code read() {
    if (std::error_code EC = readImpl())
      return EC;
    if (Remapper)
      Remapper->applyRemapping(Ctx);
    FunctionSamples::UseMD5 = useMD5();
    return sampleprof_error::success;
  }

  /// The implementation to read sample profiles from the associated file.
  virtual std::error_code readImpl() = 0;

  /// Print the profile for \p FName on stream \p OS.
  void dumpFunctionProfile(StringRef FName, raw_ostream &OS = dbgs());

  /// Collect functions with definitions in Module M. For readers that
  /// support loading function profiles on demand, return true when the
  /// reader has been given a module. Always return false for readers
  /// that do not support loading function profiles on demand.
  virtual bool collectFuncsFromModule() { return false; }

  /// Print all the profiles on stream \p OS.
  void dump(raw_ostream &OS = dbgs());

  /// Return the samples collected for function \p F.
  FunctionSamples *getSamplesFor(const Function &F) {
    // The function name may have been updated by adding a suffix. Call
    // a helper to (optionally) strip off suffixes so that we can
    // match against the original function name in the profile.
    StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
    return getSamplesFor(CanonName);
  }

  /// Return the samples collected for function \p F, create empty
  /// FunctionSamples if it doesn't exist.
  FunctionSamples *getOrCreateSamplesFor(const Function &F) {
    std::string FGUID;
    StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
    CanonName = getRepInFormat(CanonName, useMD5(), FGUID);
    return &Profiles[CanonName];
  }

  /// Return the samples collected for the function named \p Fname.
  virtual FunctionSamples *getSamplesFor(StringRef Fname) {
    std::string FGUID;
    Fname = getRepInFormat(Fname, useMD5(), FGUID);
    auto It = Profiles.find(Fname);
    if (It != Profiles.end())
      return &It->second;

    if (Remapper) {
      if (auto NameInProfile = Remapper->lookUpNameInProfile(Fname)) {
        auto It = Profiles.find(*NameInProfile);
        if (It != Profiles.end())
          return &It->second;
      }
    }
    return nullptr;
  }

  /// Return all the profiles.
  StringMap<FunctionSamples> &getProfiles() { return Profiles; }

  /// Report a parse error message.
  void reportError(int64_t LineNumber, const Twine &Msg) const {
    Ctx.diagnose(DiagnosticInfoSampleProfile(Buffer->getBufferIdentifier(),
                                             LineNumber, Msg));
  }

  /// Create a sample profile reader appropriate to the file format.
  /// Also create an underlying remapper if RemapFilename is not empty.
  static ErrorOr<std::unique_ptr<SampleProfileReader>>
  create(const std::string Filename, LLVMContext &C,
         const std::string RemapFilename = "");

  /// Create a sample profile reader from the supplied memory buffer.
  /// Also create an underlying remapper if RemapFilename is not empty.
  static ErrorOr<std::unique_ptr<SampleProfileReader>>
  create(std::unique_ptr<MemoryBuffer> &B, LLVMContext &C,
         const std::string RemapFilename = "");

  /// Return the profile summary.
  ProfileSummary &getSummary() const { return *(Summary.get()); }

  MemoryBuffer *getBuffer() const { return Buffer.get(); }

  /// \brief Return the profile format.
  SampleProfileFormat getFormat() const { return Format; }

  /// Whether input profile is based on pseudo probes.
  bool profileIsProbeBased() const { return ProfileIsProbeBased; }

  /// Whether input profile is fully context-sensitive.
  bool profileIsCS() const { return ProfileIsCS; }

  virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
    return nullptr;
  }

  /// It includes all the names that have samples either in an outline
  /// instance or in an inline instance.
  virtual std::vector<StringRef> *getNameTable() { return nullptr; }
  virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; }

  /// Return whether names in the profile are all MD5 numbers.
  virtual bool useMD5() { return false; }

  /// Don't read profile without context if the flag is set. This is only
  /// meaningful for ExtBinary format.
  virtual void setSkipFlatProf(bool Skip) {}
  /// Return whether any name in the profile contains ".__uniq." suffix.
  virtual bool hasUniqSuffix() { return false; }

  SampleProfileReaderItaniumRemapper *getRemapper() { return Remapper.get(); }

  void setModule(const Module *Mod) { M = Mod; }

protected:
  /// Map every function to its associated profile.
  ///
  /// The profile of every function executed at runtime is collected
  /// in the structure FunctionSamples. This maps function names
  /// to their corresponding profiles.
  StringMap<FunctionSamples> Profiles;

  /// LLVM context used to emit diagnostics.
  LLVMContext &Ctx;

  /// Memory buffer holding the profile file.
  std::unique_ptr<MemoryBuffer> Buffer;

  /// Profile summary information.
  std::unique_ptr<ProfileSummary> Summary;

  /// Take ownership of the summary of this reader.
  static std::unique_ptr<ProfileSummary>
  takeSummary(SampleProfileReader &Reader) {
    return std::move(Reader.Summary);
  }

  /// Compute summary for this profile.
  void computeSummary();

  std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;

  /// \brief Whether samples are collected based on pseudo probes.
  bool ProfileIsProbeBased = false;

  /// Whether function profiles are context-sensitive.
  bool ProfileIsCS = false;

  /// Number of context-sensitive profiles.
  uint32_t CSProfileCount = 0;

  /// \brief The format of the sample profile.
  SampleProfileFormat Format = SPF_None;

  /// \brief The current module being compiled if SampleProfileReader
  /// is used by the compiler. If SampleProfileReader is used by other
  /// tools that are not compilers, M is usually nullptr.
  const Module *M = nullptr;
};

class SampleProfileReaderText : public SampleProfileReader {
public:
  SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
      : SampleProfileReader(std::move(B), C, SPF_Text) {}

  /// Read and validate the file header.
  std::error_code readHeader() override { return sampleprof_error::success; }

  /// Read sample profiles from the associated file.
  std::error_code readImpl() override;

  /// Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);
};

class SampleProfileReaderBinary : public SampleProfileReader {
public:
  SampleProfileReaderBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                            SampleProfileFormat Format = SPF_None)
      : SampleProfileReader(std::move(B), C, Format) {}

  /// Read and validate the file header.
  virtual std::error_code readHeader() override;

  /// Read sample profiles from the associated file.
  std::error_code readImpl() override;

  /// It includes all the names that have samples either in an outline
  /// instance or in an inline instance.
  virtual std::vector<StringRef> *getNameTable() override { return &NameTable; }

protected:
  /// Read a numeric value of type T from the profile.
  ///
  /// If an error occurs during decoding, a diagnostic message is emitted and
  /// the error code is returned.
  ///
  /// \returns the read value.
  template <typename T> ErrorOr<T> readNumber();

  /// Read a numeric value of type T from the profile. The value is stored
  /// without encoding.
  template <typename T> ErrorOr<T> readUnencodedNumber();

  /// Read a string from the profile.
  ///
  /// If an error occurs during decoding, a diagnostic message is emitted and
  /// the error code is returned.
  ///
  /// \returns the read value.
  ErrorOr<StringRef> readString();

  /// Read the string index and check whether it overflows the table.
  template <typename T> inline ErrorOr<uint32_t> readStringIndex(T &Table);

  /// Return true if we've reached the end of file.
  bool at_eof() const { return Data >= End; }

  /// Read the next function profile instance.
  std::error_code readFuncProfile(const uint8_t *Start);

  /// Read the contents of the given profile instance.
  std::error_code readProfile(FunctionSamples &FProfile);

  /// Read the contents of Magic number and Version number.
  std::error_code readMagicIdent();

  /// Read profile summary.
  std::error_code readSummary();

  /// Read the whole name table.
  virtual std::error_code readNameTable();

  /// Points to the current location in the buffer.
  const uint8_t *Data = nullptr;

  /// Points to the end of the buffer.
  const uint8_t *End = nullptr;

  /// Function name table.
  std::vector<StringRef> NameTable;

  /// Read a string indirectly via the name table.
  virtual ErrorOr<StringRef> readStringFromTable();

private:
  std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
  virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
};

class SampleProfileReaderRawBinary : public SampleProfileReaderBinary {
private:
  virtual std::error_code verifySPMagic(uint64_t Magic) override;

public:
  SampleProfileReaderRawBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                               SampleProfileFormat Format = SPF_Binary)
      : SampleProfileReaderBinary(std::move(B), C, Format) {}

  /// \brief Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);
};

/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase defines
/// the basic structure of the extensible binary format.
/// The format is organized in sections except the magic and version number
/// at the beginning. There is a section table before all the sections, and
/// each entry in the table describes the entry type, start, size and
/// attributes. The format in each section is defined by the section itself.
///
/// It is easy to add a new section while maintaining the backward
/// compatibility of the profile. Nothing extra needs to be done. If we want
/// to extend an existing section, for example to add cache-miss information
/// in addition to the sample count in the profile body, we can add a new
/// section with the extension and retire the existing section, and we could
/// choose to keep the parser of the old section if we want the reader to be
/// able to read both new and old format profiles.
///
/// SampleProfileReaderExtBinary/SampleProfileWriterExtBinary define the
/// commonly used sections of a profile in extensible binary format. It is
/// possible to define other types of profile inherited from
/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase.
class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
private:
  std::error_code decompressSection(const uint8_t *SecStart,
                                    const uint64_t SecSize,
                                    const uint8_t *&DecompressBuf,
                                    uint64_t &DecompressBufSize);

  BumpPtrAllocator Allocator;

protected:
  std::vector<SecHdrTableEntry> SecHdrTable;
  std::error_code readSecHdrTableEntry(uint32_t Idx);
  std::error_code readSecHdrTable();

  std::error_code readFuncMetadata(bool ProfileHasAttribute);
  std::error_code readFuncOffsetTable();
  std::error_code readFuncProfiles();
  std::error_code readMD5NameTable();
  std::error_code readNameTableSec(bool IsMD5);
  std::error_code readProfileSymbolList();

  virtual std::error_code readHeader() override;
  virtual std::error_code verifySPMagic(uint64_t Magic) override = 0;
  virtual std::error_code readOneSection(const uint8_t *Start, uint64_t Size,
                                         const SecHdrTableEntry &Entry);
  // Placeholder for subclasses to dispatch their own section readers.
  virtual std::error_code readCustomSection(const SecHdrTableEntry &Entry) = 0;
  virtual ErrorOr<StringRef> readStringFromTable() override;

  std::unique_ptr<ProfileSymbolList> ProfSymList;

  /// The table mapping each function name to the offset of its
  /// FunctionSamples from the start of the file.
  DenseMap<StringRef, uint64_t> FuncOffsetTable;
  /// The set containing the functions to use when compiling a module.
  DenseSet<StringRef> FuncsToUse;

  /// Use fixed length MD5 instead of ULEB128 encoding so NameTable doesn't
  /// need to be read in up front and can be directly accessed using index.
  bool FixedLengthMD5 = false;
  /// The starting address of NameTable containing fixed length MD5.
  const uint8_t *MD5NameMemStart = nullptr;

  /// If MD5 is used in the NameTable section, the section stores uint64_t
  /// data. Each uint64_t value has to be converted to a string, which is
  /// then used to initialize a StringRef in NameTable.
  /// Note that NameTable contains StringRefs, so it needs another buffer to
  /// own the string data. MD5StringBuf serves as the string buffer that is
  /// referenced by NameTable (vector of StringRef). We make sure
  /// the lifetime of MD5StringBuf is not shorter than that of NameTable.
  std::unique_ptr<std::vector<std::string>> MD5StringBuf;

  /// If SkipFlatProf is true, skip sections that have the SecFlagFlat flag
  /// set.
  bool SkipFlatProf = false;

public:
  SampleProfileReaderExtBinaryBase(std::unique_ptr<MemoryBuffer> B,
                                   LLVMContext &C, SampleProfileFormat Format)
      : SampleProfileReaderBinary(std::move(B), C, Format) {}

  /// Read sample profiles in extensible format from the associated file.
  std::error_code readImpl() override;

  /// Get the total size of all \p Type sections.
  uint64_t getSectionSize(SecType Type);
  /// Get the total size of header and all sections.
  uint64_t getFileSize();
  virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) override;

  /// Collect functions with definitions in Module M. Return true if
  /// the reader has been given a module.
  bool collectFuncsFromModule() override;

  /// Return whether names in the profile are all MD5 numbers.
  virtual bool useMD5() override { return MD5StringBuf.get(); }

  virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() override {
    return std::move(ProfSymList);
  }

  virtual void setSkipFlatProf(bool Skip) override { SkipFlatProf = Skip; }
};

class SampleProfileReaderExtBinary : public SampleProfileReaderExtBinaryBase {
private:
  virtual std::error_code verifySPMagic(uint64_t Magic) override;
  virtual std::error_code
  readCustomSection(const SecHdrTableEntry &Entry) override {
    return sampleprof_error::success;
  }

public:
  SampleProfileReaderExtBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                               SampleProfileFormat Format = SPF_Ext_Binary)
      : SampleProfileReaderExtBinaryBase(std::move(B), C, Format) {}

  /// \brief Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);
};

class SampleProfileReaderCompactBinary : public SampleProfileReaderBinary {
private:
  /// Function name table.
  std::vector<std::string> NameTable;
  /// The table mapping each function name to the offset of its
  /// FunctionSamples from the start of the file.
  DenseMap<StringRef, uint64_t> FuncOffsetTable;
  /// The set containing the functions to use when compiling a module.
  DenseSet<StringRef> FuncsToUse;
  virtual std::error_code verifySPMagic(uint64_t Magic) override;
  virtual std::error_code readNameTable() override;
  /// Read a string indirectly via the name table.
  virtual ErrorOr<StringRef> readStringFromTable() override;
  virtual std::error_code readHeader() override;
  std::error_code readFuncOffsetTable();

public:
  SampleProfileReaderCompactBinary(std::unique_ptr<MemoryBuffer> B,
                                   LLVMContext &C)
      : SampleProfileReaderBinary(std::move(B), C, SPF_Compact_Binary) {}

  /// \brief Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);

  /// Read samples only for functions to use.
  std::error_code readImpl() override;

  /// Collect functions with definitions in Module M. Return true if
  /// the reader has been given a module.
  bool collectFuncsFromModule() override;

  /// Return whether names in the profile are all MD5 numbers.
  virtual bool useMD5() override { return true; }
};

using InlineCallStack = SmallVector<FunctionSamples *, 10>;

// Supported histogram types in GCC.  Currently, we only need support for
// call target histograms.
enum HistType {
  HIST_TYPE_INTERVAL,
  HIST_TYPE_POW2,
  HIST_TYPE_SINGLE_VALUE,
  HIST_TYPE_CONST_DELTA,
  HIST_TYPE_INDIR_CALL,
  HIST_TYPE_AVERAGE,
  HIST_TYPE_IOR,
  HIST_TYPE_INDIR_CALL_TOPN
};

class SampleProfileReaderGCC : public SampleProfileReader {
public:
  SampleProfileReaderGCC(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
      : SampleProfileReader(std::move(B), C, SPF_GCC),
        GcovBuffer(Buffer.get()) {}

  /// Read and validate the file header.
  std::error_code readHeader() override;

  /// Read sample profiles from the associated file.
  std::error_code readImpl() override;

  /// Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);

protected:
  std::error_code readNameTable();
  std::error_code readOneFunctionProfile(const InlineCallStack &InlineStack,
                                         bool Update, uint32_t Offset);
  std::error_code readFunctionProfiles();
  std::error_code skipNextWord();
  template <typename T> ErrorOr<T> readNumber();
  ErrorOr<StringRef> readString();

  /// Read the section tag and check that it's the same as \p Expected.
  std::error_code readSectionTag(uint32_t Expected);

  /// GCOV buffer containing the profile.
  GCOVBuffer GcovBuffer;

  /// Function names in this profile.
  std::vector<std::string> Names;

  /// GCOV tags used to separate sections in the profile file.
  static const uint32_t GCOVTagAFDOFileNames = 0xaa000000;
  static const uint32_t GCOVTagAFDOFunction = 0xac000000;
};

} // end namespace sampleprof

} // end namespace llvm

#endif // LLVM_PROFILEDATA_SAMPLEPROFREADER_H