      1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>File Based Streams</title><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><meta name="keywords" content="ISO C++, library" /><meta name="keywords" content="ISO C++, runtime, library" /><link rel="home" href="../index.html" title="The GNU C++ Library" /><link rel="up" href="io.html" title="Chapter 13. Input and Output" /><link rel="prev" href="stringstreams.html" title="Memory Based Streams" /><link rel="next" href="io_and_c.html" title="Interacting with C" /></head><body><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">File Based Streams</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a></td><th width="60%" align="center">Chapter 13.
      3   Input and Output
      4   
      5 </th><td width="20%" align="right"><a accesskey="n" href="io_and_c.html">Next</a></td></tr></table><hr /></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="std.io.filestreams"></a>File Based Streams</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.copying_a_file"></a>Copying a File</h3></div></div></div><p>
      6   </p><p>So you want to copy a file quickly and easily, and most important,
      7       completely portably.  And since this is C++, you have an open
      8       ifstream (call it IN) and an open ofstream (call it OUT):
      9    </p><pre class="programlisting">
     10    #include &lt;fstream&gt;
     11 
     12    std::ifstream  IN ("input_file");
     13    std::ofstream  OUT ("output_file"); </pre><p>Here's the easiest way to get it completely wrong:
     14    </p><pre class="programlisting">
     15    OUT &lt;&lt; IN;</pre><p>For those of you who don't already know why this doesn't work
     16       (probably from having done it before), I invite you to quickly
     17       create a simple text file called "input_file" containing
     18       the sentence
     19    </p><pre class="programlisting">
     20       The quick brown fox jumped over the lazy dog.</pre><p>surrounded by blank lines.  Code it up and try it.  The contents
     21       of "output_file" may surprise you.
     22    </p><p>Seriously, go do it.  Get surprised, then come back.  It's worth it.
     23    </p><p>The thing to remember is that the <code class="code">basic_[io]stream</code> classes
      handle formatting, nothing else.  In particular, formatted extraction
      splits its input on whitespace.  The actual reading, writing, and storing of data is
     26       handled by the <code class="code">basic_streambuf</code> family.  Fortunately, the
     27       <code class="code">operator&lt;&lt;</code> is overloaded to take an ostream and
     28       a pointer-to-streambuf, in order to help with just this kind of
     29       "dump the data verbatim" situation.
     30    </p><p>Why a <span class="emphasis"><em>pointer</em></span> to streambuf and not just a streambuf?  Well,
     31       the [io]streams hold pointers (or references, depending on the
     32       implementation) to their buffers, not the actual
      buffers.  This allows polymorphic behavior on the part of the buffers
     34       as well as the streams themselves.  The pointer is easily retrieved
     35       using the <code class="code">rdbuf()</code> member function.  Therefore, the easiest
     36       way to copy the file is:
     37    </p><pre class="programlisting">
   OUT &lt;&lt; IN.rdbuf();</pre><p>So what <span class="emphasis"><em>was</em></span> happening with OUT&lt;&lt;IN?  Nothing
      useful, since the Standard defines no &lt;&lt; that takes an istream.
     40       I have seen instances where it is implemented, but the character
     41       extraction process removes all the whitespace, leaving you with no
     42       blank lines and only "Thequickbrownfox...".  With
     43       libraries that do not define that operator, IN (or one of IN's
     44       member pointers) sometimes gets converted to a void*, and the output
     45       file then contains a perfect text representation of a hexadecimal
     46       address (quite a big surprise).  Others don't compile at all.
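   </p><p>Put together, the working version might look like the sketch below.
      It is only an illustration: <code class="code">copy_file</code> is a made-up helper name,
      the error handling is minimal, and it assumes a C++11 compiler (for the
      <code class="code">std::string</code> constructors) and that you want the bytes copied
      untranslated, hence <code class="code">ios::binary</code>.
   </p>

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copy the file named src to the file named dst.
// Returns true if the output stream is still in a good state afterwards.
bool copy_file(const std::string& src, const std::string& dst)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out)
        return false;

    // The streambuf overload of operator<< dumps the data verbatim,
    // whitespace and blank lines included.
    out << in.rdbuf();

    // Note: if nothing was inserted (an empty source file), operator<<
    // sets failbit on out, so this reports failure in that case.
    return static_cast<bool>(out);
}
```

<p>Calling <code class="code">copy_file("input_file", "output_file")</code> on the test
      file described earlier should leave the blank lines and the spaces
      between the words intact.  Whether an empty source file ought to count as
      a failure is a design choice; this sketch simply reports what the stream
      says.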
   </p><p>Also note that none of this is specific to o<span class="emphasis"><em>f</em></span>streams.
     48       The operators shown above are all defined in the parent
     49       basic_ostream class and are therefore available with all possible
     50       descendants.
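   </p><p>As a small illustration of that point, the same
      <code class="code">rdbuf()</code> trick works with an
      <code class="code">ostringstream</code> on the receiving end.  The helper name
      <code class="code">slurp</code> is invented for this sketch, and a C++11 compiler is
      assumed.
   </p>

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: return the whole contents of a file as a string.
// An unreadable or empty file yields an empty string.
std::string slurp(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream contents;

    // operator<<(streambuf*) lives in basic_ostream, so a string stream
    // accepts it just as a file stream does.
    contents << in.rdbuf();
    return contents.str();
}
```

<p>The only thing that changed compared to the file copy is the type of
      the destination stream; the insertion itself is identical.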
     51    </p></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a id="std.io.filestreams.binary"></a>Binary Input and Output</h3></div></div></div><p>
     52     </p><p>The first and most important thing to remember about binary I/O is
     53       that opening a file with <code class="code">ios::binary</code> is not, repeat
     54       <span class="emphasis"><em>not</em></span>, the only thing you have to do.  It is not a silver
     55       bullet, and will not allow you to use the <code class="code">&lt;&lt;/&gt;&gt;</code>
     56       operators of the normal fstreams to do binary I/O.
     57    </p><p>Sorry.  Them's the breaks.
     58    </p><p>This isn't going to try and be a complete tutorial on reading and
     59       writing binary files (because "binary"
     60       covers a lot of ground), but we will try and clear
     61       up a couple of misconceptions and common errors.
     62    </p><p>First, <code class="code">ios::binary</code> has exactly one defined effect, no more
     63       and no less.  Normal text mode has to be concerned with the newline
     64       characters, and the runtime system will translate between (for
     65       example) '\n' and the appropriate end-of-line sequence (LF on Unix,
      CRLF on DOS, CR on classic Mac OS, etc.).  (There are other things that
     67       normal mode does, but that's the most obvious.)  Opening a file in
     68       binary mode disables this conversion, so reading a CRLF sequence
     69       under Windows won't accidentally get mapped to a '\n' character, etc.
     70       Binary mode is not supposed to suddenly give you a bitstream, and
     71       if it is doing so in your program then you've discovered a bug in
      your vendor's compiler (or some other part of the C++ implementation,
     73       possibly the runtime system).
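   </p><p>To see that one effect in isolation, here is a rough sketch that writes
      and reads a few characters with the translation switched off.  The
      function names <code class="code">write_binary</code> and
      <code class="code">read_binary</code> are invented for the example, and C++11 is
      assumed.
   </p>

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: store text in a file with newline translation disabled.
void write_binary(const std::string& path, const std::string& text)
{
    std::ofstream out(path, std::ios::binary);
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
}

// Hypothetical helper: fetch the characters exactly as they sit on disk.
std::string read_binary(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

<p>With both ends in binary mode, a string such as
      <code class="code">"a\r\nb\n"</code> should come back as the same five characters on
      any platform.  Drop <code class="code">ios::binary</code> on a system that
      translates line endings and the count on disk, or the string you read
      back, may differ.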
     74    </p><p>Second, using <code class="code">&lt;&lt;</code> to write and <code class="code">&gt;&gt;</code> to
     75       read isn't going to work with the standard file stream classes, even
      if you use <code class="code">noskipws</code> during reading.  Why not?  Because
     77       ifstream and ofstream exist for the purpose of <span class="emphasis"><em>formatting</em></span>,
     78       not reading and writing.  Their job is to interpret the data into
     79       text characters, and that's exactly what you don't want to happen
     80       during binary I/O.
     81    </p><p>Third, using the <code class="code">get()</code> and <code class="code">put()/write()</code> member
      functions still isn't guaranteed to help you.  These are
      "unformatted" I/O functions, but they are still character-based.
      (This may or may not be what you want; see below.)
     85    </p><p>Notice how all the problems here are due to the inappropriate use
     86       of <span class="emphasis"><em>formatting</em></span> functions and classes to perform something
     87       which <span class="emphasis"><em>requires</em></span> that formatting not be done?  There are a
     88       seemingly infinite number of solutions, and a few are listed here:
     89    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><span class="quote"><span class="quote">Derive your own fstream-type classes and write your own
     90 	  &lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
     91 	  types you're using.</span></span>
     92 	</p><p>
     93 	  This is a Bad Thing, because while
     94 	  the compiler would probably be just fine with it, other humans
     95 	  are going to be confused.  The overloaded bitshift operators
     96 	  have a well-defined meaning (formatting), and this breaks it.
     97 	</p></li><li class="listitem"><p>
     98 	  <span class="quote"><span class="quote">Build the file structure in memory, then
     99 	  <code class="code">mmap()</code> the file and copy the
    100 	  structure.
    101 	</span></span>
    102 	</p><p>
    103 	  Well, this is easy to make work, and easy to break, and is
    104 	  pretty equivalent to using <code class="code">::read()</code> and
    105 	  <code class="code">::write()</code> directly, and makes no use of the
    106 	  iostream library at all...
    107 	  </p></li><li class="listitem"><p>
    108 	  <span class="quote"><span class="quote">Use streambufs, that's what they're there for.</span></span>
    109 	</p><p>
    110 	  While not trivial for the beginner, this is the best of all
    111 	  solutions.  The streambuf/filebuf layer is the layer that is
    112 	  responsible for actual I/O.  If you want to use the C++
    113 	  library for binary I/O, this is where you start.
    114 	</p></li></ul></div><p>How to go about using streambufs is a bit beyond the scope of this
    115       document (at least for now), but while streambufs go a long way,
    116       they still leave a couple of things up to you, the programmer.
    117       As an example, byte ordering is completely between you and the
    118       operating system, and you have to handle it yourself.
    119    </p><p>Deriving a streambuf or filebuf
    120       class from the standard ones, one that is specific to your data
    121       types (or an abstraction thereof) is probably a good idea, and
    122       lots of examples exist in journals and on Usenet.  Using the
    123       standard filebufs directly (either by declaring your own or by
    124       using the pointer returned from an fstream's <code class="code">rdbuf()</code>)
    125       is certainly feasible as well.
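   </p><p>For the "declare your own filebuf" route, a minimal sketch might look
      like this.  The helper names <code class="code">put_bytes</code> and
      <code class="code">get_bytes</code> are made up, error handling is kept to the bare
      minimum, and C++11 is assumed.
   </p>

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: write raw bytes through a filebuf, with no stream
// object (and therefore no formatting layer) involved.
bool put_bytes(const std::string& path, const std::string& bytes)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios::out | std::ios::binary | std::ios::trunc))
        return false;
    const std::streamsize n = static_cast<std::streamsize>(bytes.size());
    const bool wrote_all = (fb.sputn(bytes.data(), n) == n);
    // close() flushes the buffer and returns a null pointer on failure.
    return fb.close() != nullptr && wrote_all;
}

// Hypothetical helper: read at most count raw bytes from the start of a file.
std::string get_bytes(const std::string& path, std::size_t count)
{
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::binary))
        return std::string();
    std::string bytes(count, '\0');
    const std::streamsize got =
        fb.sgetn(&bytes[0], static_cast<std::streamsize>(count));
    bytes.resize(static_cast<std::size_t>(got));  // keep only what was read
    return bytes;
}
```

<p>The same <code class="code">sputn()</code> and <code class="code">sgetn()</code> calls
      work on the pointer returned by an fstream's <code class="code">rdbuf()</code>, if
      you would rather let the stream own the buffer.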
    126    </p><p>One area that causes problems is trying to do bit-by-bit operations
    127       with filebufs.  C++ is no different from C in this respect:  I/O
    128       must be done at the byte level.  If you're trying to read or write
    129       a few bits at a time, you're going about it the wrong way.  You
    130       must read/write an integral number of bytes and then process the
    131       bytes.  (For example, the streambuf functions take and return
    132       variables of type <code class="code">int_type</code>.)
    133    </p><p>Another area of problems is opening text files in binary mode.
    134       Generally, binary mode is intended for binary files, and opening
    135       text files in binary mode means that you now have to deal with all of
    136       those end-of-line and end-of-file problems that we mentioned before.
    137    </p><p>
      An instructive thread from comp.lang.c++.moderated delved into
    139       this topic starting more or less at
    140       <a class="link" href="https://groups.google.com/forum/#!topic/comp.std.c++/D4e0q9eVSoc" target="_top">this post</a>
    141       and continuing to the end of the thread. (The subject heading is "binary iostreams" on both comp.std.c++
      and comp.lang.c++.moderated.) Take special note of the replies by James Kanze and Dietmar Kühl.
    143    </p><p>Briefly, the problems of byte ordering and type sizes mean that
    144       the unformatted functions like <code class="code">ostream::put()</code> and
    145       <code class="code">istream::get()</code> cannot safely be used to communicate
    146       between arbitrary programs, or across a network, or from one
    147       invocation of a program to another invocation of the same program
    148       on a different platform, etc.
    149    </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="stringstreams.html">Prev</a></td><td width="20%" align="center"><a accesskey="u" href="io.html">Up</a></td><td width="40%" align="right"><a accesskey="n" href="io_and_c.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Memory Based Streams</td><td width="20%" align="center"><a accesskey="h" href="../index.html">Home</a></td><td width="40%" align="right" valign="top">Interacting with C</td></tr></table></div></body></html>