Tuesday, September 23, 2014

Don't cross the streams...

I just re-learnt a lesson about the stream extraction operator (operator>>) in C++. I had written a simple program that was "skipping bytes" and giving me grief, and it took me longer than I'd like to admit to figure out why. I should've known better :)
Here's the hexdump of the file I was trying to read:

# hexdump -C data
00000000  01 34 07 02 00 2a 02 46  e9 37 66 00 00 e5 58 07  |.4...*.F.7f...X.|
00000010  02 1d 89 9c 13 07 02 1d  65 a2 80 0c 0e 06 07 02  |........e.......|
00000020  1d 65 a2 80 0c 0e 07 de  05 1e 10 00 00 00 07 de  |.e..............|

And here's the code that was trying to read it:

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

#define STRING(x) #x

// Prints "expression=>value," on its own line.
#define WRITE_TO_STREAM(os, data) do { \
    os << STRING(data) << "=>" << (data) << "," << std::endl; \
} while(0)

// Reads sizeof(data) bytes from is, most significant byte first,
// and logs the value that was read (unary + so uint8_t prints as a number).
#define READ_FROM_STREAM(is, data) do { \
    data = 0; \
    for(std::size_t s = 0; s < sizeof(data); ++s) { \
        uint8_t byte = 0; \
        is >> byte; \
        data = ((data << 8) | byte); \
    } \
    std::cout << "read " << std::dec << +(data) \
              << " [" << std::hex << std::showbase << +(data) << "]" << std::endl; \
} while(0)

struct DataStruct {
    uint8_t  first    = 0;
    uint32_t second   = 0;
    uint16_t third    = 0;
    uint32_t fourth   = 0;
    uint32_t fifth    = 0;
    uint16_t sixth    = 0;
    uint32_t seventh  = 0;
    uint16_t eighth   = 0;
    uint32_t ninth    = 0;
    uint8_t  tenth    = 0;
    uint8_t  eleventh = 0;
    uint8_t  twelfth  = 0;
};

std::ostream& operator << (std::ostream& out, const DataStruct &d) {
    // uint8_t members are widened so they print as numbers, not characters
    WRITE_TO_STREAM(out, uint16_t(d.first));
    WRITE_TO_STREAM(out, d.second);
    WRITE_TO_STREAM(out, uint16_t(d.third));
    WRITE_TO_STREAM(out, d.fourth);
    WRITE_TO_STREAM(out, d.fifth);
    WRITE_TO_STREAM(out, d.sixth);
    WRITE_TO_STREAM(out, d.seventh);
    WRITE_TO_STREAM(out, d.eighth);
    WRITE_TO_STREAM(out, d.ninth);
    WRITE_TO_STREAM(out, uint16_t(d.tenth));
    WRITE_TO_STREAM(out, uint16_t(d.eleventh));
    WRITE_TO_STREAM(out, uint16_t(d.twelfth));
    return out;
}

std::istream& operator >> (std::istream& in, DataStruct &d) {
    READ_FROM_STREAM(in, d.first);
    READ_FROM_STREAM(in, d.second);
    READ_FROM_STREAM(in, d.third);
    READ_FROM_STREAM(in, d.fourth);
    READ_FROM_STREAM(in, d.fifth);
    READ_FROM_STREAM(in, d.sixth);
    READ_FROM_STREAM(in, d.seventh);
    READ_FROM_STREAM(in, d.eighth);
    READ_FROM_STREAM(in, d.ninth);
    READ_FROM_STREAM(in, d.tenth);
    READ_FROM_STREAM(in, d.eleventh);
    READ_FROM_STREAM(in, d.twelfth);
    return in;
}

void readDataFromFile(const std::string &fileName) {
    std::ifstream infile(fileName, std::ios::binary | std::ios::in);
    if(!infile) {
        std::cerr << "can't open file " << fileName << std::endl;
        return;
    }
    DataStruct d;
    infile >> d;
    std::cout << d;
}

int main(int argc, char** argv) {
    if(argc < 2) {
        std::cerr << "Usage: " << argv[0] << " binary-file" << std::endl;
        return -1;
    }
    readDataFromFile(argv[1]);
    return 0;
}

The aim is simple enough: read the members of a struct from a binary file and print what was read to the console. For the most part the program worked fine, but on some files a byte would silently go missing and every field after it came out shifted (the hexdump above is from one such file). In particular, instead of the expected:

uint16_t(d.first)=>0x1,
d.second=>0x34070200,
uint16_t(d.third)=>0x2a02,
d.fourth=>0x46e93766,
d.fifth=>0xe558,
d.sixth=>0x702,
d.seventh=>0x1d899c13,
d.eighth=>0x702,
d.ninth=>0x1d65a280,
uint16_t(d.tenth)=>0xc,
uint16_t(d.eleventh)=>0xe,
uint16_t(d.twelfth)=>0x6,

I got:

uint16_t(d.first)=>0x1,
d.second=>0x34070200,
uint16_t(d.third)=>0x2a02,
d.fourth=>0x46e93766,
d.fifth=>0xe558,
d.sixth=>0x702,
d.seventh=>0x1d899c13,
d.eighth=>0x702,
d.ninth=>0x1d65a280,
uint16_t(d.tenth)=>0xe,
uint16_t(d.eleventh)=>0x6,
uint16_t(d.twelfth)=>0x7,

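Since I was opening the file with the std::ios::binary flag, this confused me for a while. I kept staring at it until it finally (re)dawned on me that operator>> does formatted input. std::ios::binary only turns off newline translation (on platforms that do any); it does not make operator>> read raw bytes. On mainstream platforms uint8_t is a typedef for unsigned char, so is >> byte is the character extractor, and like every formatted extraction it first skips leading whitespace, because std::skipws is set by default. The tenth field sits at offset 0x1b, and the byte there is 0x0c, an ASCII form feed, which counts as whitespace. It was silently discarded, and tenth, eleventh and twelfth picked up the next three bytes (0x0e, 0x06, 0x07) instead.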

At that point, the "fix" became trivial:

infile >> std::noskipws >> d;
