Courses/CS 2124/Lab Manual/The Random Access File

From A-State Computer Science Wiki

Introduction

A random access file may be thought of as an array in secondary storage; this storage is cheaper and larger but slower than primary storage. This lab illustrates in an object-oriented fashion how random access files are created and manipulated.

In this exercise, we will build a random access file representing the campsites in a recreational area that could be used as the basis for a booking system.

Defining a Record Structure

To facilitate easy random access, it is necessary that all of the “records” in the file are the same size. This will allow us to locate a specific record easily by performing some simple math.

We need a representation of a campsite with the following values:

  • site number (an integer will work for this)
  • site description (a string)
  • a flag indicating whether the site provides electric power or not
  • a per-night booking rate for the site (a floating point value will work for this)

We want to keep the record simple, so we will use a struct. Create a header file CampsiteRecord.h and add the following structure definition:

/**
 * A fixed-size record representing a campsite in a 
 * recreational area.
 */
struct CampsiteRecord {
    CampsiteRecord();
    CampsiteRecord(int number, const char description[], bool has_electric, double rate);
    static const int desc_size = 128;         /// max storage size of description
    int              number;                  /// site number
    char             description[desc_size];  /// a short description
    bool             has_electric;            /// is power available?
    double           rate;                    /// per-night rate
};

The non-default constructor should copy its parameters into the attributes of the object. Implement it now, being sure to copy the c-string description safely. Hint: Remember, you cannot use assignment to copy c-strings! You might want to refer to your notes, book, or online documentation for a refresher on how to use strncpy from the <cstring> library.

Implement the constructor in a new implementation file CampsiteRecord.cpp.
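
One way to copy a c-string safely is sketched below. The helper name copy_cstring is hypothetical (it is not part of the lab's required interface). It assumes the destination is a fixed-size char array, copies with strncpy, and then forces a terminating '\0', because strncpy does not add one when the source is longer than the space available.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the lab's interface): copy at most
// dest_size - 1 characters from src into dest and always terminate dest.
inline void copy_cstring(char* dest, const char* src, std::size_t dest_size) {
    if (dest_size == 0) {
        return;                               // nowhere to store even a '\0'
    }
    std::strncpy(dest, src, dest_size - 1);   // pads with '\0' if src is short
    dest[dest_size - 1] = '\0';               // guarantees termination if src is long
}
```

Inside the constructor, a call along the lines of copy_cstring(this->description, description, desc_size) would apply the same idea to the record's description attribute.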

Avoid information leakage!

There is a security concern that we must mention at this point. The random access file will be written in binary mode, meaning that all of the bytes contained within the record will be copied directly into the file. That would include any bytes that are not actually “in use” in the record — the ones at the end of the description c-string, and potentially “padding” bytes introduced between attributes in physical memory (this varies from system to system).

But, those “unused” bytes might actually contain information… Remember that main memory isn’t “cleared” when variables go out of scope; values often remain there until the memory is re-used. Writing from “unused” memory into a file leaks whatever information was in that unused memory. Imagine the following scenario, where the description is drastically shortened to fit it on the screen (it is not shown at its real 128 byte length, instead it is shown as 24 bytes).

The bytes marked n belong to number, d belong to description, e belong to has_electric, r belong to rate.

Memory before a record is created (i.e. “uninitialized”):

-------------------------------------
nnnndddddddddddddddddddddddderrrrrrrr
my password is "open sesame". Don't t
-------------------------------------

Memory after a record has been created (the ‘#’ represents the terminating ‘\0’ on the c-string):

-------------------------------------
nnnndddddddddddddddddddddddderrrrrrrr
4900tent site# "open sesame"0000-801@
-------------------------------------

Notice that the unused bytes after “tent site” still contain some of the sensitive information that was lurking in memory. If this information is written to the file, it could live indefinitely on secondary storage (and in backups).

To avoid this information leak, we will use a very low-level technique to overwrite all of the bytes in the physical object during initialization. The <cstring> library contains the function memset() that allows a specific char (byte) value to be written into every byte within a range. Add the following line of code as the first line of the constructor you just created:

memset(this, 0, sizeof(CampsiteRecord)); // overwrite all non-static memory in the record with zeros.

Now, in the situation above, the “after” memory footprint would look more like this (the ‘#’ represents the terminating ‘\0’ on the c-string):

-------------------------------------
nnnndddddddddddddddddddddddderrrrrrrr
4900tent site#000000000000000000-801@
-------------------------------------

Implement the default constructor now. The default constructor should also contain the call to memset() to zero the object’s physical storage. In theory, the attributes would need a default value now as well, but it turns out that “all bytes set to zero” provides a convenient initial state for all of the attributes: Zero values in all numeric types, false in the boolean value, and the “empty string” in the c-string. Warning: Do not use this method recklessly. If your object contained any attributes that were object types, their internal state would likely be corrupted by the memset()! In this case, all attributes were primitive or array types, so it was safe.
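
The zeroing technique can be sketched on a small stand-in type. DemoRecord and all_bytes_zero below are hypothetical names used only for illustration. The static_assert is an optional guard we are adding here (it is not required by the lab): it makes the build fail if someone later adds an attribute that would make byte-wise handling unsafe. The cast to void* is there only because some compilers warn about calling memset() on a class type.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a fixed-size record.
struct DemoRecord {
    static const int text_size = 16;
    int    id;
    char   text[text_size];
    bool   flag;
    double value;

    DemoRecord() {
        // Overwrite every byte of the object, including any padding.
        std::memset(static_cast<void*>(this), 0, sizeof(DemoRecord));
    }
};

// Optional guard: byte-wise tricks are only reasonable for simple types.
static_assert(std::is_trivially_copyable<DemoRecord>::value,
              "DemoRecord must stay trivially copyable to be zeroed with memset");

// Inspect the object's raw bytes to confirm that nothing was left behind.
inline bool all_bytes_zero(const DemoRecord& r) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&r);
    for (std::size_t i = 0; i < sizeof(DemoRecord); ++i) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}
```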

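Now, add the following functions to the implementation file, and add appropriate prototypes for each to the CampsiteRecord.h header file. These will help with debugging. The code relies on <iostream>, <iomanip> (for std::setprecision) and <cctype> (for tolower), so include those headers in the implementation file.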

/**
 * Read a site record from a delimited ASCII file, where
 * each field is separated by a '|' character.
 *
 * @param strm   input stream
 * @param site   site record to store resulting value into
 */
std::istream& from_ascii_file( std::istream& strm, CampsiteRecord& site ) {
    char delim;
    char buffer[6];  // used to catch "true" or "false"
    strm >> site.number >> delim;
    strm.getline( site.description, site.desc_size, '|' );
    strm.getline( buffer, 6, '|' );
    site.has_electric = tolower( buffer[0] ) == 't';  // true if the string started with 't'
    strm >> site.rate;
    return strm;
}

/**
 * stream insertion to print campsite records in a screen-friendly format.
 *
 * @param strm   output stream
 * @param site   site record to print
 */
std::ostream& operator<<( std::ostream& strm, const CampsiteRecord& site ) {
    strm << site.number      << ' ' << ( site.has_electric ? "[E]" : "[ ]" ) 
         << ' ' << site.description << ' '
         << std::fixed << std::setprecision( 2 ) << "\t($" << site.rate << ")"
         << std::resetiosflags( std::ios::fixed | std::ios::showpoint );
    return strm;
}

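Add the following code to a main program in the file main.cpp to test the campsite record so far. The snippet assumes that <iostream> and <fstream> are included and that cout can be used unqualified (for example, through using std::cout;).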

// Test default c-tor
CampsiteRecord site_rec1;   
cout << "Empty record: " << site_rec1 << "\n";

// Non-default c-tor
CampsiteRecord site_rec2{42, "Lakefront Cabin", true, 300.00};
cout << "Non-empty record: " << site_rec2 << "\n";

// Read a record from a file
std::ifstream fin{"sites.txt"};
from_ascii_file(fin, site_rec1);
fin.close();
cout << "First site from file: " << site_rec1 << "\n";

Execute the program to verify that it is working properly. The file sites.txt is available as a resource file attached to this assignment in CSCADE.

Ergonomic Wrapper Class for the CampsiteRecord

It would be nice to provide a little more encapsulation for the campsite information, and to provide a more “ergonomic” interface for programmer interaction. This will allow us to “hide” details like the underlying c-string implementation of description and provide a more modern API.

The header file Campsite.h and implementation Campsite.cpp are provided in the resources section of this assignment. The class definition is shown below.

class Campsite {
public:
    Campsite( ) = default;
    Campsite( int number, std::string description, bool has_electric, double rate );
    Campsite( const CampsiteRecord& r );

    CampsiteRecord get_record() const;
    int            get_number( ) const;
    std::string    get_description( ) const;
    bool           has_electric( ) const;
    double         get_rate( ) const;

    void set_number( int number );
    void set_description( std::string description );
    void set_has_electric( bool flag );
    void set_rate( double rate );

    std::ostream& write(std::ostream& strm) const;
    std::istream& from_ascii_file(std::istream& strm);
    
private:
    CampsiteRecord _rec;
};

std::ostream& operator<< (std::ostream& strm, const Campsite& site);

Copy these files into your project, then try to declare an instance of Campsite in main() and verify that everything compiles and links properly.

Campsite site1 {3, "Riverfront Condo", true, 450.00};
cout << site1 << '\n';


FOR IN-LAB CREDIT: Demonstrate your output and code so far to a lab assistant before moving on.


The Random Access File

The random access file of this project will be implemented with an object wrapper. We will present a programmer-friendly interface from the wrapper, so the user will be able to think in terms of Campsites, without worrying about CampsiteRecords. Internally, we will have to work with CampsiteRecord objects, since they will form the basic unit of storage in the binary file.

Create the following class definition in the file CampsiteDB.h.

class CampsiteDB {
public:
    CampsiteDB( std::string filename );

    int get_record_count( );
    int get_current_index( bool write=false );

    Campsite get_next_sequential( );
    Campsite get_at_index( int index );

    void write_next_sequential( const Campsite& site );
    void write_at_index( int index, const Campsite& site );

    void print_record( int index, std::ostream& strm = std::cout );
    void list_records( std::ostream& strm = std::cout );

    // This object is non-copyable
    CampsiteDB(const CampsiteDB&)            = delete;  
    CampsiteDB& operator=(const CampsiteDB&) = delete;  

private:
    // private methods:
    void _create_file( );
    bool _open_file( );
    // attributes
    std::string  _filename;
    std::fstream _file;
};

Note that the actual file stream attribute in the object, _file, is of type fstream, and not ifstream or ofstream. This means that each time the file is opened, a second parameter must be passed to fstream method open() to indicate whether the file will be used for input, output, or both.

As stated in the introduction, a random access file can be viewed as an array in secondary memory; it follows then that it is the constructor’s job to set up the storage for this array. In this lab, that means opening the file (creating it first if it does not exist); records are added to it afterward.

Here we introduce a new ios flag in addition to ios::in and ios::out for specifying the file mode. The flag ios::binary is used to indicate that we are reading and writing the contents of memory instead of text. In text mode, which is the default, it is necessary to treat certain characters such as the newline as special. This special treatment can cause problems when we want to output, for example, a byte containing the value 0xA (hexadecimal for decimal 10), which just happens to be the same as the ASCII code for a newline. In text mode, writing a 0xA may cause something else to be written, because different systems handle newlines in different ways. In binary mode, however, 0xA is always 0xA. In other words, ios::binary tells C++ to write out exactly what we tell it to.
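
As a small illustration of the flag, the sketch below writes a single byte with the value 0xA in binary mode and then reports the size of the resulting file. The function name and the idea of checking the file size are our own additions for demonstration. The point is simply that one byte goes in and one byte ends up in the file.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write one byte with value 0x0A in binary mode and
// return the size of the resulting file in bytes (or -1 on failure).
inline long write_raw_newline_byte(const std::string& path) {
    {
        std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
        if (!out) {
            return -1;
        }
        const char byte = 0x0A;
        out.write(&byte, 1);
    }   // out is flushed and closed when it goes out of scope

    // Open positioned at the end, so tellg() reports the file size.
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    const std::streamoff size = in.tellg();
    return static_cast<long>(size);
}
```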

Implement the constructor for the campsite storage database. The constructor will always open the file (if possible). If the file doesn’t exist, it must be created. We will detect this by trying to open it the way we want first, then detecting a failure, trying to create, then trying again to open. If all of that fails (we can’t create the file), we have no choice but to throw an exception. The algorithm follows; make use of the _open_file() and _create_file() helper methods when you implement.

try to open the file -- if we fail to open the file:
    try to create the file
    try to open the file -- if we still fail,
        throw a std::runtime_error exception saying "Unable to create the database."

Now we must immediately implement the two private helper methods _open_file() and _create_file(). They operate on the _file attribute as follows:

  • _open_file() should open the file whose name is _filename with input, output, and binary mode flags. Then, use the stream seekp() method to seek the “put” position to the end of the file (seek to 0 bytes from std::ios::end). This will allow immediate appending of new values at the end, if desired. The result of testing whether the file is indeed open after this operation should be returned (true if the file is open, false otherwise).
  • _create_file() should open the file whose name is _filename in output mode (this will create the file, if you have permission). The file stream should then be immediately closed.
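
The open-or-create logic described above can be sketched with free functions. The names try_open, create_empty and open_or_create are hypothetical; in the lab, the same steps belong in the constructor and the two private helpers. As a defensive choice of our own, the creation step adds ios::app so that an existing file would never be truncated by accident.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Open for input, output, and binary; seek "put" to the end for appending.
inline bool try_open(std::fstream& file, const std::string& name) {
    file.open(name, std::ios::in | std::ios::out | std::ios::binary);
    if (file.is_open()) {
        file.seekp(0, std::ios::end);
    }
    return file.is_open();
}

// Opening in output mode creates the file if we have permission to do so.
inline void create_empty(const std::string& name) {
    std::ofstream out(name, std::ios::out | std::ios::app | std::ios::binary);
}   // the stream is closed when out goes out of scope

inline void open_or_create(std::fstream& file, const std::string& name) {
    if (!try_open(file, name)) {
        create_empty(name);
        file.clear();                 // forget the failure from the first attempt
        if (!try_open(file, name)) {
            throw std::runtime_error("Unable to create the database.");
        }
    }
}
```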

Create method stubs for all of the other methods in CampsiteDB so that you can compile. Add the following to your main program and check that it compiles.

CampsiteDB db{"campsites.db"};

Run the program and verify that the file “campsites.db” is being created. It is currently empty (size zero); you may delete it and re-test as needed.

Sequential Operations

It is sometimes convenient (even in a random access file) to perform sequential access (front-to-back). For example, this will allow us to easily populate the database initially. It will also allow us to list all of the values in the database (to perform an export, for example).

Complete the write_next_sequential() method so that it performs the following steps:

Get the record from the `site` parameter.
write the record into the `_file` stream using the `write()` method

To write to a binary file, you must use the write() method of the fstream object. This method expects the first argument to be a pointer of type char* to the first byte of the object being written, and the second argument is the size (number of bytes) to transfer.

The example below writes a floating-point value into a file stream example_strm. Use this example, your book, notes, or online documentation to guide you; implement the write_next_sequential() method now.

double value = 8.32;
example_strm.write(reinterpret_cast<char*>(&value), sizeof(double));

The sequential read operation get_next_sequential() is symmetric to the sequential write operation:

Create a CampsiteRecord to store the value from the file
read the record from the `_file` stream using the `read()` method
Create a Campsite using the record you read to initialize it, and return the Campsite object.

Implement your get_next_sequential() method now.
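
The symmetry between write() and read() can be seen in a small round trip. DemoPoint and the three functions below are hypothetical and deliberately unrelated to the campsite types. An in-memory std::stringstream stands in for the file so that the sketch runs without touching the disk.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical fixed-size record used only for this illustration.
struct DemoPoint {
    int    x;
    int    y;
    double weight;
};

// Copy the raw bytes of the record into the stream.
inline void write_record(std::ostream& strm, const DemoPoint& p) {
    strm.write(reinterpret_cast<const char*>(&p), sizeof(DemoPoint));
}

// Copy sizeof(DemoPoint) bytes from the stream back into a record.
inline DemoPoint read_record(std::istream& strm) {
    DemoPoint p{};   // zero-initialized in case the read fails
    strm.read(reinterpret_cast<char*>(&p), sizeof(DemoPoint));
    return p;
}

// Write one record, rewind the "get" marker, and read it back.
inline DemoPoint round_trip(const DemoPoint& original) {
    std::stringstream storage(std::ios::in | std::ios::out | std::ios::binary);
    write_record(storage, original);
    storage.seekg(0, std::ios::beg);
    return read_record(storage);
}
```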

Computing the number of records

It will be helpful to know how many records the file contains. This can be computed by doing some simple byte-based math. The number of records in the file ($C_r$) is given by:

$C_r = S_f / S_r$

where $S_f$ is the size of the file (in bytes), $S_r$ is the size of a single record, and the division operation is integer division.

You already know how to get the size of the records (sizeof()). To get the size of the file in bytes, the seekg() and tellg() methods can help. Here is the algorithm:

seek to the end of the file (seekg() to 0 bytes from ios::end)
use tellg() to get the current position, which is the size of the file since you are positioned at the end.

Quick Reference: Syntax of seekg() and seekp():

Syntax of seekg:
    (1) istream& seekg (streampos pos);
    (2) istream& seekg (streamoff distance_offset, ios_base::seekdir offset_mode);
Version (1) sets the current position to `pos` bytes from the beginning of the file. 
Version (2) moves the current position to `distance_offset` bytes from a starting location determined by the `offset_mode` flag.
            `offset_mode` must be one of:  
                `std::ios::beg` - offset is calculated relative to the beginning of the file
                `std::ios::cur` - offset is calculated relative to the current location in the file
                `std::ios::end` - offset is calculated relative to the end of the file

Syntax of seekp:
    (1) ostream& seekp (streampos pos);
    (2) ostream& seekp (streamoff distance_offset, ios_base::seekdir offset_mode);
Version (1) sets the current position to `pos` bytes from the beginning of the file. 
Version (2) moves the current position to `distance_offset` bytes from a starting location determined by the `offset_mode` flag.
            `offset_mode` must be one of:  
                `std::ios::beg` - offset is calculated relative to the beginning of the file
                `std::ios::cur` - offset is calculated relative to the current location in the file
                `std::ios::end` - offset is calculated relative to the end of the file
    
`streampos` and `streamoff` are both integer-compatible types.
`ios_base::seekdir` is an enumerated type to represent the seeking direction of a stream seeking operation.

See also http://en.cppreference.com/w/cpp/io/basic_istream/seekg, http://en.cppreference.com/w/cpp/io/basic_ostream/seekp

Implement the get_record_count() method now.
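
The record-count calculation can be sketched as a free function. The name count_records is hypothetical, and the demo uses 50 bytes of in-memory data with an assumed record size of 16 bytes so that the integer division is visible.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Number of whole records in the stream: (size in bytes) / (record size).
inline long count_records(std::istream& strm, std::size_t record_size) {
    strm.clear();                                // forget any earlier eof/fail state
    strm.seekg(0, std::ios::end);                // 0 bytes from the end
    const std::streamoff bytes = strm.tellg();   // position at the end == size
    if (bytes <= 0 || record_size == 0) {
        return 0;
    }
    return static_cast<long>(bytes / static_cast<std::streamoff>(record_size));
}

// 50 bytes of data with 16-byte records: three whole records fit.
inline long demo_count() {
    std::istringstream data(std::string(50, 'x'), std::ios::in | std::ios::binary);
    return count_records(data, 16);
}
```

Note that this sketch leaves the "get" marker at the end of the stream; a caller that wants to keep reading would need to seek back afterward.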

It is also helpful to ask “where am I?” in terms of the “index” of records in the file. The get_current_index() method does this. If the flag is set to true, the index is given with respect to the “put” (writing) marker, otherwise it is with respect to the “get” (reading) marker.

To determine the position in terms of record index ($i$), calculate:

$i = P_f / S_r$

where $P_f$ is the current position in the file, $S_r$ is the size of a single record, and the division is integer division. If the write flag is true, use the position of the “put” marker (from tellp()); otherwise, use the position of the “get” marker (from tellg()).

Implement the get_current_index() method now.
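
A minimal sketch of the index calculation follows. The function name current_index and the 8-byte block size are assumptions made for the demo. A std::stringstream is used only so the example runs without a file.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <utility>

// Index of the record at the current "put" (write == true) or "get" marker.
inline long current_index(std::iostream& strm, std::size_t record_size,
                          bool write = false) {
    const std::streamoff pos = write ? std::streamoff(strm.tellp())
                                     : std::streamoff(strm.tellg());
    if (pos < 0 || record_size == 0) {
        return -1;   // the position could not be determined
    }
    return static_cast<long>(pos / static_cast<std::streamoff>(record_size));
}

// Write three 8-byte blocks, then report the (put, get) indexes.
inline std::pair<long, long> demo_indexes() {
    std::stringstream data(std::ios::in | std::ios::out | std::ios::binary);
    const char block[8] = {};
    for (int i = 0; i < 3; ++i) {
        data.write(block, sizeof(block));
    }
    return std::make_pair(current_index(data, sizeof(block), true),
                          current_index(data, sizeof(block), false));
}
```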

Sequential Listing of Records

In order to “see” the file created by the constructor, the method list_records() will be developed to sequentially list the contents of the file. Implement this method now, following the algorithm shown below.

clear the state flags for the file stream (i.e. in case a previous operation hit eof)
seek the "get" marker to the beginning of the file (0 bytes from ios::beg)
while the file's state is "good":
    get the next Campsite using sequential access
    if the file's state is still "good"
        write the Campsite record to the stream parameter (using stream insertion)
    write a newline character to separate records
clear the state flags for the file stream (we hit eof, so reset the flags)
seek the "get" marker back to the beginning of the file
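
The listing loop above can be sketched on a stand-in record type. DemoEntry, make_entry, list_entries and demo_listing are hypothetical names, and an in-memory stream replaces the file. The structure follows the algorithm as written: the record is printed only if the read succeeded, and the newline is written on every pass through the loop.

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical fixed-size record for the demo.
struct DemoEntry {
    int  id;
    char name[12];
};

inline DemoEntry make_entry(int id, const char* name) {
    DemoEntry e{};                                    // zero-initialize the members
    e.id = id;
    std::strncpy(e.name, name, sizeof(e.name) - 1);   // last byte stays '\0'
    return e;
}

inline void list_entries(std::istream& in, std::ostream& out) {
    in.clear();                          // a previous operation may have hit eof
    in.seekg(0, std::ios::beg);
    while (in.good()) {
        DemoEntry e{};
        in.read(reinterpret_cast<char*>(&e), sizeof(DemoEntry));
        if (in.good()) {
            out << e.id << ' ' << e.name;
        }
        out << '\n';                     // separator, written on every pass
    }
    in.clear();                          // we hit eof, so reset the flags
    in.seekg(0, std::ios::beg);
}

inline std::string demo_listing() {
    std::stringstream data(std::ios::in | std::ios::out | std::ios::binary);
    const DemoEntry a = make_entry(1, "alpha");
    const DemoEntry b = make_entry(2, "beta");
    data.write(reinterpret_cast<const char*>(&a), sizeof(DemoEntry));
    data.write(reinterpret_cast<const char*>(&b), sizeof(DemoEntry));
    std::ostringstream listing;
    list_entries(data, listing);
    return listing.str();
}
```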

Add the following code to your main program to test your work so far:

cout << "File currently contains " << db.get_record_count() << " records.\n\n";

fin.open("sites.txt");
while(fin.good()){
    Campsite site;
    if(site.from_ascii_file(fin)){
        cout << "Adding " << site << "\n";
        db.write_next_sequential(site);
    }
}
cout << "\n\n";
fin.close();    

cout << "Read index is "  << db.get_current_index() << "\n";
cout << "Write index is " << db.get_current_index(true) << "\n";
cout << "File now contains " << db.get_record_count() << " records.\n\n";

db.list_records();
cout << "\n";

You can verify that all 10 records are shown. Running the program again will append the same records at the end of the file; you should gain 10 more records each time you re-run the program. You can delete the data file to start over if you like.


FOR IN-LAB CREDIT: Demonstrate your output and code so far for a lab assistant before proceeding.


Random Record Access

Now let’s look at what it would take to retrieve a specific record from the file. We can already read a record, assuming the file’s “get” position is correct, so all we need to do to read from an arbitrary location is “seek” to the correct place, then hand off to the get_next_sequential() method!

The algorithm is:

clear the file stream's state flags in case of a previous error (i.e. eof, etc)
seek the "get" marker to the correct offset for the record you want to retrieve
return the result of the `get_next_sequential()` method call.

You can determine the correct offset by multiplying the “index” desired by the size of a single record:

$P_{offset} = i \cdot S_r$

Implement the get_at_index() method now.
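
The seek-then-read idea can be sketched as follows. DemoSlot, offset_of, read_at and demo_read_third are hypothetical names. The demo stores five small records in an in-memory stream and fetches the one at index 2 directly.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical fixed-size record for the demo.
struct DemoSlot {
    int    id;
    double rate;
};

// Byte offset of the record at a given index: index * record size.
inline std::streamoff offset_of(int index, std::size_t record_size) {
    assert(index >= 0);
    return static_cast<std::streamoff>(index) *
           static_cast<std::streamoff>(record_size);
}

inline DemoSlot read_at(std::istream& in, int index) {
    in.clear();                                              // drop stale eof/fail flags
    in.seekg(offset_of(index, sizeof(DemoSlot)), std::ios::beg);
    DemoSlot s{};
    in.read(reinterpret_cast<char*>(&s), sizeof(DemoSlot));  // plain sequential read
    return s;
}

// Store ids 100..104, then jump straight to index 2.
inline int demo_read_third() {
    std::stringstream data(std::ios::in | std::ios::out | std::ios::binary);
    for (int i = 0; i < 5; ++i) {
        const DemoSlot s{100 + i, 10.0 * i};
        data.write(reinterpret_cast<const char*>(&s), sizeof(DemoSlot));
    }
    return read_at(data, 2).id;
}
```

For example, with a record size of 152 bytes (a made-up figure), the record at index 3 would start at byte 456.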

Add the following lines to your main program to test your work:

// Now query a few directly:
Campsite result;
result = db.get_at_index(9); // should be "cabin, riverfront"
cout << result << "\n";
result = db.get_at_index(3); // should be "RV site, covered table"
cout << result << "\n";
result = db.get_at_index(6); // should be "tent site, large, riverfront"
cout << result << "\n\n";

The write_at_index() method is symmetric to the get_at_index() method. The algorithm looks like:

clear the file stream's state flags in case of a previous error (i.e. eof, etc)
seek the "put" marker to the correct offset for the record you want to overwrite
hand off to the `write_next_sequential()` method.

Implement the write_at_index() method now.
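
Overwriting in place can be sketched the same way. DemoCell, write_at and demo_overwrite are hypothetical names. The demo replaces the record at index 1 and then reads everything back to show that the neighboring records are untouched.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical fixed-size record for the demo.
struct DemoCell {
    int value;
};

inline void write_at(std::iostream& io, int index, const DemoCell& c) {
    io.clear();                                           // drop stale eof/fail flags
    const std::streamoff offset =
        static_cast<std::streamoff>(index) *
        static_cast<std::streamoff>(sizeof(DemoCell));
    io.seekp(offset, std::ios::beg);                      // move the "put" marker
    io.write(reinterpret_cast<const char*>(&c), sizeof(DemoCell));
}

// Store 0, 1, 2, 3; overwrite index 1 with 99; read all four back.
inline std::vector<int> demo_overwrite() {
    std::stringstream data(std::ios::in | std::ios::out | std::ios::binary);
    for (int i = 0; i < 4; ++i) {
        const DemoCell c{i};
        data.write(reinterpret_cast<const char*>(&c), sizeof(DemoCell));
    }
    write_at(data, 1, DemoCell{99});

    std::vector<int> values;
    data.seekg(0, std::ios::beg);
    DemoCell c{};
    while (data.read(reinterpret_cast<char*>(&c), sizeof(DemoCell))) {
        values.push_back(c.value);
    }
    return values;
}
```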

Add the following lines to your main program to test your work:

// Write different records in two places:
db.write_at_index(8, Campsite{18, "Pavillion", true, 200.00});
db.write_at_index(2, Campsite{12, "Treehouse", false, 250.00});

// See if it worked:
result = db.get_at_index(8); // should be "Pavillion"
cout << result << "\n";
result = db.get_at_index(2); // should be "Treehouse"
cout << result << "\n";


FOR IN-LAB CREDIT: Demonstrate the program for the lab instructor. Then explain two or more issues you think the CampsiteDB class still has that need improvement.



FOR IN-LAB CREDIT: Zip up these files: All files required to compile and run this project.
Name the file oop12iL.zip and upload to CSCADE.