TIP/Trick: How to count words in a text file using C++

Sometimes we handle data files whose integrity has to be checked against rules on byte size, number of lines or number of WORDS. Hence the need for a function in our string extensions library that counts the words in a string.

Working with files in mid-level languages like C & C++ can be a daunting task for people who are just getting started with programming. High-level languages like C#, Java or Ruby do a pretty good job of providing an intuitive abstraction layer that shields the programmer from the low-level details of file handling.

Loading the file

I’ve implemented the countWords function for both null-terminated char sequences and strings, along with a read-file function for each case. Here's the code:

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>

int readFileToBuffer( const char * filePath, char *& buffer )
{
    /*
     * VC++ will give you a warning message when using fopen.
     * The function has been flagged as unsafe and you have two choices.
     * First: use fopen_s http://msdn.microsoft.com/en-us/library/z5hh6ee9(v=vs.80).aspx
     * This will limit your implementation to Windows-based machines.
     * Second: add _CRT_SECURE_NO_WARNINGS to the compiler preprocessor definitions.
     * This will get rid of the warning message.
     * Microsoft's arguments for flagging this function are strong and appropriate; nevertheless,
     * I know what I'm doing and that's why I chose to stick to the fopen function.
     * If you want to know more about this, Google is your friend ;-)
     */
    FILE * file = fopen( filePath, "r" );

    if( file == NULL ) return 0;

    fseek( file, 0, SEEK_END );
    long fsize = ftell( file ); // http://www.cplusplus.com/reference/cstdio/ftell/
    fseek( file, 0, SEEK_SET );

    if( fsize < 0 ) { fclose( file ); return 0; }

    // The caller owns this buffer and must release it with free().
    buffer = ( char * ) malloc( fsize + 1 );
    if( buffer == NULL ) { fclose( file ); return 0; }

    // In text mode fewer than fsize bytes may come back (e.g. "\r\n" read as "\n" on Windows),
    // so terminate the buffer at the number of bytes actually read.
    size_t bytesRead = fread( buffer, 1, fsize, file );
    fclose( file );

    buffer[bytesRead] = 0;

    return static_cast< int >( bytesRead );
}

int readFileToBuffer( const char * file, std::string& buffer )
{
    std::ifstream t( file );
    if( !t ) return 0; // the file could not be opened

    t.seekg( 0, std::ios::end );
    // http://www.cplusplus.com/reference/istream/istream/tellg/
    int fsize = static_cast< int >( t.tellg() );
    if( fsize < 0 ) return 0; // tellg reports failure as -1
    buffer.reserve( fsize );
    t.seekg( 0, std::ios::beg );

    buffer.assign( ( std::istreambuf_iterator< char >( t ) ),
        std::istreambuf_iterator< char >() );

    return static_cast< int >( buffer.size() );
}
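
Both readers above return 0 for a file that cannot be opened as well as for an empty file. If that distinction matters, one possible variation is sketched below. It is only a sketch: the name readWholeFile and the bool return value are my own assumptions and not part of the library, and it copies the stream buffer instead of seeking to find the size.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original code).
// Returns true when the file was opened, false otherwise,
// so an empty file can be told apart from a missing one.
bool readWholeFile( const char * filePath, std::string& out )
{
    std::ifstream in( filePath, std::ios::binary );
    if( !in ) return false;

    std::ostringstream contents;
    contents << in.rdbuf(); // copy everything the stream buffer holds
    out = contents.str();
    return true;
}
```

Opening in binary mode keeps the bytes exactly as they are on disk, which is usually what you want when the byte size is one of the things being checked.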

Counting the Words

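Disclaimer: This is a very simple implementation and some edge cases have been ignored: only the space character counts as a separator, so tabs and newlines do not split words and consecutive spaces inflate the result. This is just an attempt to point people who struggle with this task in the right direction.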

int countWords( const char* str )
{
    if ( str == NULL || *str == '\0' ) return 0;
    int numWords = 1;
    // Start at the second character so a leading space does not count as a separator.
    for ( ++str; *str != '\0'; ++str ) if ( *str == ' ' ) numWords++;
    return numWords;
}

int countWords( const std::string& str )
{
    if ( str.empty() ) return 0;
    int numWords = 1;
    for ( std::string::size_type i = 1; i < str.length(); ++i ) if ( str[i] == ' ' ) numWords++;
    return numWords;
}
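
One edge case the functions above do not handle is text where words are separated by tabs, newlines or several spaces in a row. A minimal sketch of a whitespace-aware variant follows. The name countWordsByWhitespace is hypothetical, and treating "any run of non-whitespace characters" as a word is my assumption about what a word should be here.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant (not part of the original code):
// counts runs of non-whitespace characters.
int countWordsByWhitespace( const std::string& str )
{
    int numWords = 0;
    bool inWord = false;

    for( std::string::size_type i = 0; i < str.length(); ++i )
    {
        // Cast to unsigned char: passing a negative char to isspace is undefined.
        if( std::isspace( static_cast< unsigned char >( str[i] ) ) )
        {
            inWord = false;
        }
        else if( !inWord )
        {
            inWord = true; // first character of a new word
            ++numWords;
        }
    }

    return numWords;
}
```

With this version a string made only of whitespace yields 0, and leading or trailing whitespace does not change the count.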

The client code for this implementation is very simple:

std::string text;
readFileToBuffer( "loremipsum.txt", text );

int x = countWords( text );
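
If the file contents are not needed afterwards, the words can also be counted straight off the stream without filling a buffer first. This is a different technique from the one above, shown only as a sketch: countWordsInFile is a hypothetical name, and the -1 return value for a file that cannot be opened is my own convention.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (not part of the original code).
// Returns the number of whitespace-separated words,
// or -1 if the file cannot be opened.
long countWordsInFile( const char * filePath )
{
    std::ifstream in( filePath );
    if( !in ) return -1;

    // operator>> skips leading whitespace and extracts one word at a time,
    // so the distance between the two iterators is the word count.
    std::istream_iterator< std::string > first( in ), last;
    return static_cast< long >( std::distance( first, last ) );
}
```

The trade-off is that every word is extracted into a temporary std::string, so this favours brevity over raw speed.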

The full implementation of this source code can be found here.

Enjoy!