Last updated on October 1, 2016
It is common for a scientific program to load a matrix from an ASCII file, i.e. a text file consisting of lines of floating-point numbers separated by whitespace. In this post, I show my code (C++ and Python) for loading a matrix from such a file.
C++
The following C++ function loads a matrix from an ASCII file into a vector< vector<double> > object, a kind of “C++-style 2D array”.
#include <istream>
#include <string>
#include <sstream>
#include <vector>
// load matrix from an ascii text file.
void load_matrix(std::istream* is,
                 std::vector< std::vector<double> >* matrix,
                 const std::string& delim = " \t")
{
    using namespace std;
    string line;
    string strnum;
    // clear first
    matrix->clear();
    // parse line by line
    while (getline(*is, line))
    {
        matrix->push_back(vector<double>());
        for (string::const_iterator i = line.begin(); i != line.end(); ++i)
        {
            // If i is not a delim, then append it to strnum
            if (delim.find(*i) == string::npos)
            {
                strnum += *i;
                if (i + 1 != line.end()) // If it's the last char, do not continue
                    continue;
            }
            // if strnum is still empty, it means the previous char is also a
            // delim (several delims appear together). Ignore this char.
            if (strnum.empty())
                continue;
            // If we reach here, we got a number. Convert it to double.
            double number;
            istringstream(strnum) >> number;
            matrix->back().push_back(number);
            strnum.clear();
        }
    }
}
// example
#include <fstream>
#include <iostream>
int main()
{
    using namespace std;
    // read the file
    std::ifstream is("input.txt");
    // load the matrix
    std::vector< std::vector<double> > matrix;
    load_matrix(&is, &matrix);
    // print out the matrix
    cout << "The matrix is:" << endl;
    for (std::vector< std::vector<double> >::const_iterator it = matrix.begin(); it != matrix.end(); ++it)
    {
        for (std::vector<double>::const_iterator itit = it->begin(); itit != it->end(); ++itit)
            cout << *itit << '\t';
        cout << endl;
    }
    return 0;
}
The code is also available on GitHub Gist.
Python
The Python code loads the matrix into a numpy.matrix object.
def load_matrix_from_file(f):
    """
    Load an ASCII-format matrix (float numbers separated by whitespace
    characters and newlines) into a numpy.matrix object.
    f is a file object or a file path.
    """
    import numpy
    # The Python 2 checks (types.StringType, types.FileType) do not exist
    # in Python 3, so test for a path string or a readable object instead.
    if isinstance(f, str):
        # f is a path: open it and recurse with the file object
        with open(f, 'r') as fo:
            return load_matrix_from_file(fo)
    elif hasattr(f, 'read'):
        # Turn every line break into ';', the row separator that
        # numpy.matrix accepts in its string form
        file_content = f.read().strip()
        file_content = file_content.replace('\r\n', ';')
        file_content = file_content.replace('\n', ';')
        file_content = file_content.replace('\r', ';')
        return numpy.matrix(file_content)
    raise TypeError('f must be a file object or a file name.')
The code is also available on GitHub Gist.
If you want a nested list instead of a numpy.matrix object, you can use the following lines to convert the object to a nested list:
matrix = load_matrix_from_file('file_name')
nested_list = matrix.tolist()
Just curious about why you use pointers (istream *) rather than references (istream &) for the input stream in the C++ code… Is there some specific reason, or is it just personal preference? Thanks!
Just my personal preference. I think passing a variable reference that is going to be changed inside the function body is counter-intuitive. Thus, whenever I see I need to pass in a pointer, I’ll be aware that this variable is going to be changed somehow.
Yeah that makes sense. Thanks for your explanation. I also notice you put const before general references, possibly just to save the time of a copy constructor, which is consistent with your “pointer indicating changes” habit. 🙂
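The two conventions discussed above can be put side by side. This is only a small sketch: `append_row_ptr` and `append_row_ref` are hypothetical names made up for illustration and do not appear in the post's code.

```cpp
#include <cassert>
#include <vector>

// Pointer style (the convention described above): the caller has to write
// '&', which signals at the call site that the argument may be modified.
void append_row_ptr(std::vector< std::vector<double> >* matrix,
                    const std::vector<double>& row)
{
    assert(matrix != 0);
    matrix->push_back(row);
}

// Reference style: same effect, but the call site looks like pass-by-value.
void append_row_ref(std::vector< std::vector<double> >& matrix,
                    const std::vector<double>& row)
{
    matrix.push_back(row);
}
```

At the call site the difference is `append_row_ptr(&m, row);` versus `append_row_ref(m, row);`. In both versions the read-only `row` is taken by const reference, so it is not copied on the way in.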
Hi,
I’ve tested the C++ code, and I noticed that the last number in the line is not added to the row vector. I am assuming that there is no delimiter before the EOL character. The reason for which the last number is not pushed back is that when EOL is reached the push_back instruction at line 40 is skipped. I have fixed this problem by conditioning the “continue” at line 30 in the following way:
[…]
if (delim.find(*i) == string::npos)
{
    strnum += *i;
    if (i + 1 != line.end())
        continue;
}
[…]
Correct me if I’m wrong. 🙂
You are right that the last number is not read. Probably it’s more elegant to append a delimiter to the end of line in the code before iteration rather than mess up the loop 🙂
OK, I adopted your method and updated the code, since your method can also handle empty delim string.
Great, I’m glad it helped. Thank you again for sharing the code!
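For comparison, the alternative mentioned in this thread (appending a delimiter to each line before iterating) might look roughly like the sketch below. It is not the version the post ended up with. `load_matrix_padded` is a hypothetical name, and the sketch assumes a non-empty `delim`, which is exactly the case the adopted fix does not need to assume.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of load_matrix: a delimiter is appended to every line
// so the last number is flushed by the same branch as all the others.
void load_matrix_padded(std::istream* is,
                        std::vector< std::vector<double> >* matrix,
                        const std::string& delim = " \t")
{
    assert(is != 0 && matrix != 0);
    assert(!delim.empty());  // assumption: at least one delimiter character

    std::string line;
    std::string strnum;
    matrix->clear();

    while (std::getline(*is, line))
    {
        line += delim[0];  // sentinel delimiter terminates the last number
        matrix->push_back(std::vector<double>());
        for (std::string::const_iterator i = line.begin(); i != line.end(); ++i)
        {
            if (delim.find(*i) == std::string::npos)
            {
                strnum += *i;  // still inside a number
                continue;
            }
            if (strnum.empty())
                continue;      // several delimiters in a row
            double number = 0.0;
            std::istringstream(strnum) >> number;
            matrix->back().push_back(number);
            strnum.clear();
        }
    }
}
```

The loop body no longer needs the end-of-line check, at the cost of modifying each line and requiring a non-empty delimiter string.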
Hello Hong!
I’m a beginner in C++ programming and I need to load a txt file (containing a huge matrix of floating-point numbers) into C++. I have copied and pasted your code into a new Visual Studio 2010 project. It compiles correctly, but when I run it, it gives me an error: fatal error LNK1561: entry point must be defined. I assume this error is because I need to define a main() function and call load_matrix from it. However, I don’t know how to call it, because the function declaration has 3 arguments:
void load_matrix(std::istream* is,
                 std::vector< std::vector<double> >* matrix,
                 const std::string& delim = " \t")
I don’t understand what these arguments are used for.
In which line of your code can I write the directory where my txt file is located?
Thank you
The file path is decided by the `std::istream* is` you pass in. std::istream is a class used to handle input streams. You can check the documentation for this class: http://www.cplusplus.com/reference/istream/istream/
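To make the answer above concrete, here is a minimal sketch of a wrapper that takes the file path directly. `load_matrix_from_path` is a hypothetical helper, not part of the post's code. To keep the example short and self-contained, it splits each line on whitespace with `operator>>` rather than calling `load_matrix`, so it assumes whitespace is the only delimiter.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original post): opens the file at
// `path` and fills `matrix`. Returns false if the file cannot be opened.
bool load_matrix_from_path(const std::string& path,
                           std::vector< std::vector<double> >* matrix)
{
    assert(matrix != 0);
    matrix->clear();

    std::ifstream is(path.c_str());  // the path decides which file is read
    if (!is.is_open())
        return false;

    std::string line;
    while (std::getline(is, line))
    {
        // Simplification: split on whitespace with operator>> instead of
        // the character loop used in load_matrix.
        std::istringstream fields(line);
        std::vector<double> row;
        double number;
        while (fields >> number)
            row.push_back(number);
        matrix->push_back(row);
    }
    return true;
}
```

A program still needs a main() function that makes the call, for example `load_matrix_from_path("C:\\data\\matrix.txt", &matrix)`, where the path is only a placeholder. In a C++ string literal on Windows, each backslash has to be written as `\\`, or forward slashes can be used instead.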