Thursday, 30 July 2015

Reading a CSV file in Matlab with header rows and mixed text, timestamps, integers and doubles

I wanted to read in a CSV file in Matlab, but was having problems using csvread(), which only supports files with only numeric values. It seems as though the easiest way to read in a mixed CSV file, is to read in the header first, and then read in the data using textscan.

I based my code on an answer from StackExchange. For reading the header, the following code worked well for me:
fid = fopen(file, 'r');
Header = fgetl(fid);
fclose(fid);

Header = regexp(Header, '([^,]*)', 'tokens');
Header = cat(2, Header{:});
The values are then in cells, and we need to use the function cell2mat() to convert the data in to something we can work with. This works for both strings and numerical values.

Then to read the main body of the text file, I used textscan(). This requires a string defining the format. As my format string is quite long, due to having a lot of columns, I've defined the format string separately. I have used the following code to define the format:
format_a='%u%u%s%u%u%f%f%u%u';
format_b=repmat('%f',1,109); %repeat the string '%f' 109 times
format_c='%u%u%u';

format=strcat(format_a, format_b, format_c);
This generates a long string specifying the format of the file that will be read in. The input string '%u' is an unsigned integer, and the input string is for '%s' strings. I also has a time and date column in the CSV, which I read with the datenum() function later, so for now I have treated it as a string.

The reading in of the file is done with the textscan() function in the following few lines:
fid = fopen(file, 'r');D = textscan(fid, format, 'Delimiter', ',', 'HeaderLines', 1);fclose(fid);

By setting the Delimiter, this allows other options than only the comma. Also the number of header columns can be set by the number following 'HeaderLines'.

This reads all of the data in to the variable which is saved in cells. So to read in the numbers from the cells, I have used the function cell2mat(). Here is an example of how I read in one column:
index=cell2mat(D(1));
Similarly, to read in the data from multiple columns, I've used the following code:
 data=cell2mat(D(10:118));
 This works for integers and doubles. To handle the date, I have used the following line:
timestamps=datenum(D{3}, 'dd/mm/yyyy HH:MM');
This interprets the information in the strings, and creates a "datetime" object, which can be used much more easily when plotting data.

No comments:

Post a Comment