Tuesday, 28 June 2016

Problem with MATLAB textscan only reading in first cell

I ran into a problem with MATLAB's textscan() function: it was only reading in the data for the first cell, but it wasn't throwing any errors.

I was using the following command:

T = textscan(fid, format_spec, 'HeaderLines', 0);

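At first glance, evaluating T seemed to show that the file was being read correctly, with the number of columns matching my CSV. The sizes tell a different story, though: the first two columns hold a single value each and the rest are empty (0x1).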

>> T
T =
  Columns 1 through 4
    {1x1 cell}    [11]    [0x1 double]    [0x1 double]
  Columns 5 through 7
    [0x1 double]    [0x1 double]    [0x1 double]
  Columns 8 through 10
    [0x1 double]    [0x1 double]    [0x1 double]
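
One quick way to catch this kind of silent partial read is to compare the number of rows in each column. The sketch below uses a hand-built cell array with hypothetical values, shaped like the output above, rather than a real file; the variable names are illustrative only.

```matlab
% Hand-built stand-in for a partially read textscan result
% (hypothetical values, shaped like the output shown above).
T = {{'2016-02-05_19-09-50'}, 11, zeros(0, 1), zeros(0, 1)};

% Count the elements in each column of the result.
counts = cellfun(@numel, T);

% After a successful read the columns should normally have the same length.
is_consistent = all(counts == counts(1));

if ~is_consistent
    warning('Columns have different lengths: %s', mat2str(counts));
end
```

Here counts comes out as [1 1 0 0], so the check flags the read as suspicious even though textscan itself raised no error.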

The problem was that I was reading a CSV file, but by default textscan treats white space, not commas, as the field delimiter. With the delimiter left unspecified, textscan could not split the comma-separated fields, so it stopped quietly at the first field it could not convert and returned only what it had read up to that point. I changed the line to the following:

T = textscan(fid, format_spec, 'Delimiter', ',', 'HeaderLines', 0);

This read in the entire file, with every column now holding all 81231 rows:

>> T
T =
  Columns 1 through 2
    {81231x1 cell}    [81231x1 double]
  Columns 3 through 4
    [81231x1 double]    [81231x1 double]
  Columns 5 through 6
    [81231x1 double]    [81231x1 double]
  Columns 7 through 8
    [81231x1 double]    [81231x1 double]
  Columns 9 through 10
    [81231x1 double]    [81231x1 double]
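
For anyone who wants to try this without my data, here is a minimal, self-contained sketch. The sample rows, the temporary file and the three-column format_spec are all made up for illustration; they are not the file or format from this post.

```matlab
% Write a tiny, made-up CSV to a temporary file (hypothetical sample data).
csv_path = [tempname '.csv'];
fid = fopen(csv_path, 'w');
fprintf(fid, '2016-02-05_19-09-50,11,0.5\n');
fprintf(fid, '2016-02-05_19-10-38,12,0.7\n');
fprintf(fid, '2016-02-05_19-21-43,13,0.9\n');
fclose(fid);

% Assumed format: one string column followed by two numeric columns.
format_spec = '%s %f %f';

% Read the file back, telling textscan that fields are comma-separated.
fid = fopen(csv_path, 'r');
T = textscan(fid, format_spec, 'Delimiter', ',', 'HeaderLines', 0);
fclose(fid);
delete(csv_path);

% Every column should now contain one entry per row of the file.
rows_per_column = cellfun(@numel, T);
disp(rows_per_column);
```

Removing the 'Delimiter', ',' pair from the textscan call in this sketch is an easy way to compare against the default behaviour.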

Note that textscan returns a cell array, which must be indexed slightly differently from an ordinary matrix. Indexing with parentheses, T(1), does not give the data from column one; it returns a 1x1 cell that wraps that column:


>> T(1)
ans =
    {81231x1 cell}

Instead, I have to use the following command:

>> T{1}
ans = 
    '2016-02-05_19-09-50'
    '2016-02-05_19-10-38'
    '2016-02-05_19-21-43'
    '2016-02-05_19-22-31'
    '2016-02-05_19-23-19'
    '2016-02-05_19-24-08'
    '2016-02-05_19-26-11'
    '2016-02-05_19-26-59'
    '2016-02-05_19-27-47'
    '2016-02-05_19-28-36'

Note the curly braces around the number after T. It is also possible to only return a subset of the data by specifying the indices after the curly braces, as follows:

>> T{1}(1:3)
ans = 
    '2016-02-05_19-09-50'
    '2016-02-05_19-10-38'
    '2016-02-05_19-21-43'
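
The difference between the two kinds of indexing is easier to see on a small example. This is only a sketch using a hand-built cell array with made-up values, laid out the way textscan returns its columns.

```matlab
% Hand-built cell array shaped like a textscan result (hypothetical values):
% column 1 is a cell array of strings, column 2 is a numeric vector.
T = {{'a'; 'b'; 'c'}, [1; 2; 3]};

wrapped    = T(1);        % parentheses: a 1x1 cell that still wraps the column
column     = T{1};        % curly braces: the 3x1 cell array of strings itself
first_two  = T{1}(1:2);   % a 2x1 cell array holding the first two strings
one_string = T{1}{2};     % the second string as a plain char vector
numbers    = T{2}(2:3);   % numeric columns need only one level of braces

disp(class(wrapped));     % cell
disp(one_string);         % b
```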

Finally, the cell array of strings can be turned into an ordinary character matrix, a format we are more used to, with the cell2mat() function:

dates = cell2mat(T{1});

This converts the data into a matrix. In my case I have read in strings, so the result is a 2-D character array with one row per date. This works because every date string has the same length. Now to view the first three dates, I can use the following command:

>> dates(1:3,:)
ans =
2016-02-05_19-09-50
2016-02-05_19-10-38
2016-02-05_19-21-43
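
One caveat, as a hedged aside: cell2mat() can only stack the strings when they all have the same length. The sketch below uses made-up strings to show that case alongside char(), which pads shorter strings with trailing spaces and is one possible alternative when lengths differ.

```matlab
% Strings of equal length (19 characters each) stack cleanly with cell2mat.
same_len = {'2016-02-05_19-09-50'; '2016-02-05_19-10-38'};
dates = cell2mat(same_len);      % 2x19 char array

% Strings of different lengths (hypothetical values) cannot be stacked as-is.
mixed = {'abc'; 'de'};
cell2mat_failed = false;
try
    cell2mat(mixed);             % rows differ in length, so this errors
catch
    cell2mat_failed = true;
end

% char() pads the shorter string with a trailing space instead of failing.
padded = char(mixed);            % 2x3 char array

disp(size(dates));
disp(size(padded));
```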

And from here the data can be treated as normal.






