R's read.csv() omitting rows
In R, I'm trying to read in a basic CSV file of 42,900 rows (confirmed with Unix's wc -l). The relevant code is
vecs <- read.csv("feature_vectors.txt", header=FALSE, nrows=50000)
where nrows is a slight overestimate, just to be safe. However,
> dim(vecs)
[1] 16853     5
indicating that the resulting data frame has only on the order of 17,000 rows. Is this a memory issue? Each row consists of a ~30-character hash code, a ~30-character string, and 3 integers, and the total size of the file is about 4 MB.
If it's relevant, I should note that a lot of rows have missing fields.
Thanks for your help!
This sort of problem is usually easy to resolve using count.fields, which tells you how many columns the resulting data frame would have if you called read.csv.
(n_fields <- count.fields("feature_vectors.txt", sep = ","))
If the values of n_fields are not all the same, you have a problem.
if (any(diff(n_fields) != 0)) { warning("There's a problem with the file") }
In your case, look at the values of n_fields that are different from what you expect: the problems occur in those rows.
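For example, here is a minimal sketch of locating the offending rows (assuming, per the dim() output above, that each row should have 5 fields):

bad_rows <- which(n_fields != 5)                    # rows whose field count isn't the expected 5
head(bad_rows)
readLines("feature_vectors.txt")[head(bad_rows)]    # inspect the raw text of those lines

Also note that count.fields records NA for lines that appear to be inside an unterminated quoted string, which is itself a strong hint that quoting is the culprit.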
As Justin mentioned, a common problem is unmatched quotes. Open your CSV file and find out how strings are quoted there. Then call read.csv, specifying the quote argument.
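For instance, if stray unmatched double quotes inside the string fields are what's merging your rows, one possible fix (assuming no field legitimately contains a comma inside quotes) is to disable quote processing entirely:

vecs <- read.csv("feature_vectors.txt", header = FALSE, nrows = 50000, quote = "")

With quoting disabled, each physical line becomes one row (minus any blank lines), so nrow(vecs) should match the wc -l count.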