R's read.csv() omitting rows


In R, I'm trying to read in a basic CSV file of 42,900 rows (confirmed with Unix's wc -l). The relevant code is

vecs <- read.csv("feature_vectors.txt", header = FALSE, nrows = 50000)

where nrows is a slight overestimate, just to be safe. However,

> dim(vecs)
[1] 16853     5

indicates the resulting data frame has on the order of 17,000 rows. Is this a memory issue? Each row consists of a ~30-character hash code, a ~30-character string, and 3 integers; the total size of the file is 4 MB.

If it's relevant, I should note that a lot of rows have missing fields.

Thanks for your help!

This sort of problem is easy to diagnose using count.fields, which tells you how many columns the resulting data frame would have if you called read.csv.

(n_fields <- count.fields("feature_vectors.txt", sep = ","))
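Since n_fields will have ~43,000 entries, printing it directly is hard to scan; a quick tabulation (a small extra step beyond the check above) summarizes the field counts:

# Tabulate the field counts; ideally every row reports 5 fields
table(n_fields)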

If not all the values of n_fields are the same, you have a problem.

if (any(diff(n_fields) != 0)) {
  warning("There's a problem with the file")
}

Wherever the values of n_fields differ from what you expect, those are the rows in which problems occur.
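For example, given that your data frame should have 5 columns, something along these lines (assuming 5 is the expected field count) pinpoints the offending lines:

# Line numbers of rows whose field count differs from the expected 5
bad_rows <- which(n_fields != 5)

# Inspect a few of the raw lines to see what went wrong
readLines("feature_vectors.txt")[head(bad_rows)]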

As Justin mentioned, a common culprit is unmatched quotes. Open your CSV file and find out how strings are quoted there, then call read.csv, specifying the quote argument.
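For instance, if the string fields contain stray apostrophes or unpaired double quotes, one common fix, sketched here on the assumption that quoting can simply be disabled for this file, is:

# Treat quote characters as plain text so an unmatched quote
# can't swallow subsequent rows into a single field
vecs <- read.csv("feature_vectors.txt", header = FALSE,
                 nrows = 50000, quote = "")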


