r - Avoiding `foreach` by using `data.table`: error "Combining := in j with by is not yet implemented" -
i had created function couple of weeks ago using library foreach. function finds previous month's market capitalisation companies in dataset. since dataset large, trying rewrite function using data.table (getting rid of foreach altogether), have been unsuccessful far.
here have: data.table object contains (among others columns) column integer specifying current month (tm), company number (permno), market capitalisation @ end of month (mktcap) , column integer previous month (pm). here summary of table year 1962:
> summary(results62) tm permno mktcap pm min. :196201 min. :10006 min. : 41 min. :196112 1st qu.:196205 1st qu.:18382 1st qu.: 11462 1st qu.:196204 median :196208 median :24328 median : 37367 median :196207 mean :196207 mean :24349 mean : 215224 mean :196201 3rd qu.:196210 3rd qu.:29866 3rd qu.: 132181 3rd qu.:196209 max. :196212 max. :86239 max. :31349066 max. :196211 na's :25 (here 196201 means 1962-jan, example)
to me started, created new object data company permno = 10006
> data1006 <- results62[permno == 10006,] > data10006 tm permno mktcap pm [1,] 196201 10006 104171.00 196112 [2,] 196202 10006 104527.75 196201 [3,] 196203 10006 97036.00 196202 [4,] 196204 10006 102565.62 196203 [5,] 196205 10006 85263.25 196204 [6,] 196206 10006 84193.00 196205 [7,] 196207 10006 98077.50 196206 [8,] 196208 10006 97532.62 196207 [9,] 196209 10006 92265.50 196208 [10,] 196210 10006 98804.00 196209 [11,] 196211 10006 105887.38 196210 [12,] 196212 10006 112062.62 196211 then created column called lagmktcap nas placeholders
> data1006[,lagmktcap := na_real_] to include previous month market capitalisation each observation use
> data1006[,lagmktcap := data1006$mktcap[match(data1006$pm,data1006$tm)]] tm permno mktcap pm lagmktcap [1,] 196201 10006 104171.00 196112 na [2,] 196202 10006 104527.75 196201 104171.00 [3,] 196203 10006 97036.00 196202 104527.75 [4,] 196204 10006 102565.62 196203 97036.00 [5,] 196205 10006 85263.25 196204 102565.62 [6,] 196206 10006 84193.00 196205 85263.25 [7,] 196207 10006 98077.50 196206 84193.00 [8,] 196208 10006 97532.62 196207 98077.50 [9,] 196209 10006 92265.50 196208 97532.62 [10,] 196210 10006 98804.00 196209 92265.50 [11,] 196211 10006 105887.38 196210 98804.00 [12,] 196212 10006 112062.62 196211 105887.38 which perfect. need each of companies, using whole dataset contains thousands of companies. best attempt was
> results62[,lagmktcap := results62$mktcap[match(results62$pm,results62$tm)],by=permno] but error
error in
[.data.table(results62, ,:=(lagmktcap, results62$mktcap[match(results62$pm, : combining := in j not yet implemented. please let maintainer('data.table') know if interested in this.
i not sure how except using foreach: create vector unique numbers of companies , iterate follows:
conumb <- unique(results62$permno) lag.mkt.cap <- function(results62){ results62$mktcap[match(results62$pm,results62$tm)] } lagmktcap <- foreach(i=1:length(conumb),.combine=c) %do% lag.mkt.cap(results62[permno == conumb[i],]) this big improvement on previous function (it takes 1/6 of time), avoid using foreach , make of data.table. ideas?
ps: might helpful use example dataset contains data 3 companies spanning 4 months:
dataexample <- data.table(tm = c(196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l), permno = c(10006l, 10006l, 10006l, 10006l, 10014l, 10014l, 10014l, 10014l, 10030l, 10030l, 10030l, 10030l), mktcap = c(104171, 104527.75, 97036, 102565.625, 13290.75, 14499, 13693.5, 12485.25, 81600, 83232, 81600, 82416), pm = c(196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l))
Comments
Post a Comment