r - Avoiding `foreach` by using `data.table`: error "Combining := in j with by is not yet implemented" -
i had created function couple of weeks ago using library foreach
. function finds previous month's market capitalisation companies in dataset. since dataset large, trying rewrite function using data.table
(getting rid of foreach
altogether), have been unsuccessful far.
here have: data.table object contains (among others columns) column integer specifying current month (tm), company number (permno), market capitalisation @ end of month (mktcap) , column integer previous month (pm). here summary of table year 1962:
> summary(results62) tm permno mktcap pm min. :196201 min. :10006 min. : 41 min. :196112 1st qu.:196205 1st qu.:18382 1st qu.: 11462 1st qu.:196204 median :196208 median :24328 median : 37367 median :196207 mean :196207 mean :24349 mean : 215224 mean :196201 3rd qu.:196210 3rd qu.:29866 3rd qu.: 132181 3rd qu.:196209 max. :196212 max. :86239 max. :31349066 max. :196211 na's :25
(here 196201 means 1962-jan, example)
to me started, created new object data company permno = 10006
> data1006 <- results62[permno == 10006,] > data10006 tm permno mktcap pm [1,] 196201 10006 104171.00 196112 [2,] 196202 10006 104527.75 196201 [3,] 196203 10006 97036.00 196202 [4,] 196204 10006 102565.62 196203 [5,] 196205 10006 85263.25 196204 [6,] 196206 10006 84193.00 196205 [7,] 196207 10006 98077.50 196206 [8,] 196208 10006 97532.62 196207 [9,] 196209 10006 92265.50 196208 [10,] 196210 10006 98804.00 196209 [11,] 196211 10006 105887.38 196210 [12,] 196212 10006 112062.62 196211
then created column called lagmktcap nas placeholders
> data1006[,lagmktcap := na_real_]
to include previous month market capitalisation each observation use
> data1006[,lagmktcap := data1006$mktcap[match(data1006$pm,data1006$tm)]] tm permno mktcap pm lagmktcap [1,] 196201 10006 104171.00 196112 na [2,] 196202 10006 104527.75 196201 104171.00 [3,] 196203 10006 97036.00 196202 104527.75 [4,] 196204 10006 102565.62 196203 97036.00 [5,] 196205 10006 85263.25 196204 102565.62 [6,] 196206 10006 84193.00 196205 85263.25 [7,] 196207 10006 98077.50 196206 84193.00 [8,] 196208 10006 97532.62 196207 98077.50 [9,] 196209 10006 92265.50 196208 97532.62 [10,] 196210 10006 98804.00 196209 92265.50 [11,] 196211 10006 105887.38 196210 98804.00 [12,] 196212 10006 112062.62 196211 105887.38
which perfect. need each of companies, using whole dataset contains thousands of companies. best attempt was
> results62[,lagmktcap := results62$mktcap[match(results62$pm,results62$tm)],by=permno]
but error
error in
[.data.table
(results62, ,:=
(lagmktcap, results62$mktcap[match(results62$pm, : combining := in j not yet implemented. please let maintainer('data.table') know if interested in this.
i not sure how except using foreach
: create vector unique numbers of companies , iterate follows:
conumb <- unique(results62$permno) lag.mkt.cap <- function(results62){ results62$mktcap[match(results62$pm,results62$tm)] } lagmktcap <- foreach(i=1:length(conumb),.combine=c) %do% lag.mkt.cap(results62[permno == conumb[i],])
this big improvement on previous function (it takes 1/6 of time), avoid using foreach
, make of data.table
. ideas?
ps: might helpful use example dataset contains data 3 companies spanning 4 months:
dataexample <- data.table(tm = c(196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l), permno = c(10006l, 10006l, 10006l, 10006l, 10014l, 10014l, 10014l, 10014l, 10030l, 10030l, 10030l, 10030l), mktcap = c(104171, 104527.75, 97036, 102565.625, 13290.75, 14499, 13693.5, 12485.25, 81600, 83232, 81600, 82416), pm = c(196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l))
Comments
Post a Comment