r - Avoiding `foreach` by using `data.table`: error "Combining := in j with by is not yet implemented" -


i had created function couple of weeks ago using library foreach. function finds previous month's market capitalisation companies in dataset. since dataset large, trying rewrite function using data.table (getting rid of foreach altogether), have been unsuccessful far.

here have: data.table object contains (among others columns) column integer specifying current month (tm), company number (permno), market capitalisation @ end of month (mktcap) , column integer previous month (pm). here summary of table year 1962:

> summary(results62)        tm             permno          mktcap                pm          min.   :196201   min.   :10006   min.   :      41   min.   :196112    1st qu.:196205   1st qu.:18382   1st qu.:   11462   1st qu.:196204    median :196208   median :24328   median :   37367   median :196207    mean   :196207   mean   :24349   mean   :  215224   mean   :196201    3rd qu.:196210   3rd qu.:29866   3rd qu.:  132181   3rd qu.:196209    max.   :196212   max.   :86239   max.   :31349066   max.   :196211                                                        na's   :25      

(here 196201 means 1962-jan, example)

to me started, created new object data company permno = 10006

> data1006 <- results62[permno == 10006,] > data10006           tm permno    mktcap     pm  [1,] 196201  10006 104171.00 196112  [2,] 196202  10006 104527.75 196201  [3,] 196203  10006  97036.00 196202  [4,] 196204  10006 102565.62 196203  [5,] 196205  10006  85263.25 196204  [6,] 196206  10006  84193.00 196205  [7,] 196207  10006  98077.50 196206  [8,] 196208  10006  97532.62 196207  [9,] 196209  10006  92265.50 196208 [10,] 196210  10006  98804.00 196209 [11,] 196211  10006 105887.38 196210 [12,] 196212  10006 112062.62 196211 

then created column called lagmktcap nas placeholders

> data1006[,lagmktcap := na_real_] 

to include previous month market capitalisation each observation use

> data1006[,lagmktcap := data1006$mktcap[match(data1006$pm,data1006$tm)]]           tm permno    mktcap      pm lagmktcap  [1,] 196201  10006 104171.00  196112        na  [2,] 196202  10006 104527.75  196201 104171.00  [3,] 196203  10006  97036.00  196202 104527.75  [4,] 196204  10006 102565.62  196203  97036.00  [5,] 196205  10006  85263.25  196204 102565.62   [6,] 196206  10006  84193.00  196205  85263.25   [7,] 196207  10006  98077.50  196206  84193.00  [8,] 196208  10006  97532.62  196207  98077.50  [9,] 196209  10006  92265.50  196208  97532.62 [10,] 196210  10006  98804.00  196209  92265.50 [11,] 196211  10006 105887.38  196210  98804.00 [12,] 196212  10006 112062.62  196211 105887.38 

which perfect. need each of companies, using whole dataset contains thousands of companies. best attempt was

> results62[,lagmktcap := results62$mktcap[match(results62$pm,results62$tm)],by=permno] 

but error

error in [.data.table(results62, , :=(lagmktcap, results62$mktcap[match(results62$pm, : combining := in j not yet implemented. please let maintainer('data.table') know if interested in this.

i not sure how except using foreach: create vector unique numbers of companies , iterate follows:

conumb <- unique(results62$permno)  lag.mkt.cap <- function(results62){ results62$mktcap[match(results62$pm,results62$tm)] }  lagmktcap <- foreach(i=1:length(conumb),.combine=c) %do% lag.mkt.cap(results62[permno == conumb[i],]) 

this big improvement on previous function (it takes 1/6 of time), avoid using foreach , make of data.table. ideas?

ps: might helpful use example dataset contains data 3 companies spanning 4 months:

dataexample <- data.table(tm = c(196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l, 196201l, 196202l, 196203l, 196204l),  permno = c(10006l, 10006l, 10006l, 10006l, 10014l, 10014l, 10014l, 10014l, 10030l, 10030l, 10030l, 10030l),  mktcap = c(104171, 104527.75, 97036, 102565.625, 13290.75, 14499, 13693.5, 12485.25, 81600, 83232, 81600, 82416),  pm = c(196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l, 196112l, 196201l, 196202l, 196203l)) 


Comments

Popular posts from this blog

c# - SVN Error : "svnadmin: E205000: Too many arguments" -

c# - Copy ObservableCollection to another ObservableCollection -

All overlapping substrings matching a java regex -