/* Main GAUSS code used for Duong and Swanson's paper, "Empirical Evidence on the
Importance of Aggregation, Asymmetry, and Jumps for Volatility Prediction",
Journal of Econometrics, 2015. */
/*----------------------------------------------------------------------------*/
/* Description: DMTestcompare returns the DM statistic comparing two models, the
best and the worst one as ranked by mean squared error.
-------
Input:
1) Data matrix d (rows record the time series of data points; columns record the
   variables, with RV in column 3; all variables are obtained from SAS).
2) r: the estimation sample size, i.e. the number of data points used to
   estimate the parameters of the model.
3) scheme: which scheme is used for out-of-sample prediction (be careful of
   overlapping notation here):
   scheme = 1: fixed scheme
   scheme = 2: recursive scheme
   scheme = 3: rolling scheme
4) h: forecasting horizon
5) hd: the length of the average for the daily horizon (ex: hd = 1)
6) hw: the length of the average for the weekly horizon (ex: hw = 5)
7) hm: the length of the average for the monthly horizon (ex: hm = 22)
8) flag = 0 is the linear case; flag = 1 is the nonlinear square-root case;
   flag = 2 is the nonlinear log(jump+1) case
9) jmodel:
   jmodel = 1: realized power jump only
   jmodel = 2: downward realized power jump
   jmodel = 3: upward realized power jump
   jmodel = 4: both downward and upward jump power, low first and high second
   jmodel = 5: continuous component only, 1~cd~cw~cm
   jmodel = 6: 1~rvd~rvw~rvm
   jmodel = 7: 1~rvd~rvw~rvm~jd
   jmodel = 8: 1~cjd~cjw~cjm~jd~jw~jm
10) hrzn: indicator for the standard covariance matrix or the robust NW one, i.e.
   hrzn = 1 for the standard CV, hrzn = 0 for NW.
Output:
1) the DM test statistic,
2) the vector of squared errors of the model with the largest mean squared error,
3) the vector of squared errors of the model with the smallest mean squared error. */
proc (3) = DMTestcompare(d,r,scheme,h,hd,hw,hm,flag,jmodel,hrzn);
    local errma,meanerr,pmin,pmax,r2_outsam,statp;
    {errma,meanerr,pmin,pmax,r2_outsam} = arrayoutsam(d,r,scheme,h,hd,hw,hm,flag,jmodel);
    statp = DMTest(errma[2:rows(errma),pmax],errma[2:rows(errma),pmin],hrzn);
    retp(statp,errma[2:rows(errma),pmax],errma[2:rows(errma),pmin]);
endp;

/* Overview: DMTestMa returns the Diebold and Mariano (1995) test statistics for
a matrix of errors (multiple vectors of squared errors).
-------
Input: errma: the matrix of squared errors corresponding to the selected models
(ex: corresponding to all the powers i).
------
Output: the matrix of Diebold and Mariano (1995) test statistics.
---------------*/
proc (1) = DMTestMa(errma,hrzn);
    local i,statma,j,vec1,vec2,dim,max;
    dim = cols(errma);
    i = 0;
    statma = eye(dim);
    do while i .le dim-1;
        i = i+1;
        j = 0;
        do while j .le dim-1;
            j = j+1;
            /* note: diagonal entries compare a model with itself (d = 0) */
            vec1 = errma[.,i];
            vec2 = errma[.,j];
            statma[i,j] = DMTest(vec1,vec2,hrzn);
        endo;
    endo;
    retp(statma);
endp;

/* Overview: DMTest returns the Diebold and Mariano (1995) test statistic for two
vectors of squared errors (two models).
-------
Input:
1) vecLssErs1: vector of squared errors from model 1
2) vecLssErs2: vector of squared errors from model 2
3) hrzn: indicator for the standard covariance matrix or the robust NW one, i.e.
   hrzn = 1 for the standard CV, hrzn = 0 for NW.
Output: the Diebold and Mariano (1995) test statistic. */
proc (1) = DMTest(vecLssErs1,vecLssErs2,hrzn);
    local DMstat,dbar,stDevD,valD,Pbig;
    valD = vecLssErs1 - vecLssErs2;
    Pbig = rows(valD);
    dbar = meanc(valD);
    if hrzn == 1;
        stDevD = stdc(valD);
    else;
        stDevD = sqrt(NwyWst(valD));
    endif;
    DMstat = (sqrt(Pbig) * dbar) / stDevD;
    retp(DMstat);
endp;

/* Overview: This procedure calculates the Newey-West covariance matrix.
-----
Input: vecD in vector form.
----
Output: the NW covariance (long-run variance) estimate.
----- */
proc (1) = NwyWst(vecD);
    local vecZrMn,varJbar,incI,varShat,vecZrMn1,vecZrMn2;
    vecZrMn = vecD - meanc(vecD);
    varJbar = int(rows(vecZrMn)^(1/6));  /* truncation lag */
    varShat = 0;
    for incI(1,varJbar-1,1);
        vecZrMn1 = trimr(vecZrMn,incI,0);
        vecZrMn2 = trimr(vecZrMn,0,incI);
        varShat = varShat + (((varJbar-incI)/varJbar) * 2 * (vecZrMn1'vecZrMn2)/(rows(vecZrMn1)));
    endfor;
    varShat = varShat + (vecZrMn'vecZrMn)/(rows(vecZrMn));
    retp(varShat);
endp;

/* Overview: arrayoutsam returns the out-of-sample regression errors and related
statistics, together with the out-of-sample R-squared and the associated
coefficients.
--------
Input:
1) Data matrix d (rows record the time series of data points; columns record the
   variables, with RV in column 3; all variables are obtained from SAS).
2) r: the estimation sample size, i.e. the number of data points used to
   estimate the parameters of the model.
3) scheme: which scheme is used for out-of-sample prediction (be careful of
   overlapping notation here):
   scheme = 1: fixed scheme
   scheme = 2: recursive scheme
   scheme = 3: rolling scheme
4) h: forecasting horizon
5) hd: the length of the average for the daily horizon (ex: hd = 1)
6) hw: the length of the average for the weekly horizon (ex: hw = 5)
7) hm: the length of the average for the monthly horizon (ex: hm = 22)
8) flag = 0 is the linear case; flag = 1 is the nonlinear square-root case;
   flag = 2 is the nonlinear log(jump+1) case
9) jmodel:
   jmodel = 1: realized power jump only
   jmodel = 2: downward realized power jump
   jmodel = 3: upward realized power jump
   jmodel = 4: both downward and upward jump power, low first and high second
   jmodel = 5: continuous component only, 1~cd~cw~cm
   jmodel = 6: 1~rvd~rvw~rvm
   jmodel = 7: 1~rvd~rvw~rvm~jd
   jmodel = 8: 1~cjd~cjw~cjm~jd~jw~jm
------------------
Output:
1) errma: squared-error matrix
2) meanvec: mean squared errors (indexed by model, then sorted)
3) errmin: index of the model with the smallest mean squared error
4) errmax: index of the model with the largest mean squared error
5) r2_outsam: out-of-sample R-squared and aligned regression coefficients (with
   the corresponding power i)
--------------------*/
proc (5) = arrayoutsam(d,r,scheme,h,hd,hw,hm,flag,jmodel);
    local rvf,rv,rvj,c,cj,pj,pjl,pjh,pjlh,l,expl,i,vec,errma,dd,errvec,meanvec,errmin,
          r2,r2a,yp,yp_fcast,r2_outsam,errmax,bp;
    {rvf,rv,rvj,c,cj,pj,pjl,pjh,pjlh,l} = arrayhar(d,h,hd,hw,hm,flag);
    errma = {};
    r2_outsam = {};
    i = 0;
    do while i .le l-1;
        i = i+1;
        /* There are many models: for each jmodel there are l candidate models
           (one per power p), and we look for the optimal one in this group by
           R-square. */
        if jmodel == 1;
            expl = arraytomat(pj[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 2;
            expl = arraytomat(pjl[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 3;
            expl = arraytomat(pjh[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 4;
            expl = arraytomat(pjlh[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 5;
            expl = arraytomat(c[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 6;
            expl = arraytomat(rv[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 7;
            expl = arraytomat(rvj[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        elseif jmodel == 8;
            expl = arraytomat(cj[i,.,.]);
            dd = rvf~expl;
            {errvec,yp,yp_fcast} = outsampred(dd,r,scheme);
            {r2,r2a} = rsquare(yp,ones(rows(yp_fcast),1)~yp_fcast);
            bp = yp/(ones(rows(yp_fcast),1)~yp_fcast);
        else;
            print "wrong choice";
        endif;
        r2_outsam = r2_outsam|(i~r2a~bp[2]);
        errma = errma~(i|errvec);
    endo;
    meanvec = meanc(errma[2:rows(errma),.]);
    meanvec = seqa(1,1,rows(meanvec))~meanvec;
    errmin = minindc(meanvec[.,2]);
    errmax = maxindc(meanvec[.,2]);
    meanvec = sortc(meanvec,2);
    r2_outsam = sortc(r2_outsam,2);
    retp(errma,meanvec,errmin,errmax,r2_outsam);
endp;

/* Overview: outsampred returns the vector of squared residuals from the
out-of-sample forecast (matching predicted values to realized values), i.e. the
whole sample is split into the estimation sample (R) and the prediction sample
(P), with T = R + P. Three schemes are available, depending on the indicator:
fixed, recursive, and rolling. The squared residuals are later used in
calculating the DM test or White's reality check.
-------------
Input:
1) The matrix of the dependent variable and the explanatory variables: the first
   column is the dependent variable and columns 2 through the last are the
   explanatory variables.
   The vector of ones is also included here.
2) r: the estimation sample size, i.e. the number of data points used to
   estimate the parameters of the model.
3) scheme: which scheme is used for out-of-sample prediction (be careful of
   overlapping notation here):
   scheme = 1: fixed scheme
   scheme = 2: recursive scheme
   scheme = 3: rolling scheme
--------------------------------------------
Output:
1) the vector of P squared residuals, i.e. the squared differences between the
   realized data points and the predicted values (size P)
2) yp: the realized values
3) yp_fcast: the forecast values
--------------------------------------------*/
proc (3) = outsampred(d,r,scheme);
    local T,yp,xp,errvec,i,yr,xr,beta,yp_fcast,err,yp_f;
    /* scheme=1; d=rndn(100,8); r=20; */
    i = 0;
    T = rows(d);
    yp = d[r+1:rows(d),1];
    xp = d[r+1:rows(d),2:cols(d)];
    errvec = {};
    yp_fcast = {};
    do while i .le T-r-1;
        if scheme == 1;      /* fixed: estimate once on observations 1..r */
            yr = d[1:r,1];
            xr = d[1:r,2:cols(d)];
        elseif scheme == 2;  /* recursive: expanding window 1..r+i */
            yr = d[1:r+i,1];
            xr = d[1:r+i,2:cols(d)];
        elseif scheme == 3;  /* rolling: fixed-length window i+1..r+i */
            yr = d[i+1:r+i,1];
            xr = d[i+1:r+i,2:cols(d)];
        else;
            print "wrong choice";
        endif;
        beta = yr/xr;  /* OLS coefficients */
        err = d[r+i+1,1] - d[r+i+1,2:cols(d)]*beta;
        errvec = errvec|err;
        yp_f = d[r+i+1,2:cols(d)]*beta;
        yp_fcast = yp_fcast|yp_f;
        i = i+1;
    endo;
    retp(errvec.*errvec,yp,yp_fcast);
endp;

/* Overview: arrayreg returns the R-squares, the coefficient estimates, and the
optimal power with the largest R-square.
------------
Note the ordering in this matrix: columns 1,4,7,... are jumps, 2,5,8,... are
downward jumps, and 3,6,9,... are upward jumps. Since each variable corresponds
to a discretized value of the power p (p = 2.5, 2.6, ..., 7), there are many
matrices, which therefore need to be stored in an array. This depends on the way
the data exported from SAS is organized
; in the data file used in this code there are 201 columns.
Input:
1) Data matrix d (rows record the time series of data points; columns record the
   variables, with RV in column 3; all variables are obtained from SAS).
2) h: forecasting horizon
3) hd: the length of the average for the daily horizon (ex: hd = 1)
4) hw: the length of the average for the weekly horizon (ex: hw = 5)
5) hm: the length of the average for the monthly horizon (ex: hm = 22)
6) flag = 0 is the linear case; flag = 1 is the nonlinear square-root case;
   flag = 2 is the nonlinear log(jump+1) case
7) jmodel:
   jmodel = 1: realized power jump only
   jmodel = 2: downward realized power jump
   jmodel = 3: upward realized power jump
   jmodel = 4: both downward and upward jump power, low first and high second
   jmodel = 5: continuous component only, 1~cd~cw~cm
   jmodel = 6: 1~rvd~rvw~rvm
   jmodel = 7: 1~rvd~rvw~rvm~jd
   jmodel = 8: 1~cjd~cjw~cjm~jd~jw~jm
Output:
1) mar2: the R-squares
2) mabeta: the coefficient estimates
3) ov: the optimal R-square and its power */
proc (3) = arrayreg(d,h,hd,hw,hm,flag,jmodel);
    local rvf,rv,rvj,c,cj,pj,pjl,pjh,pjlh,l,i,mar2,mabeta,vec,expl,beta,r2,r2a,
          max,t,or2,ob,ov;
    {rvf,rv,rvj,c,cj,pj,pjl,pjh,pjlh,l} = arrayhar(d,h,hd,hw,hm,flag);
    mar2 = {};
    mabeta = {};
    i = 0;
    do while i .le l-1;
        i = i+1;
        /* There are many models: for each jmodel there are l candidate models
           (one per power p), and we look for the optimal one in this group by
           R-square. */
        if jmodel == 1;
            expl = arraytomat(pj[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 2;
            expl = arraytomat(pjl[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 3;
            expl = arraytomat(pjh[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 4;
            expl = arraytomat(pjlh[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 5;
            expl = arraytomat(c[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 6;
            expl = arraytomat(rv[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 7;
            expl = arraytomat(rvj[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        elseif jmodel == 8;
            expl = arraytomat(cj[i,.,.]);
            beta = rvf/expl;
            {r2,r2a} = rsquare(rvf,expl);
        else;
            print "wrong choice";
        endif;
        mar2 = mar2|(i~r2a);
        mabeta = mabeta|(i~beta');
    endo;
    max = maxc(mar2[.,2]);
    t = mar2[.,2] .== max;
    or2 = selif(mar2,t);
    ob = selif(mabeta,t);
    ov = or2~ob[.,2:cols(ob)];
    mar2 = sortc(mar2,2);
    retp(mar2,mabeta,ov);
endp;

/* Overview: This procedure returns the R-square and the centered R-square from
a regression.
-------
Input: d1: data on the dependent variable; d2: data on the explanatory variables.
Output: r2: uncentered R-square; r2a: centered R-square (the commented-out line
below computes the usual degrees-of-freedom-adjusted R-square). */
proc (2) = rsquare(d1,d2);
    local y,x,n,b,yfit,r2,r2a,d;
    d = d1~d2;
    y = d[.,1];
    x = d[.,2:cols(d)];
    n = rows(d);
    b = y/x;
    yfit = x*b;
    r2 = yfit'yfit/(y'y);
    r2a = (yfit'yfit - n*meanc(y)^2)/(y'y - n*meanc(y)^2);
    /* r2acheck = r2 - (1-r2)*(cols(x)-1)/(n-cols(x)); */
    retp(r2,r2a);
endp;

/* Overview: This procedure gives all the data for the HAR-RV-C-... regressions,
including the future RV, the past explanatory variables, and the realized
measures of jump power variation, taken from SAS files.
Note: the approach is similar to proc prehar; this version is written for the
SAS data file that contains all the jump power variations. The arrangement of
the explanatory variables (the jump power variations) should match the specific
data one has.
Input:
1) Data matrix d (rows record the time series of data points; columns record the
   variables, with RV in column 3; all variables are obtained from SAS).
2) h: forecasting horizon
3) hd: the length of the average for the daily horizon (ex: hd = 1)
4) hw: the length of the average for the weekly horizon (ex: hw = 5)
5) hm: the length of the average for the monthly horizon (ex: hm = 22)
6) flag = 0 is the linear case; flag = 1 is the nonlinear square-root case;
   flag = 2 is the nonlinear log(jump+1) case; flag = 3 is the log(power jumps)
   case
Output: rvf: the dependent variable (future RV); rv,rvj,c,cj,pj,pjl,pjh,pjlh:
arrays of past explanatory variables at the daily, weekly, and monthly horizons;
l: the dimension of the arrays, returned as a check. */
proc (10) = arrayhar(d,h,hd,hw,hm,flag);
    local rvf,dd,dw,dm,rvd,rvw,rvm,cd,cw,cm,jd,jw,jm,pjd,pjw,pjm,cp,i,l,r,cl,clb,
          orders,orderc,ordersb,rv,rvj,c,cj,pj,pjh,pjl,pjlh;
    {rvf,dd,dw,dm} = prehar(d,h,hd,hw,hm,flag);
    rvd = dd[.,3]; rvw = dw[.,3]; rvm = dm[.,3];  /* RV yesterday, last week, last month */
    cd = dd[.,5]; cw = dw[.,5]; cm = dm[.,5];     /* continuous-part variation yesterday, last week, last month */
    jd = dd[.,4]; jw = dw[.,4]; jm = dm[.,4];     /* jump-part variation yesterday, last week, last month */
    /* NB: ll is not declared local here; it must hold the index of the first
       column of the power-jump measures in the SAS export. */
    pjd = dd[.,ll:cols(dd)];
    pjw = dw[.,ll:cols(dw)];
    pjm = dm[.,ll:cols(dm)];  /* new jump measures: power jumps, down, and up */
    cp = cd~cw~cm;  /* continuous-part variation: daily, weekly, monthly */
    l = cols(pjd)/3;
    r = rows(pjd);
    cl = 7;
    clb = 10;
    orderc = l|r|4;
    orders = l|r|cl;
    ordersb = l|r|clb;
    rv = arrayinit(l|r|4,0);
    rvj = arrayinit(l|r|5,0);
    pj = arrayinit(orders,0);
    pjh = arrayinit(orders,0);
    pjl = arrayinit(orders,0);
    pjlh = arrayinit(ordersb,0);
    c = arrayinit(orderc,0);
    cj = arrayinit(orders,0);
    i = 0;
    do while i .le l-1;
        rv[i+1,.,.] = ones(rows(rvd),1)~rvd~rvw~rvm;
        rvj[i+1,.,.] = ones(rows(rvd),1)~rvd~rvw~rvm~jd;
        c[i+1,.,.] = ones(rows(rvd),1)~cp;
        cj[i+1,.,.] = ones(rows(rvd),1)~cp~jd~jw~jm;
        pj[i+1,.,.] = ones(rows(rvd),1)~cp~pjd[.,1+3*i]~pjw[.,1+3*i]~pjm[.,1+3*i];
        pjl[i+1,.,.] = ones(rows(rvd),1)~cp~pjd[.,2+3*i]~pjw[.,2+3*i]~pjm[.,2+3*i];
        pjh[i+1,.,.] = ones(rows(rvd),1)~cp~pjd[.,3+3*i]~pjw[.,3+3*i]~pjm[.,3+3*i];
        pjlh[i+1,.,.] = ones(rows(rvd),1)~cp~pjd[.,2+3*i]~pjw[.,2+3*i]~pjm[.,2+3*i]~pjd[.,3+3*i]~pjw[.,3+3*i]~pjm[.,3+3*i];
        i = i+1;
    endo;
    retp(rvf,rv,rvj,c,cj,pj,pjl,pjh,pjlh,l);
endp;

/* Overview: This procedure gives all the data for the HAR-RV-CJ regression,
including the future RV and the past explanatory variables, taken from SAS
files.
Input:
1) Data matrix d (rows record the time series of data points; columns record the
   variables, with RV in column 3; all variables are obtained from SAS).
2) h: forecasting horizon
3) hd: the length of the average for the daily horizon (ex: hd = 1)
4) hw: the length of the average for the weekly horizon (ex: hw = 5)
5) hm: the length of the average for the monthly horizon (ex: hm = 22)
6) flag = 0 is the linear case; flag = 1 is the nonlinear square-root case;
   flag = 2 is the nonlinear log(jump+1) case; flag = 3 is the log(power jumps)
   case
Output: rv: the dependent variable; zd,zw,zm: the past explanatory variables at
the daily, weekly, and monthly horizons. */
proc (4) = prehar(d,h,hd,hw,hm,flag);
    local d1,zd,ad,zw,aw,zm,am,rv;
    rv = d[1:rows(d),3];
    d1 = abs(d[h+1:rows(d),.]);  /* in the application we set h = 1, 5, 22 */
    {zd,ad} = arv(d1,hd);  /* creates an array used to average over hd */
    {zw,aw} = arv(d1,hw);
    {zm,am} = arv(d1,hm);
    zd = arraytomat(zd);
    zw = arraytomat(zw);
    zm = arraytomat(zm);
    zd = zd[1:rows(zm),.];
    zw = zw[1:rows(zm),.];
    rv = rv[1:rows(zm),.];  /* so that rv has the same number of rows in the regression */
    if flag == 1;
        zd = zd^.5; zw = zw^.5; zm = zm^.5; rv = rv^.5;
    elseif flag == 2;
        zd = ln(zd[.,1:3])~ln(zd[.,4]+1)~ln(zd[.,5])~ln(zd[.,6:cols(zd)]+1);
        zw = ln(zw[.,1:3])~ln(zw[.,4]+1)~ln(zw[.,5])~ln(zw[.,6:cols(zw)]+1);
        zm = ln(zm[.,1:3])~ln(zm[.,4]+1)~ln(zm[.,5])~ln(zm[.,6:cols(zm)]+1);
        rv = ln(rv);
    else;
        zd = zd; zw = zw; zm = zm; rv = rv;
    endif;
    retp(rv,zd,zw,zm);
endp;

/* Overview: This procedure calculates the average of h consecutive values in
the matrix. To do this I create h blocks, block 1 to block h.
Block 1 is the original matrix; block 2 has an empty first value, with its
second value starting from the first value of block 1 (the final value is
dropped). The last block has h-1 empty leading values, so its h-th value is the
first value of the original matrix. Notice that the positions are simply matched
differently in order to calculate the average of h consecutive values.
Note: to use this to predict tomorrow's variable, just take the matrix d from
the second row to the last row and apply arv to it.
Input:
d: the data obtained from SAS
h: the number of consecutive values in the matrix over which the averages are
   taken.
Output:
zz: the array of average values (used for the regression)
y: all the arrays (blocks 1, 2, ...) */
proc (2) = arv(d,h);
    local i,orders,y,z,zz,r,cl;
    i = 0;
    r = rows(d);
    cl = cols(d);
    orders = h|r|cl;
    y = arrayinit(orders,0);  /* create an array with 3 dimensions */
    do while i .le h-1;
        i = i+1;
        /* This loop creates h matrices, stored in one array. Each matrix keeps
           the same dimension as d because GAUSS records the missing values;
           the rows containing missing values are eliminated at the end. */
        y[i,.,.] = lagn(d,i-1);
    endo;
    z = amean(y,3);  /* the mean of the h matrices in the array */
    zz = z[1,h:r,.];  /* keep only the rows where all h values are available; index 1 does not matter */
    retp(zz,y);
endp;
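/* The following cross-checks are not part of the original GAUSS code. As a
sanity check of DMTest and NwyWst above, the same computations can be sketched
in Python; this is an illustrative translation under the same conventions
(truncation lag J = int(T^(1/6)), Bartlett-style weights (J-j)/J), and the
function names newey_west_var and dm_test are ours. */

```python
import numpy as np

def newey_west_var(d):
    """Newey-West long-run variance of a loss differential, mirroring NwyWst:
    truncation lag J = int(T^(1/6)) and weights (J - j)/J."""
    z = np.asarray(d, dtype=float)
    z = z - z.mean()
    T = len(z)
    J = int(T ** (1.0 / 6.0))
    s = (z @ z) / T  # lag-0 term
    for j in range(1, J):
        w = (J - j) / J
        s += 2.0 * w * (z[j:] @ z[:-j]) / (T - j)  # autocovariance at lag j
    return s

def dm_test(err1, err2, standard=True):
    """Diebold-Mariano statistic for two vectors of squared errors, mirroring
    DMTest: sqrt(P) * mean(d) / s, where d = err1 - err2 and s is the sample
    standard deviation (standard=True, as stdc) or sqrt(newey_west_var(d))."""
    d = np.asarray(err1, dtype=float) - np.asarray(err2, dtype=float)
    P = len(d)
    s = d.std(ddof=1) if standard else np.sqrt(newey_west_var(d))
    return np.sqrt(P) * d.mean() / s
```

/* Note that the statistic is antisymmetric in the two models, which is why
DMTestMa above fills both triangles of the matrix. */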
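/* The three forecasting schemes in outsampred (where the GAUSS expression
yr/xr is OLS) can likewise be sketched in Python; an illustrative translation
only, with an invented function name. */

```python
import numpy as np

def out_of_sample_sq_errors(y, X, r, scheme):
    """One-step-ahead OLS forecasts mirroring outsampred.
    scheme: 1 = fixed (estimate once on observations 1..r),
            2 = recursive (expanding window 1..r+i),
            3 = rolling (fixed-length window i+1..r+i).
    Returns squared forecast errors, realized values, and forecasts."""
    T = len(y)
    sq_err, fcast = [], []
    for i in range(T - r):
        if scheme == 1:
            lo, hi = 0, r
        elif scheme == 2:
            lo, hi = 0, r + i
        elif scheme == 3:
            lo, hi = i, r + i
        else:
            raise ValueError("wrong choice")
        # OLS on the estimation window, then forecast observation r+i
        beta, *_ = np.linalg.lstsq(X[lo:hi], y[lo:hi], rcond=None)
        yhat = X[r + i] @ beta
        fcast.append(yhat)
        sq_err.append((y[r + i] - yhat) ** 2)
    return np.array(sq_err), y[r:], np.array(fcast)
```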
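/* The two R-square measures computed by rsquare above (uncentered
yfit'yfit / y'y, and the centered version obtained by subtracting n*ybar^2 from
numerator and denominator) can be sketched as follows; illustrative only, with
an invented function name. */

```python
import numpy as np

def r_square(y, X):
    """Uncentered and centered R^2 from an OLS fit, mirroring rsquare.
    The centered version is meaningful when X contains a constant."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yfit = X @ b
    n = len(y)
    r2 = (yfit @ yfit) / (y @ y)                      # uncentered
    adj = n * y.mean() ** 2
    r2a = (yfit @ yfit - adj) / (y @ y - adj)          # centered
    return r2, r2a
```

/* On an exact fit both measures equal 1, which is a convenient check. */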
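/* Finally, the block-shifting average in arv (stack the matrix with lags
0..h-1, average, and keep only the rows where all h values are available) is
just a backward-looking rolling mean; a Python sketch with an invented name: */

```python
import numpy as np

def rolling_mean(d, h):
    """Average of h consecutive rows, mirroring arv: row t of the output is the
    mean of rows t-h+1..t of d, so only rows h-1 onward survive."""
    d = np.asarray(d, dtype=float)
    # sum the h shifted copies (lags 0..h-1), aligned as in arv's blocks
    out = sum(d[h - 1 - k : len(d) - k] for k in range(h))
    return out / h
```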