database 基于 SAS 中的 2 个变量合并数据集
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2445953/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Merging datasets based on 2 variables in SAS
提问by John
I'm working with different databases. All of them contain information about 1000+ companies. A company is defined by its ticker code (the short version of the name (i.e. Ford as F) usually seen on stock quotation boards).
我正在使用不同的数据库。所有这些都包含有关 1000 多家公司的信息。公司由其股票代码(股票报价板上常见的名称缩写(即福特为 F))定义。
Aside from the ticker code to merge on I also have to merge on the time. I used month as a count variable throughout my time series. The final purpose is to have a regression in the kind of
除了要合并的股票代码之外,我还必须按时合并。我在整个时间序列中都使用月份作为计数变量。最终目的是在那种情况下进行回归
Y(jt) = c + X(jt) +X1(jt)
etc with j = company
(ticker) and t = time
(month).
Y(jt) = c + X(jt) +X1(jt)
等与j = company
(股票代码)和t = time
(月)。
So imagine I have 2 databases, one of which is the base database with variables such as Tickers, months, betas of a company (risk measure) etc. and a second database which has an extra variable (let's say market capitalisation).
所以想象一下我有 2 个数据库,其中一个是基本数据库,其中包含变量,例如股票代码、月份、公司的 beta(风险度量)等,另一个数据库具有额外的变量(假设市值)。
What I want to do then is to merge these 2 databases based on the ticker and the month.
然后我想做的是根据股票行情和月份合并这两个数据库。
Example: Base database:
示例:基础数据库:
Ticker ____ Month ____ Betas AA ____ 4 ____ 1.2 BB ____ 8 ____ 1.18
Second database:
第二个数据库:
Ticker ____ Month ____ MCAP AA ____ 4 ____ 8542 BB ____ 6 ____ 1245
Then after merge I would like to have something like this:
然后合并后我想要这样的东西:
Ticker ____ Month ____ Betas ____ MCAP AA ____ 4 ____ 1.2 ____ 8542
So all observations that do not match BOTH the date and ticker have to be dropped. I'm sure this is possible, just can't find the right type of code.
因此,必须删除所有与日期和代码都不匹配的观察结果。我确定这是可能的,只是找不到正确类型的代码。
PS: I'm guessing the underscores have something to do with font layout but both the bold as italic is supposed to be normal :)
PS:我猜下划线与字体布局有关,但粗体和斜体都应该是正常的:)
回答by lkbiostat
Agree with Jonathan... after sorting both datasets independently by ticker and time, the data step of merging is what I would use..... little modification
同意乔纳森...在按股票行情和时间独立排序两个数据集后,合并的数据步骤是我将使用的.....小修改
data want;
merge base(in = b) mcap(in = m);
by ticker time;
if m & b;
run;
Records that don't have common ticker and time in both datasets would be dropped automatically..
在两个数据集中没有通用代码和时间的记录将被自动删除。
回答by Jonathan Goldberg
Calling the two datasets base and mcap, and assuming that they have both been sorted by ticker and month, you can do it this way:
调用两个数据集 base 和 mcap,并假设它们都已按股票行情和月份排序,您可以这样做:
data want;
merge base(in = b)
mcap(in = m);
if m & b;
run;
The subsetting if will not accept any row that does not match in bath datasets.
子集 if 将不接受任何在浴数据集中不匹配的行。
回答by sunil
proc sort data=database1;
by ticker month;
run;
proc sort data=database2;
by ticker month;
run;
data gh;
merge database1(in=a) database2(in=b);
by ticker month;
if a and b;
run;
回答by John
Ok so it appears you can just do it very easily by:
好的,看来您可以通过以下方式轻松完成:
proc sort data=work;
by ticker month;
run;
proc sort data=wsize;
by ticker month;
run;
data test;
merge work(in=a) wsize(in=b);
by ticker month;
frommerg=a;
fromwtvol=b;
run;
data test;
set test;
if frommerg=0 then delete;
run;
data test;
set test;
if fromwtvol = 0 then delete;
run;
data test;
set test;
drop frommerg fromwtvol;
run;
That's the code I used, I tried this before posting because I didn't want to look like a leecher but it so happens that the 2 databases i tried had nothing in common (what are the odds with 70.000 observations :D), I retried it and it works (for now!)
这就是我使用的代码,我在发布之前尝试过这个,因为我不想看起来像个水蛭,但碰巧我尝试的 2 个数据库没有任何共同点(70.000 观察的几率是多少:D),我重试了它和它的工作原理(现在!)
Thanks anyway!
不管怎么说,还是要谢谢你!