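About me

Sina Weibo: 零售创新. Research manager and data analyst. I hope to improve together with colleagues in market research and retail! Everything posted on this blog is free or trial material; if there is a copyright problem, please email wangli12a@163.com and it will be removed. spss excel vba blog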


SPSS Syntax Study Notes (Reposted)

2008-09-02 16:01:02 | Category: SPSS study

SPSS Syntax Study (Part 1)
* ============================================================.
* File:       Challenge Problem 2.
* Purpose:    SPSS Chinese Forum, Challenge Problem 2.
* Author:     GTds.
* Written:    2005.05.03.
* Notes:      .
* ============================================================.
*=============================================================.
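* Level 1 (*, 20 points): The data file contains blank records, that is, rows in which every variable is missing. Use SPSS Syntax to delete the blank records quickly and save the file as ex1.sav.
*=======================================================.
* Analysis: Deleting records can also be seen as keeping all the other records. A row is entirely missing when every one of its variables is missing; the complementary case is a row with at least one valid value.
*----------------------------------------------------------------------------------------------------------------.
* Method 1: the number of missing values is smaller than the number of variables (15 here).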
get file 'F:\SPSS擂题\theme park.sav'.

sel if nmiss(custnum to firstvis)<15.
exe.

save outfile "F:\SPSS擂题\ex1.sav".
*----------------------------------------------------------------------------------------------------------------.
* Method 2: at least one valid value.
get file 'F:\SPSS擂题\theme park.sav'.

sel if nvalid(custnum to firstvis) >0.
exe.

save outfile "F:\SPSS擂题\ex1.sav".
*----------------------------------------------------------------------------------------------------------------.
* Method 3: at least one value is not missing (MAX is missing only when every argument is missing).
get file 'F:\SPSS擂题\theme park.sav'.

sel if ~ miss(max(custnum to firstvis)).
exe.

save outfile "F:\SPSS擂题\ex1.sav".
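
The three methods above are meant to select the same rows. A minimal sketch of that equivalence, written in Python rather than SPSS syntax so it can be run without SPSS; the rows and column names are made up for illustration, and `None` stands in for a missing value:

```python
# Hypothetical rows; None stands for a missing value.
rows = [
    {"custnum": 101, "visits": 3, "firstvis": 1},
    {"custnum": None, "visits": None, "firstvis": None},  # blank record
    {"custnum": 102, "visits": None, "firstvis": 2},
]

def nmiss(row):
    """Count missing values in a row (rough analogue of NMISS)."""
    return sum(value is None for value in row.values())

def nvalid(row):
    """Count valid values in a row (rough analogue of NVALID)."""
    return sum(value is not None for value in row.values())

def max_valid(row):
    """Largest valid value, or None when every value is missing."""
    valid = [value for value in row.values() if value is not None]
    return max(valid) if valid else None

keep_1 = [row for row in rows if nmiss(row) < len(row)]       # Method 1
keep_2 = [row for row in rows if nvalid(row) > 0]             # Method 2
keep_3 = [row for row in rows if max_valid(row) is not None]  # Method 3
```

All three lists keep the first and third rows and drop the blank one.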

*=============================================================.
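* Level 2 (**, 30 points): The data file contains some duplicate records. Remove them completely with a sound method, keep a single copy of each record, and save the result as ex2.sav. How many duplicate records are there? Keep the whole procedure as SPSS Syntax.
*=======================================================.
* Method 1: check for duplicates across a set of (or all) variables.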

get file 'F:\SPSS擂题\ex1.sav'.

sort case by all.
match files
/file *
/by all
/first=Primary.
val lab primary 0 "Duplicate record" 1 "Valid record".
freq Primary. /* The frequency table shows 15 duplicate records to delete.

sel if primary=1.
exe.

save outfile "F:\SPSS擂题\ex2.sav"/drop primary .

*----------------------------------------------------------------------------------------------------------------.
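* Method 2: check for duplicates on a single key field.
* In this example custnum is a key variable (each record should have a unique custnum), so this method gives the same result as Method 1.
* It is easier to understand and to use than Method 1. For more complex data structures, check first whether it applies.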

get file 'F:\SPSS擂题\ex1.sav'.

sort case by all.
compute Primary=lag(custnum)~=custnum |miss(lag(custnum)).
val lab primary 0 "Duplicate record" 1 "Valid record".
freq Primary. /* The frequency table shows 15 duplicate records to delete.

sel if primary=1.
exe.

save outfile "F:\SPSS擂题\ex2.sav"/drop primary .
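
Both duplicate checks above rely on the same idea: sort the records, then flag a record as primary when it differs from the one before it. A rough Python sketch of that idea (not SPSS syntax; the records are invented for illustration):

```python
# Hypothetical records as (custnum, ticket) tuples.
records = [(3, "c"), (1, "a"), (2, "b"), (1, "a"), (3, "c"), (1, "a")]

def flag_primary(sorted_records):
    """Return 1 for the first record of each run of equal records, else 0."""
    flags = []
    previous = None
    for position, record in enumerate(sorted_records):
        is_first = position == 0 or record != previous
        flags.append(1 if is_first else 0)
        previous = record
    return flags

ordered = sorted(records)          # plays the role of SORT CASES BY ALL
flags = flag_primary(ordered)      # plays the role of /FIRST or the LAG test
duplicates = flags.count(0)        # the count the frequency table reports
unique = [rec for rec, flag in zip(ordered, flags) if flag == 1]
```

Here `duplicates` is 3 and `unique` holds one copy of each record.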

*=============================================================.
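* Level 3 (****, 40 points): From ex2.sav, use SPSS Syntax to draw 50 records at random from the cases with Ticketno≤475685041 and another 50 at random from the cases with Ticketno>475685041, and combine them into a new data set, ex3.sav. Can the Ticketno range be specified freely and then sampled at random? For example, draw 50 records at random from 21778332≤Ticketno≤220015616.
*=======================================================.
* Method 1: handle both conditions in one step, with no need to merge data files afterwards.
* The task asks for 50 cases from each of two conditions, combined into one file. So first count the cases under each condition, then use the SAMPLE command directly.
* The values after FROM must be typed in by hand from the frequency table, so this is hard to reuse for an arbitrary range and suits one-off jobs.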

get file 'F:\SPSS擂题\ex2.sav'.

compute dummy=custnum>475685041.
val lab dummy 0"custnum≤475685041" 1"custnum>475685041".
freq dummy. /* The frequency table shows 422 and 700 records in the two groups.
del var dummy.

do if custnum>475685041.
   sample 50 from 422.
else.
   sample 50 from 700.
end if.
exe.

save outfile "F:\SPSS擂题\ex3.sav" .

*----------------------------------------------------------------------------------------------------------------.
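* Method 2: a macro for random sampling under any condition.
* According to the SPSS algorithms manual, the SAMPLE command is computed in a way equivalent to the procedure below.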

define sampling (expr=!ENCLOSE("(",")")
                /size=!TOKENS(1))
sel if !expr.
compute idno=$casenum .
sort case by idno(d).

do if $casenum = 1.
  compute #s1=!size.
  compute #s2=idno.
end if.
do if #s2 > 0.
  compute dummy  = uniform(1)* #s2 < #s1.
  compute #s1 = #s1 - dummy.
  compute #s2 = #s2 - 1.
else.
  compute dummy = 0.
end if.
sel if dummy=1.
exe.
!enddefine.

*-----------------------------------------.
* Example: draw 50 records at random from 21778332≤Ticketno≤220015616.
get file 'F:\SPSS擂题\ex2.sav'.
sampling expr=(range(custnum,21778332,220015616)) size=50.


* This problem: draw 50 records at random from Ticketno≤475685041 and another 50 from Ticketno>475685041, then combine them into the new data set ex3.sav.
get file 'F:\SPSS擂题\ex2.sav'.
sampling expr=(custnum<=475685041) size=50.
save outfile "F:\SPSS擂题\ex3.sav" .

get file 'F:\SPSS擂题\ex2.sav'.
sampling expr=(custnum>475685041) size=50.

Add files
/file "F:\SPSS擂题\ex3.sav"
/file *.
exe.
save outfile "F:\SPSS擂题\ex3.sav" .
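
The core of the `sampling` macro is a one-pass selection: each case is kept with probability (picks still needed) / (cases still unseen). A minimal Python sketch of that step, assuming the filter has already been applied; the function name and the example pool are illustrative, not part of the original post:

```python
import random

def sequential_sample(items, size, rng=None):
    """Pick exactly min(size, len(items)) items in a single pass.

    Each item is kept with probability needed / remaining, where `needed`
    is how many picks are still required and `remaining` counts the items
    not yet examined (the current one included).
    """
    rng = rng or random.Random()
    needed = size
    remaining = len(items)
    chosen = []
    for item in items:
        # Same test as: dummy = uniform(1) * #s2 < #s1 in the macro.
        if needed > 0 and rng.random() * remaining < needed:
            chosen.append(item)
            needed -= 1
        remaining -= 1
    return chosen

# Hypothetical pool standing in for the cases that pass the filter.
pool = list(range(200, 601))
sample = sequential_sample(pool, 50, random.Random(1))
```

Because the keep probability reaches 1 once `needed` equals `remaining`, the result always has exactly 50 items when the pool holds at least 50.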

 

 

 


Missing Value Functions
NMISS(variable[,...]) Numeric. Returns a count of the arguments that have missing values. This function requires one or more arguments, which should be variable names in the working data file.

MISSING(variable) Logical. Returns 1 or true if variable has a missing value. The argument should be a variable name in the working data file.

SYSMIS(numvar) Logical. Returns 1 or true if the value of numvar is system-missing. The argument numvar must be the name of a numeric variable in the working data file.

VALUE(variable) Numeric or string. Returns the value of variable, ignoring user-missing value definitions for variable, which must be a variable name in the working data file.

 

LOOP-END LOOP
LOOP [varname = n TO m [BY {1**|n}]] [IF [(]logical expression[)]]
transformation commands
END LOOP [IF [(]logical expression[)]]
**Default if the subcommand is omitted.

Examples:

SET MXLOOPS = 10. /*Maximum number of loops allowed
LOOP. /*Loop with no limit other than MXLOOPS
COMPUTE X=X+1.
END LOOP.

LOOP #I = 1 TO 5. /*Loop five times
COMPUTE X = X + 1.
END LOOP.

• The specification on IF is a logical expression enclosed in parentheses.

Example
LOOP.
COMPUTE X=X+1.
END LOOP IF (X EQ 5). /*Loop until X is 5

• Iterations continue until the logical expression on END LOOP is true, which for every case is when X equals 5. Each case does not go through the same number of iterations.

• This corresponds to the programming notion of DO UNTIL. The loop is always executed at least once.

Example

LOOP IF (X LT 5). /*Loop while X is less than 5
COMPUTE X=X+1.
END LOOP.

• The IF clause is evaluated each trip through the structure, so looping stops once X equals 5.
• This corresponds to the programming notion of DO WHILE. The loop may not be executed at all.

Example

LOOP IF ( Y GT 10) . /*Loop only for cases with Y GT 10
COMPUTE X=X+1.
END LOOP IF ( X EQ 5). /*Loop until X IS 5

• The IF clause on LOOP allows transformations to be performed on a subset of cases. X is increased by 5 only for cases with values greater than 10 for Y. X is not changed for all other cases.
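
The difference between testing on END LOOP (DO UNTIL) and testing on LOOP (DO WHILE) can be sketched in Python. This is only an analogy, not SPSS syntax; the `MXLOOPS` constant is a stand-in for the SET MXLOOPS cap, and its value here is arbitrary:

```python
MXLOOPS = 40  # stand-in for SET MXLOOPS; the value is arbitrary

def loop_until(x, target=5):
    """Like END LOOP IF (X EQ target): the body runs before the test."""
    trips = 0
    while trips < MXLOOPS:
        x += 1
        trips += 1
        if x == target:
            break
    return x, trips

def loop_while(x, limit=5):
    """Like LOOP IF (X LT limit): the test runs before the body."""
    trips = 0
    while x < limit and trips < MXLOOPS:
        x += 1
        trips += 1
    return x, trips
```

Starting from 0, `loop_until` makes five trips and stops at 5. Starting from 5, `loop_while` makes no trips at all, while `loop_until` steps past 5 and only stops at the cap, which is why the exit condition on END LOOP needs care.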

 

if (miss(b)=1)  c=1.
execute.

if (f5<=12 and miss(f7)=1)  sss5=1.
execute.


USE ALL.
COMPUTE filter_$=(AGE_1 >= 21 & AGE_1 <= 25 & AGE_2 ~= 3).
FILTER BY filter_$.
list number AGE_1 AGE_2 .
EXECUTE .

 

Deleting variables that are entirely missing from an SPSS data file

* Create the data.

set printback=listing.
data list list/v1 to v5.
begin data.
1   0   .   0   .
0   0   .   0   .
0   2   .   0   .
end data.
save outfile='c:\temp\origdata.sav'.

 

* Transpose and count the missing values.

flip.
count allmiss=var001 to var003 (sysmis).  /* change var003 for number of cases */
select if allmiss=3.                      /* change      3 for number of cases */

 

* Write the syntax file.

 

do if $casenum=1.
write outfile='c:\temp\select.sps'/"get file='c:\temp\origdata.sav'/drop=".
end if.
write outfile='c:\temp\select.sps'/" " case_lbl.
exe.

 

include file='c:\temp\select.sps'.
exe.
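
The steps above (find the variables that are missing in every case, then reload the file without them) can be sketched in Python. This is an illustration of the logic only, not SPSS syntax; the data mirror the v1 to v5 example, with `None` standing in for system-missing:

```python
# Hypothetical data mirroring the v1..v5 example; None is system-missing.
data = [
    {"v1": 1, "v2": 0, "v3": None, "v4": 0, "v5": None},
    {"v1": 0, "v2": 0, "v3": None, "v4": 0, "v5": None},
    {"v1": 0, "v2": 2, "v3": None, "v4": 0, "v5": None},
]

def all_missing_columns(rows):
    """Names of columns whose value is None in every row."""
    names = list(rows[0])
    return [name for name in names if all(row[name] is None for row in rows)]

def drop_columns(rows, names):
    """Copy of rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]

to_drop = all_missing_columns(data)   # plays the role of the FLIP + COUNT step
cleaned = drop_columns(data, to_drop) # plays the role of GET FILE ... /DROP=
```

With this data `to_drop` names v3 and v5, and `cleaned` keeps v1, v2 and v4.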


Identifying missing values in SPSS data

DATA LIST LIST /a.
BEGIN DATA
1
9
2
.
4
END DATA.
LIST.

MISSING VALUES a (9).

COMPUTE sysmis1=SYSMIS(a).
COMPUTE missing1=MISSING(a).
COMPUTE usermis1=missing1 - sysmis1.
LIST.

*Give the following output:
       A  SYSMIS1 MISSING1 USERMIS1
    1.00      .00      .00      .00
    9.00      .00     1.00     1.00
    2.00      .00      .00      .00
     .       1.00     1.00      .00
    4.00      .00      .00      .00

 

As the example above shows, MISSING covers both user-defined missing values and system-missing values.
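
The relationship between the three columns in the listing can be restated in a small Python sketch. It is an analogy only: `None` stands in for system-missing, and the `USER_MISSING` set stands in for the code declared with MISSING VALUES a (9):

```python
USER_MISSING = {9}  # stand-in for MISSING VALUES a (9)

def sysmis(value):
    """1 if the value is system-missing (None here), else 0."""
    return int(value is None)

def missing(value):
    """1 if the value is system-missing or user-missing, else 0."""
    return int(value is None or value in USER_MISSING)

values = [1, 9, 2, None, 4]
# Columns: value, sysmis1, missing1, usermis1 (= missing1 - sysmis1).
table = [(v, sysmis(v), missing(v), missing(v) - sysmis(v)) for v in values]
```

Subtracting the system-missing flag from the overall missing flag leaves a flag for user-missing values only, which is what `usermis1` holds.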


Assigning values to variables in SPSS
compute data1=date.mdy(month,day, year).
 
compute num1=value.
 
String A(a11).
compute a='hello world'.
 


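Answers to three practical questions

Q: How can I program SPSS to generate all permutations with repetition of a 5-digit number?

A: The INDEX subcommand of the VARSTOCASES command can generate permutations with repetition of any length, as long as your computer's capacity allows. The program below generates the permutations with repetition of 4 elements over 4 positions (4 × 4 × 4 × 4 = 256 rows). With small changes it works for any number of elements, provided your computer can cope.

The program: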

new file.
input program.
- vector a(256).
- loop #i= 1 to 256.
-   compute a(#i)=uniform(100).
- end loop.
- end case.
- end file.
end input program.
exe.

varstocases
 /make c from a1 to a256
 /id= i
 /index = b1(4) b2(4) b3(4) b4(4) .

match files file=*
 /keep = b1 to b4.

add value labels b1 to b4 1 'Factor 1'
                          2 'Factor 2'
                          3 'Factor 3'
                          4 'Factor 4'.
exe.
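
As a cross-check on the row count, the same enumeration can be sketched in Python with `itertools.product`. This is not SPSS syntax, and the function name is made up for illustration:

```python
from itertools import product

def permutations_with_repetition(levels, positions):
    """All ordered arrangements of `positions` slots, each taking any level."""
    return list(product(range(1, levels + 1), repeat=positions))

rows = permutations_with_repetition(levels=4, positions=4)
print(len(rows))  # 256, i.e. 4 ** 4
```

For the 5-digit case in the question, `levels=10, positions=5` would give 100000 rows, so the row count grows quickly with the number of positions.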

 

 

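Q: How can I filter data stored in a date-and-time format in SPSS?

A: Filtering directly on DATETIME-format data in SPSS can be problematic, so I thought of a workaround: convert the DATETIME values to seconds, create two variables holding those seconds, and filter on the two variables. The data are in the attachment.

The program: open the data file first, then run the following syntax.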

compute s1=ctime.seconds(time_start).
compute s2=ctime.seconds(time_end).

select if (s1 >= 13390571256 & s2 <= 13391090786).

match files file=*
 /keep = time_start time_end.
exe.
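
The large numbers in the SELECT IF are seconds counted from the start of the SPSS date scale, midnight on 14 October 1582. A small Python sketch for converting between such values and readable dates, which may help when choosing the cut-offs; the helper names are illustrative:

```python
from datetime import datetime, timedelta

SPSS_EPOCH = datetime(1582, 10, 14)  # midnight, start of the SPSS date scale

def to_spss_seconds(moment):
    """Seconds elapsed between the SPSS epoch and `moment`."""
    return int((moment - SPSS_EPOCH).total_seconds())

def from_spss_seconds(seconds):
    """Datetime corresponding to a count of seconds since the SPSS epoch."""
    return SPSS_EPOCH + timedelta(seconds=seconds)

# The two cut-offs used in the SELECT IF above.
window_start = from_spss_seconds(13390571256)
window_end = from_spss_seconds(13391090786)
```

Going the other way, `to_spss_seconds(datetime(2007, 2, 11, 11, 7, 36))` produces a value that can be pasted into the SELECT IF.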

 

 

 

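Q: I have a large file with tens of thousands of records but only two variables: a customer ID and a customer name. I need one file per customer, that is, one file per record, named after that record's customer ID. Doing this by hand is not realistic. Can it be done with a program?

A: This can be done with a macro. The program is as follows: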

data list list /id(f3) name(a10).
begin data
1 Microsoft
2 IBM
3 BenQ
4 Sumsung
5 Huawei
6 ComPQ
end data.

define !magic (num=!tokens(1))
temporary.
select if id=!num.
save outfile = !quote(!concat('d:\temp\',!num,'.sav')).
!enddefine.

write outfile = 'd:\temp\callmagic.sps'
/'!magic num ='id'.'.
exe.

include 'd:\temp\callmagic.sps'.
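
The macro approach boils down to: for each record, filter to that ID and save a file named after it. A rough Python sketch of the same idea, writing CSV files into a temporary directory rather than .sav files; the sample records and the function name are assumptions made for illustration:

```python
import csv
import tempfile
from pathlib import Path

customers = [(1, "Microsoft"), (2, "IBM"), (3, "BenQ")]  # hypothetical sample

def write_one_file_per_customer(records, out_dir):
    """Write each (id, name) record to <out_dir>/<id>.csv; return the paths."""
    out_dir = Path(out_dir)
    paths = []
    for cust_id, name in records:
        path = out_dir / f"{cust_id}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "name"])
            writer.writerow([cust_id, name])
        paths.append(path)
    return paths

with tempfile.TemporaryDirectory() as tmp:
    created = write_one_file_per_customer(customers, tmp)
    names = sorted(path.name for path in created)
```

Each record ends up in its own file, named by its ID, which is the same outcome the generated `!magic` calls aim for.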

 

 

How to run a correspondence analysis in SPSS

Converting the data format

VARSTOCASES
 /ID=[attr]
 /MAKE [freq] FROM [var00001 var00002 var00003 var00004 var00005 var00006]
 /INDEX = [brand(6)]
 /KEEP =
 /NULL = KEEP.

Weighting

WEIGHT BY [freq].

Correspondence analysis

CORRESPONDENCE TABLE =[Q1(1 4)] BY [Q3(1 5)]
 /DIMENSIONS = 2
 /MEASURE = CHISQ
 /STANDARDIZE = RCMEAN
 /NORMALIZATION = SYMMETRICAL
 /PRINT = TABLE RPOINTS CPOINTS
 /PLOT = NDIM(1,MAX) BIPLOT(20) .

Items inside [] are parameters; change them to suit your data.

 


missing value

PROBLEM:
When using IF commands, if I try to assign a missing value to a numeric variable by using a period (".") as in the following incorrect command syntax,

      IF (OLDVAR = 9) NEWVAR=. .

SPSS sees the period as the command terminator, and the missing value doesn't get assigned appropriately. Also, if I try to recode an existing numeric missing value into something else using a period (".") as in the following incorrect command syntax,

      IF (OLDVAR = .) NEWVAR = 4.

that doesn't work either.

SOLUTION:

Numeric Variables
Apparently, SPSS doesn't use the period (".") as an actual numeric missing value, only for display (output) purposes. To create a new numeric missing value, use "$sysmis" after the equal sign, as in this example:

      IF (OLDVAR = 9) NEWVAR = $SYSMIS.

To use a numeric missing value in an evaluation expression, use the MISSING(arg) function, as in this example:

      IF MISSING(OLDVAR) NEWVAR = 4.

Character (String/Alphanumeric) Variables
Missing values in string variables are handled in a more straightforward manner. To create a new string missing value, use quotation marks (' ', or " ") after the equal sign, as in this example:

      IF (OLDVAR = 'X') NEWVAR = " ".

To use a string missing value in an evaluation expression, use quotes similar to the above, as in this example:

      IF (OLDVAR=' ') NEWVAR = "XYZ".
