Trying to use dplyr to group_by and apply scale()

I'm trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:

> str(df)
'data.frame':   4136 obs. of  4 variables:
 $stud_ID         : chr  "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
 $behavioral_scale: num  3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
 $cognitive_scale : num  3.5 3 3 3 3.5 2 NA NA 1 1 ...
 $affective_scale : num  2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...

I tried the following to get scaled scores per student (rather than scores scaled across the observations of all students):

scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(
    behavioral_scale_ind = scale(behavioral_scale),
    cognitive_scale_ind = scale(cognitive_scale),
    affective_scale_ind = scale(affective_scale)
  )

Here is the result:

> str(scaled_data)
Classes ‘grouped_df’,‘tbl_df’,‘tbl’ and 'data.frame': 4136 obs. of  7 variables:
 $stud_ID             : chr  "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
 $behavioral_scale    : num  3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
 $cognitive_scale     : num  3.5 3 3 3 3.5 2 NA NA 1 1 ...
 $affective_scale     : num  2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
 $behavioral_scale_ind: num [1:12,1] 0.64 1.174 0.64 0.107 0.64 ...
  ..- attr(*,"scaled:center")= num 2.9
  ..- attr(*,"scaled:scale")= num 0.937
 $cognitive_scale_ind : num [1:12,1] 1.17 0.64 0.64 0.64 1.17 ...
  ..- attr(*,"scaled:center")= num 2.4
  ..- attr(*,"scaled:scale")= num 0.937
 $affective_scale_ind : num [1:12,1] 0 1.28 0.64 0.64 0 ...
  ..- attr(*,"scaled:center")= num 2.5
  ..- attr(*,"scaled:scale")= num 0.782

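The three scaled variables (behavioral_scale_ind, cognitive_scale_ind, and affective_scale_ind) show only 12 observations, which is the number of observations for the first student, ABB112292.

What is going on here? How can I get scaled scores per individual?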

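Solution

The problem seems to be with the base scale() function, which is built for matrices: even for a plain vector it returns a one-column matrix carrying scaled:center and scaled:scale attributes. Try writing your own.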

scale_this <- function(x){
  (x - mean(x,na.rm=TRUE)) / sd(x,na.rm=TRUE)
}
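
To see the difference between the two functions, here is a minimal sketch on a made-up vector (x is hypothetical and not taken from the question's data). It uses base R only and repeats the scale_this definition so that it runs on its own.

```r
# Hypothetical toy vector with one missing value
x <- c(3.5, 4, 3.5, 3, NA, 2)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

m <- scale(x)        # base R: one-column matrix with attributes
v <- scale_this(x)   # plain numeric vector

is.matrix(m)                 # TRUE
dim(m)                       # 6 1
attr(m, "scaled:center")     # the mean that was subtracted
is.matrix(v)                 # FALSE

# The scaled values agree; only the container differs
all.equal(as.vector(m), v)   # TRUE
```

Since the values match, wrapping the base call as as.vector(scale(x)) inside mutate() should be another way to drop the matrix shape, if you would rather not define a helper.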

Then do this:

library("dplyr")

# reproducible sample data
set.seed(123)
n <- 1000
df <- data.frame(
  stud_ID = sample(LETTERS, size = n, replace = TRUE),
  behavioral_scale = runif(n, 0, 10),
  cognitive_scale = runif(n, 1, 20),
  affective_scale = runif(n, 0, 1)
)
scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(
    behavioral_scale_ind = scale_this(behavioral_scale),
    cognitive_scale_ind = scale_this(cognitive_scale),
    affective_scale_ind = scale_this(affective_scale)
  )

Or, if you're open to a data.table solution:

library("data.table")
 
setDT(df)
 
cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale")
 
df[,lapply(.SD,scale_this),.SDcols = cols_to_scale,keyby = factor(stud_ID)]
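
As a sanity check on per-group scaling, the sketch below uses base R's ave() and tapply() to confirm that each student's scaled scores end up with mean 0 and standard deviation 1. The data frame df_check and its single score column are hypothetical stand-ins for the sample data above. This is a swapped-in base R approach, not the dplyr or data.table code from the answer.

```r
set.seed(123)
n <- 1000

# Hypothetical sample data mirroring the shape of the example above
df_check <- data.frame(
  stud_ID = sample(LETTERS, size = n, replace = TRUE),
  behavioral_scale = runif(n, 0, 10)
)

scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# ave() applies FUN within each stud_ID group and returns a vector
# aligned with the original rows
df_check$behavioral_scale_ind <- ave(
  df_check$behavioral_scale,
  df_check$stud_ID,
  FUN = scale_this
)

# Per-student mean and sd of the scaled column
group_means <- tapply(df_check$behavioral_scale_ind, df_check$stud_ID, mean)
group_sds   <- tapply(df_check$behavioral_scale_ind, df_check$stud_ID, sd)

all(abs(group_means) < 1e-8)     # every group is centered at 0
all(abs(group_sds - 1) < 1e-8)   # every group has unit sd
```

The same two tapply() checks can be pointed at the behavioral_scale_ind column of scaled_data to verify the dplyr result.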
