原始代码:截断255个字符(根据RODBC文档)
库(RODBC)
con_string< - odbcConnect(“DSN”)
query_string< - “SELECT text_var FROM table_name”
dat< - sqlQuery(con_string,query_string,stringsAsFactors = FALSE)
库(RODBC)
con_string< - odbcConnect(“DSN”)
query_string< - “SELECT [text_var] = CAST(text_var AS VARCHAR(8000))FROM table_name”
dat< - sqlQuery(con_string,stringsAsFactors = FALSE)
表/变量包含长达250,000个字符的文本字符串.我真的想和R中的所有文本一起工作.这可能吗?
@BrianRipley讨论了以下文档第18页的问题(但没有解决方案):
https://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf
@nutterb在GitHub上讨论了与RODBCext包类似的问题:
https://github.com/zozlak/RODBCext/issues/6
已经看过关于SO的类似讨论,但没有使用RODBC和VARCHAR> 8000的解决方案.
RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX)
RODBC string getting truncated
注意:
> R 3.3.2
> Microsoft sql Server 2012
> Linux RHEL 7.1
>用于sql Server的Microsoft ODBC驱动程序
解决方法
我倾向于使用存储过程来解决这个问题,但这通常需要为每个特定实例编写存储过程.在某些时候,我可能会想出一种在存储过程中更通用地执行此操作的方法,但我发现在存储过程中构造查询的过程是乏味且令人沮丧的.
出于这个原因,我只花了一些时间构建一个函数,该函数将执行涉及VARCHAR(MAX)变量的有限查询.这是一种蛮力的方法,对于一个17000个字符的变量将它导出为三个变量并将它们粘贴在一起.它很粗糙,可能不是很有效,但是我提出的最好的解决方案.
另一个限制是它不允许您重命名查询中的变量.你将被困在变量中,因为它们在数据库中被命名.如果您只涉及几张表,那可能不是问题.在非常复杂的数据库中,这可能会有问题.但是,至少有了这个,您可以使用一些必要的ID来查询VARCHAR(MAX)变量,在R中执行合并.
正如GitHub问题中所讨论的那样,最好尽量避免使用VARCHAR(MAX).如果确实需要未知长度,则VARBINARY(MAX)更容易查询.
例
channel <- odbcDriverConnect(...) query_varchar_max(channel = channel,id = c("idvar"),varchar_max = c("varchar_max_var","varchar_max_var2"),from = "FROM dbo.table_name WHERE group = ?",data = list(group = "A"))
#' @name query_varchar_max #' @title Query a VARCHAR(MAX) Variable from sql Server #' #' @description The RODBC driver to sql Server (sql Server Native Client 11.0) #' reports the lenght of a VARCHAR(MAX) variable to be zero. This presents #' difficulties in extracting long text values from the database. Often,the #' ODBC will assume a length of 255 characters and truncate the text to that #' many characters. The approach taken here searches the VARCHAR(MAX) variables #' for the longest length,and extracts the data in segments to be pasted #' together in R. #' #' @param channel A valid ODBC channel to a sql Server database. #' @param id A character vector of ID variables that may be used to merge the #' data from this query into another dataset. #' @param varchar_max a character vector of variable names that are to be #' treated as if they are VARCHAR(MAX) variables. #' @param from A single character string providing the remainder of the query #' to be run,beginning with the \code{FROM} statement. #' @param stringsAsFactors \code{logical(1)}. Should character strings returned #' from the database be converted to factors? #' @param ... Additional arguments to \code{sqlExecute} when running the full #' query. #' #' @details \code{query_varchar_max} operates by determining how many columns of up to #' 8000 characters each are required to export a complete VARCHAR(MAX) variable. #' It then creates the necessary number of intermediate variables and queries the #' data using the sql Server \code{SUBSTRING} command,extracting the VARCHAR(MAX) #' variable in increments of 8000 characters. After completing the query,#' the intemediary variables are concatenated and removed from the data. #' #' The function makes accommodation for multi-part queries as far as [TABLE].[VARIABLE] #' formats are concerned. It is not intended for use in [SCHEMA].[TABLE].[VARIABLE] #' formats. This at least allows \code{from} to include joins for more complex #' queries. Parameterized queries are also supported through \code{sqlExecute}. #' #' @export query_varchar_max <- function(channel,id,varchar_max,from,stringsAsFactors = FALSE,...) { coll <- checkmate::makeAssertCollection() checkmate::assert_class(x = channel,classes = "RODBC",add = coll) checkmate::assert_character(x = id,add = coll) checkmate::assert_character(x = varchar_max,add = coll) checkmate::assert_character(x = from,len = 1,add = coll) checkmate::assert_logical(x = stringsAsFactors,add = coll) checkmate::reportAssertions(coll) varchar_max_len <- paste0( sprintf("MAX(LEN(%s)) AS len_%s",sub("[.]","_",varchar_max)),collapse = "," ) varchar_len <- unlist( RODBCext::sqlExecute( channel = channel,query = sprintf("SELECT %s %s",varchar_max_len,from),fetch = TRUE ) ) varchar_max_cols <- unlist( mapply(expand_varchar_max,varchar_len,SIMPLIFY = FALSE) ) Prelim <- RODBCext::sqlExecute( channel = channel,query = sprintf("SELECT %s,%s %s",paste0(id,"),paste0(varchar_max_cols,fetch = TRUE,stringsAsFactors = stringsAsFactors,... ) var_stub_to_combine <- unique( sub( "(part)(\\d{1,3})","\\1",sub(".+AS ","",varchar_max_cols) ) ) col_to_combine <- lapply(var_stub_to_combine,grep,names(Prelim)) Prelim[sub(".+[.]",varchar_max)] <- lapply(col_to_combine,function(col) apply(Prelim[col],1,paste0,collapse = "")) Prelim[-unlist(col_to_combine)] } expand_varchar_max <- function(varchar_max,varchar_len) { nvar <- varchar_len %/% 8000 + 1 var_list <- vector("character",length = nvar) for (i in seq_along(var_list)) { var_list[i] <- sprintf("SUBSTRING(%s,%s,%s) AS %s_part%s",1 + (i - 1) * 8000,8000,paste0(sub("[.]",i) } var_list }