cursor.execute("SELECT id,name FROM `table`") for i in xrange(cursor.rowcount): id,name = cursor.fetchone() print id,name
有一个可选的游标,一次只能获取一行,真正加快了脚本的速度,并大大减少了脚本的内存占用.
import MysqLdb import MysqLdb.cursors conn = MysqLdb.connect(user="user",passwd="password",db="dbname",cursorclass = MysqLdb.cursors.SSCursor) cur = conn.cursor() cur.execute("SELECT id,name FROM users") row = cur.fetchone() while row is not None: doSomething() row = cur.fetchone() cur.close() conn.close()
但我找不到任何关于使用SSCursor与嵌套查询的内容.如果这是doSomething()的定义:
def doSomething() cur2 = conn.cursor() cur2.execute('select id,x,y from table2') rows = cur2.fetchall() for row in rows: doSomethingElse(row) cur2.close()
然后脚本抛出以下错误:
_MysqL_exceptions.ProgrammingError: (2014,"Commands out of sync; you can't run this command now")
听起来好像SSCursor与嵌套查询不兼容.真的吗?如果是这样太糟糕了,因为主循环似乎用标准光标运行得太慢了.
解决方法
threadsafety
attribute(强调我的)标题下讨论了这个问题:
The MysqL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MysqLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
MysqL_store_result()
),it is complicated by SSCursor (which uses
MysqL_use_result()
; with the latter you must ensure all the rows have
been read before another query can be executed.
MySLQ C API函数mysql_use_result()
的文档提供了有关错误消息的更多信息:
When using
MysqL_use_result()
,you must executeMysqL_fetch_row()
until aNULL
value is returned,otherwise,the unfetched rows are
returned as part of the result set for your next query. The C API
gives the errorCommands out of sync; you can't run this command now
if you forget to do this!
换句话说,在通过同一连接执行另一个语句之前,必须从任何未缓冲的游标(即使用MysqL_use_result()而不是MysqL_store_result() – 使用MysqLdb,这意味着SSCursor和SSDictCursor)完全获取结果集.
在您的情况下,最直接的解决方案是在迭代未缓冲查询的结果集时打开第二个连接. (从同一连接中简单地获取缓冲光标是不行的;在使用缓冲光标之前,您仍然需要超过无缓冲的结果集.)
如果您的工作流程类似于“遍历大结果集,对每行执行N个小查询”,请考虑将MysqL的存储过程作为嵌套来自不同连接的游标的替代方法.您仍然可以使用MysqLdb来调用该过程并获得结果,但是您肯定想要read the documentation of MySQLdb’s callproc()
method,因为它在检索过程输出时不符合Python的database API specs.
第二种方法是坚持使用缓冲光标,但将查询拆分为批量.这就是我去年为一个项目做的事情,我需要遍历一组数百万行,用内部模块解析一些数据,并在处理完每一行后执行一些INSERT和UPDATE查询.一般的想法看起来像这样:
QUERY = r"SELECT id,name FROM `table` WHERE id BETWEEN %s and %s;" BATCH_SIZE = 5000 i = 0 while True: cursor.execute(QUERY,(i + 1,i + BATCH_SIZE)) result = cursor.fetchall() # If there's no possibility of a gap as large as BATCH_SIZE in your table ids,# you can test to break out of the loop like this (otherwise,adjust accordingly): if not result: break for row in result: doSomething() i += BATCH_SIZE
关于你的示例代码,我要注意的另一件事是你可以直接在MysqLdb中的游标上迭代而不是在xrange(cursor.rowcount)上显式调用fetchone().这在使用无缓冲游标时尤其重要,因为rowcount属性未定义并且会产生非常意外的结果(请参阅:Python MysqlDB using cursor.rowcount with SSDictCursor returning wrong count).