使用python通过FTP下载大文件

Download big files via FTP with python(使用python通过FTP下载大文件)
本文介绍了使用python通过FTP下载大文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我尝试每天从我的服务器下载一个备份文件到我的本地存储服务器,但我遇到了一些问题.

Im trying to download daily a backup file from my server to my local storage server, but i got some problems.

我写了这段代码(去掉了无用的部分,作为电子邮件功能):

I wrote this code (removed the useless parts, as the email function):

import os
from time import strftime
from ftplib import FTP
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email import Encoders

day = strftime("%d")
today = strftime("%d-%m-%Y")

link = FTP(ftphost)
link.login(passwd = ftp_pass, user = ftp_user)
link.cwd(file_path)
link.retrbinary('RETR ' + file_name, open('/var/backups/backup-%s.tgz' % today, 'wb').write)
link.delete(file_name) #delete the file from online server
link.close()
mail(user_mail, "Download database %s" % today, "Database sucessfully downloaded: %s" % file_name)
exit()

我使用 crontab 运行它,例如:

And i run this with a crontab like:

40    23    *    *    *    python /usr/bin/backup-transfer.py >> /var/log/backup-transfer.log 2>&1

它适用于小文件,但它会冻结备份文件(大约 1.7Gb),下载的文件大约 1.2Gb 然后永远不会增长(我等了大约一天),并且日志文件是空的.

It works with small files, but with the backups files (about 1.7Gb) it freeze, the downloaded file get about 1.2Gb then never grows up (i waited about a day), and the log file is empty.

有什么想法吗?

ps:我使用的是 Python 2.6.5

p.s: im using Python 2.6.5

推荐答案

对不起,如果我回答了我自己的问题,但我找到了解决方案.

Sorry if i answer my own question, but I found the solution.

我尝试了 ftputil 没有成功,所以我尝试了很多方法,最后,这行得通:

I tryed ftputil with no success, so i tryed many way and finally, this works:

def ftp_connect(path):
    link = FTP(host = 'example.com', timeout = 5) #Keep low timeout
    link.login(passwd = 'ftppass', user = 'ftpuser')
    debug("%s - Connected to FTP" % strftime("%d-%m-%Y %H.%M"))
    link.cwd(path)
    return link

downloaded = open('/local/path/to/file.tgz', 'wb')

def debug(txt):
    print txt

link = ftp_connect(path)
file_size = link.size(filename)

max_attempts = 5 #I dont want death loops.

while file_size != downloaded.tell():
    try:
        debug("%s while > try, run retrbinary
" % strftime("%d-%m-%Y %H.%M"))
        if downloaded.tell() != 0:
            link.retrbinary('RETR ' + filename, downloaded.write, downloaded.tell())
        else:
            link.retrbinary('RETR ' + filename, downloaded.write)
    except Exception as myerror:
        if max_attempts != 0:
            debug("%s while > except, something going wrong: %s
 	file lenght is: %i > %i
" %
                (strftime("%d-%m-%Y %H.%M"), myerror, file_size, downloaded.tell())
            )
            link = ftp_connect(path)
            max_attempts -= 1
        else:
            break
debug("Done with file, attempt to download m5dsum")
[...]

在我的日志文件中我发现:

In my log file i found:

01-12-2011 23.30 - Connected to FTP
01-12-2011 23.30 while > try, run retrbinary
02-12-2011 00.31 while > except, something going wrong: timed out
    file lenght is: 1754695793 > 1754695793
02-12-2011 00.31 - Connected to FTP
Done with file, attempt to download m5dsum

遗憾的是,即使文件已完全下载,我也必须重新连接到 FTP,这在我的 cas 中不是问题,因为我也必须下载 md5sum.

Sadly, i have to reconnect to FTP even if the file has been fully downloaded, that in my cas is not a problem, becose i have to download the md5sum too.

如您所见,我无法检测到超时并重试连接,但是当我超时时,我只是重新连接;如果有人知道如何在不创建新的 ftplib.FTP 实例的情况下重新连接,请告诉我 ;)

As you can see, I'm not been able to detect the timeout and retry the connection, but when i got timeout, I simply reconnect again; If someone know how to reconnect without creating a new ftplib.FTP instance, let me know ;)

这篇关于使用python通过FTP下载大文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

patching a class yields quot;AttributeError: Mock object has no attributequot; when accessing instance attributes(修补类会产生“AttributeError:Mock object has no attribute;访问实例属性时)
How to mock lt;ModelClassgt;.query.filter_by() in Flask-SqlAlchemy(如何在 Flask-SqlAlchemy 中模拟 lt;ModelClassgt;.query.filter_by())
FTPLIB error socket.gaierror: [Errno 8] nodename nor servname provided, or not known(FTPLIB 错误 socket.gaierror: [Errno 8] nodename nor servname provided, or not known)
Weird numpy.sum behavior when adding zeros(添加零时奇怪的 numpy.sum 行为)
Why does the #39;int#39; object is not callable error occur when using the sum() function?(为什么在使用 sum() 函数时会出现 int object is not callable 错误?)
How to sum in pandas by unique index in several columns?(如何通过几列中的唯一索引对 pandas 求和?)