Problem description
I am trying to use queues to load data from files in TensorFlow.
I would like to run the graph with validation data at the end of each epoch to get a better feel for how the training is going.
That is where I am running into problems. I can't seem to figure out how to switch between training data and validation data when using queues.
I have stripped my code down to a bare-minimum toy example to make it easier to get help. Instead of including all the code that loads the image files and performs inference and training, I have cut it off at the point where the filenames are loaded into the queue.
import tensorflow as tf

# DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)

    # Create a queue consisting of the strings in `file_list`
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)

    # Create batch of items.
    x = q.dequeue_many(batch_size)

    # Inference, train op, and accuracy calculation after this point
    # ...

# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        for epoch in range(epochs):
            print("-"*60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))

            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {}\n".format(valid_batch))

    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)
Variations and experiments
Trying different values for num_epochs
num_epochs=None
If I set the num_epochs argument in tf.train.string_input_producer() to None, it gives me the following output, which shows that it runs two epochs as intended, but uses data from the training set when running evaluation.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']
num_epochs=2
If I set the num_epochs argument in tf.train.string_input_producer() to 2, it gives me the following output, which shows that it does not even run the full two batches in the second epoch (and evaluation is still using training data).
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
num_epochs=1
If I set the num_epochs argument in tf.train.string_input_producer() to 1, in the hope that it will flush any additional training data out of the queue so that it can make use of the validation data, I get the following output, which shows that it terminates as soon as it gets through one epoch of training data and never gets to load the evaluation data.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
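One way to read the two truncated outputs above is as simple bookkeeping: x always dequeues from the queue built from train_items, and the feed_dict only targets file_list, a placeholder that is never connected to that queue. With a finite num_epochs the queue holds 6 × num_epochs filenames, and every call, whether labelled training or validation, removes 3 of them. The sketch below replays the loop in plain Python rather than TensorFlow; simulate() and its deque-based dequeue_many() are hypothetical stand-ins, not TensorFlow APIs.

```python
from collections import deque


def simulate(num_epochs, epochs=2, batches_per_epoch=2, batch_size=3):
    """Replay the training loop against a queue holding only training filenames."""
    train_items = ["train_file_{}".format(i) for i in range(6)]
    # The queue is finite: num_epochs passes over the training list.
    queue = deque(train_items * num_epochs)

    def dequeue_many(n):
        # Stand-in for q.dequeue_many(): fails once fewer than n items remain.
        if len(queue) < n:
            raise IndexError("queue exhausted")
        return [queue.popleft() for _ in range(n)]

    log = []
    try:
        for epoch in range(epochs):
            for step in range(batches_per_epoch):
                log.append(("TRAIN", dequeue_many(batch_size)))
            # The "validation" step dequeues from the very same queue.
            log.append(("VALID", dequeue_many(batch_size)))
    except IndexError:
        pass  # mirrors the real loop stopping once the queue runs dry
    return log


for n in (2, 1):
    print("num_epochs={}".format(n))
    for label, batch in simulate(n):
        print("  {}: {}".format(label, batch))
```

Under these assumptions, num_epochs=2 yields four batches (two training, one "validation" made of training files, one more training) and num_epochs=1 yields only the two training batches, which matches the logs shown above.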
Setting the capacity argument to various values
I have also tried setting the capacity argument in tf.train.string_input_producer() to small values, such as 3 and 1, but these had no effect on the results.
What other approach could I take to switch between training and validation data? Would I have to create separate queues? I am at a loss as to how to get that to work. Would I have to create additional coordinators and queue runners as well?
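For the separate-queues idea, a minimal sketch might look like the following. It assumes a TensorFlow installation that provides the tensorflow.compat.v1 module, and the names train_q, valid_q, train_x and valid_x are illustrative rather than taken from any answer. It only shows that two producers can be served by a single Coordinator and a single start_queue_runners() call, since that call starts every queue runner registered in the graph's default collection. It does not show how to route either batch into one shared model.

```python
import tensorflow.compat.v1 as tf

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
batch_size = 3
batches_per_epoch = 2
epochs = 2

graph = tf.Graph()
with graph.as_default():
    # One producer (and therefore one queue) per dataset.
    train_q = tf.train.string_input_producer(train_items, shuffle=False)
    valid_q = tf.train.string_input_producer(valid_items, shuffle=False)
    # Separate dequeue ops, so each run call picks its source explicitly.
    train_x = train_q.dequeue_many(batch_size)
    valid_x = valid_q.dequeue_many(batch_size)

train_batches, valid_batches = [], []
with tf.Session(graph=graph) as sess:
    # A single coordinator and a single call start the runners of both queues.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for epoch in range(epochs):
            for step in range(batches_per_epoch):
                train_batches.append(sess.run(train_x))
            valid_batches.append(sess.run(valid_x))
    finally:
        coord.request_stop()
        coord.join(threads)

for batch in train_batches:
    print("TRAIN_BATCH: {}".format(batch))
for batch in valid_batches:
    print("VALID_BATCH: {}".format(batch))
```

Because each dataset has its own dequeue op, nothing needs to be fed through feed_dict to choose between them; the remaining question is how the inference part of the graph can consume either tensor.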
Recommended answer
I am compiling a list of potential approaches that might solve this issue here. Most of these are just vague suggestions, with no actual code examples showing how to use them.
Suggested here
Suggested here
Also suggested by sygi on this very Stack Overflow thread (link).
Suggested here
Suggested here and here
Suggested by sygi in this very Stack Overflow thread (link). This might be the same as the make_template() method.
Suggested here, with sample code here. Code adapted to my problem here on this thread (link).
Suggested here