Problem description
I am seeing bulk indexing performance degrade over time when using the .NET NEST client with ElasticSearch, even though the number of indexes and the number of documents stay constant.
We are running ElasticSearch version 0.19.11 (JVM 23.5-b02) on an m1.large Amazon instance with Ubuntu Server 12.04.1 LTS 64-bit and Sun Java 7. Nothing else runs on this instance apart from what comes with the Ubuntu install.
Amazon M1 Large instance (from http://aws.amazon.com/ec2/instance-types/):
7.5 GiB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
EBS-Optimized Available: 500 Mbps
API name: m1.large
ES_MAX_MEM is set to 4g and ES_MIN_MEM is set to 2g
Every night we index/reindex ~15000 documents using NEST in our .NET application. At any given time there is only one index with <= 15000 documents.
When the server was first installed, indexing and search were fast for the first couple of days; then indexing got slower and slower. The job bulk-indexes 100 documents at a time, and after a while a single bulk operation would take up to 15 s to finish. After that we started to see a lot of the following exception, and indexing ground to a halt:
System.Net.WebException: The request was aborted: The request was canceled.
at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization) :
The indexing implementation looks like this:
// Requires: using System; using System.Collections.Generic; using System.Configuration; using System.Linq; using Nest;
// Uses the NEST 0.x API exactly as in the original code; Product and _logger come from the surrounding class.
private ElasticClient GetElasticClient()
{
    var setting = new ConnectionSettings(ConfigurationManager.AppSettings["elasticSearchHost"], 9200);
    setting.SetDefaultIndex("products");
    return new ElasticClient(setting);
}

// Reads the current settings of the "products" index, changes refresh_interval
// and writes the settings back, logging a warning if the update fails.
private void SetRefreshInterval(string interval)
{
    var elasticClient = GetElasticClient();
    var s = elasticClient.GetIndexSettings("products");
    var settings = s != null && s.Settings != null ? s.Settings : new IndexSettings();
    settings["refresh_interval"] = interval;
    var result = elasticClient.UpdateSettings(settings);
    if (!result.OK)
        _logger.Warn("unable to set refresh_interval to {0}, {1}", interval,
            result.ConnectionStatus == null || result.ConnectionStatus.Error == null
                ? ""
                : result.ConnectionStatus.Error.ExceptionMessage);
}

// "-1" turns off periodic refreshes while bulk indexing.
private void DisableRefreshInterval()
{
    SetRefreshInterval("-1");
}

// Restores a one-second refresh so newly indexed documents become searchable.
private void EnableRefreshInterval()
{
    SetRefreshInterval("1s");
}

public void Index(IEnumerable<Product> products)
{
    // Materialize once so the sequence is not enumerated more than once.
    var enumerable = products as Product[] ?? products.ToArray();
    var elasticClient = GetElasticClient();
    try
    {
        DisableRefreshInterval();
        _logger.Info("Indexing {0} products", enumerable.Length);
        var status = elasticClient.IndexMany(enumerable as IEnumerable<Product>, "products");
        if (status.Items != null)
            _logger.Info("Done, Indexing {0} products, duration: {1}", status.Items.Count(), status.Took);
        // Guard against a missing ConnectionStatus before reading its Error.
        if (status.ConnectionStatus != null && status.ConnectionStatus.Error != null)
            _logger.Error(status.ConnectionStatus.Error.OriginalException);
    }
    catch (Exception ex)
    {
        _logger.Error(ex);
    }
    finally
    {
        EnableRefreshInterval();
    }
}
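The nightly job is described as indexing ~15000 documents 100 at a time, but the code above only shows a single `IndexMany` call over whatever it is given. A minimal sketch of how a caller might split the product list into batches of that size, assuming each batch is then passed to `Index`; the `BatchingSketch` class and `Batches` helper are hypothetical names, and integer ids stand in for real `Product` objects so the example runs without NEST:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchingSketch
{
    // Hypothetical helper: yields consecutive batches of at most `size` items.
    public static IEnumerable<T[]> Batches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var buffer = new List<T>(size);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == size)
            {
                yield return buffer.ToArray();
                buffer.Clear();
            }
        }
        // Emit the final, possibly shorter, batch.
        if (buffer.Count > 0)
            yield return buffer.ToArray();
    }

    public static void Main()
    {
        // Stand-in for the ~15000 nightly products (ids only).
        var productIds = Enumerable.Range(1, 15000);
        var batches = Batches(productIds, 100).ToList();
        Console.WriteLine($"{batches.Count} batches, last has {batches[batches.Count - 1].Length} items");
        // In the real job each batch would be handed to Index(...).
    }
}
```

With the `Index` method above, each of these batches would also switch `refresh_interval` off and back on, so 15000 documents means 150 such round trips.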
Restarting the Elasticsearch daemon makes no difference at all, but deleting the index and re-indexing everything does. After a few days, though, the same slow-indexing problem comes back.
I just deleted the index and added an Optimize call after re-enabling the refresh interval at the end of each bulk-index operation, in the hope that this keeps the index from degrading:
...
...
finally
{
    EnableRefreshInterval();
    elasticClient.Optimize("products");
}
Am I doing something horribly wrong here?
Recommended answer
Sorry - just started writing another quite long comment and thought I'd just stick it all in an answer in case it benefits someone else...
ES_HEAP_SIZE
The first thing I noticed is that you set the max and min heap values for Elasticsearch to different values. These should be the same. In the configuration / init.d script there should be an ES_HEAP_SIZE that you can set. Set only this (not the min and max values), because it sets both min and max to the same value, which is what you want. If you don't, the JVM will block the Java process whenever it needs more memory; see this great article about a recent outage at GitHub (here's a quote):
Set the ES_HEAP_SIZE environment variable so that the JVM uses the same value for minimum and maximum memory. Configuring the JVM to have different minimum and maximum values means that each time the JVM needs additional memory (up to the maximum), it will block the Java process to allocate it. Combined with the old Java version, this explains the pauses that our nodes exhibited when introduced to higher load and continuous memory allocation when they were opened up to public searches. The elasticsearch team recommends a setting of 50% of system RAM.
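Applied to the setup in the question, the change might look like the fragment below. This assumes the startup script honours `ES_HEAP_SIZE` as described above; where the variable lives (an init.d or defaults file, or the environment of the shell that launches `bin/elasticsearch`) depends on how Elasticsearch was installed, and `3840m` is simply 50% of the m1.large's 7.5 GiB, following the recommendation in the quote.

```shell
# Drop the separate min/max settings:
# ES_MIN_MEM=2g
# ES_MAX_MEM=4g

# Set a single heap size; the startup script uses it for both -Xms and -Xmx.
# 3840m is ~50% of the 7.5 GiB on an m1.large.
ES_HEAP_SIZE=3840m
export ES_HEAP_SIZE
```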
Also check out this great post for more elasticsearch config from the trenches.
Lock memory to prevent swapping
From my research, you should also lock the memory available to the Java process so that it cannot be swapped out. I'm no expert in this field, but I've been told that swapping will also kill performance. You can find the bootstrap.mlockall setting in your elasticsearch.yml config file.
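In elasticsearch.yml the setting is a single line (much later Elasticsearch releases renamed it `bootstrap.memory_lock`). The process also needs permission to lock that much memory, usually by raising the memlock limit for the user running Elasticsearch (for example `ulimit -l unlimited` in the startup environment); treat the exact mechanism as an assumption that depends on your init setup.

```yaml
# elasticsearch.yml
# Lock the JVM heap in RAM so the OS cannot swap it out.
bootstrap.mlockall: true
```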
Upgrade
Elasticsearch is still quite new. Plan to upgrade fairly frequently, as the bug fixes introduced between the version you are on (0.19.11) and the current version (0.20.4) are very significant. See the ES site for details. You're on Java 7, which is definitely the right way to go; I started on Java 6 and quickly realized it just wasn't good enough, especially for bulk inserting.
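To confirm which version a node is actually running before and after an upgrade, the root endpoint reports it. This assumes the node listens on `localhost` and the default port 9200 used in the question's code:

```shell
# The JSON response includes "version" : { "number" : "..." }
curl -s http://localhost:9200/
```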
Plugins
Finally, to anyone else who experiences similar issues: get a decent plugin installed for an overview of your nodes and the JVM. I recommend bigdesk. Run bigdesk, then hit Elasticsearch with some bulk inserts and watch for strange heap memory patterns, a very large number of threads, etc.; it's all there!
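On Elasticsearch 0.x, site plugins such as bigdesk were installed with the bundled plugin script. The command below follows the bigdesk README of that era and is run from the Elasticsearch home directory; `<es-host>` is a placeholder for your node's address:

```shell
# Install bigdesk from GitHub into the node's plugins directory
bin/plugin -install lukas-vlcek/bigdesk
# Then open http://<es-host>:9200/_plugin/bigdesk/ in a browser
```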
Hope someone finds this useful!
Cheers, James