How we rebuilt the search architecture for high availability in GitHub Enterprise Server
So much of what you interact with on GitHub depends on search—obviously the search bars and filtering experiences like the GitHub Issues page, but it is also the core of the releases page, projects page, the counts for issues and pull requests, and more. Given that search is such a core part of the GitHub platform, we’ve spent the last year making it even more durable. That means less time spent managing GitHub Enterprise Server, and more time working on what your customers care about most.
In recent years, GitHub Enterprise Server administrators had to be especially careful with search indexes, the special database tables optimized for searching. If they didn’t follow maintenance or upgrade steps in exactly the right order, search indexes could become damaged and need repair, or they might get locked and cause problems during upgrades. Quick context if you’re not familiar with High Availability (HA) setups: they’re designed to keep GitHub Enterprise Server running smoothly even if part of the system fails. You have a primary node that handles all the writes and traffic, and replica nodes that stay in sync and can take over if needed.
Much of this difficulty comes from how previous versions of Elasticsearch, our search database of choice, were integrated. HA GitHub Enterprise Server installations use a leader/follower pattern. The leader (primary server) receives all the writes, updates, and traffic. Followers (replicas) are designed to be read-only. This pattern is deeply ingrained into all of the operations of GitHub Enterprise Server.
This is where Elasticsearch started running into issues. Since it couldn’t natively support a writable primary node paired with read-only replica nodes, GitHub engineering had to create a single Elasticsearch cluster spanning the primary and replica nodes. This made replicating data straightforward and gave some performance benefits, since each node could handle search requests locally.
Unfortunately, the problems of clustering across servers eventually began to outweigh the benefits. For example, at any point Elasticsearch could move a primary shard (responsible for receiving/validating writes) to a replica. If that replica was then taken down for maintenance, GitHub Enterprise Server could end up in a locked state. The replica would wait for Elasticsearch to be healthy before starting up, but Elasticsearch couldn’t become healthy until the replica rejoined.
For a number of GitHub Enterprise Server releases, engineers at GitHub tried to make this mode more stable. We implemented checks to ensure Elasticsearch was in a healthy state, as well as other processes to try to correct drifting state. We went as far as attempting to build a “search mirroring” system that would allow us to move away from the clustered mode. But database replication is incredibly challenging, and these efforts never achieved the consistency we needed.
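To give a flavor of the kind of health gating involved, here’s an illustrative sketch using Elasticsearch’s standard `_cluster/health` endpoint. This is not GitHub’s actual code; the URL, status threshold, and timeout are placeholders.

```python
import requests

# Hypothetical local Elasticsearch endpoint; real deployments would use their own address.
ES_URL = "http://localhost:9200"

def wait_for_cluster_health(es_url: str, status: str = "yellow", timeout: str = "60s") -> str:
    """Block until the cluster reports at least the requested status, or the timeout elapses."""
    resp = requests.get(
        f"{es_url}/_cluster/health",
        params={"wait_for_status": status, "timeout": timeout},
    )
    resp.raise_for_status()
    return resp.json()["status"]

# Example: refuse to proceed with maintenance unless the cluster is at least yellow.
# current_status = wait_for_cluster_health(ES_URL)
```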
What changed?
After years of work, we’re now able to use Elasticsearch’s Cross Cluster Replication (CCR) feature to support HA in GitHub Enterprise Server.
“But David,” you say, “That’s replication between clusters. How does that help here?”
I’m so glad you asked. With this mode, we’re moving to several single-node Elasticsearch clusters: each GitHub Enterprise Server instance now operates as its own independent, single-node Elasticsearch cluster.
CCR lets us share the index data between nodes in a way that is carefully controlled and natively supported by Elasticsearch. It copies data once it’s been persisted to the Lucene segments (Elasticsearch’s underlying data store). This ensures we’re replicating data that has been durably persisted within the Elasticsearch cluster.
In other words, now that Elasticsearch supports a leader/follower pattern, GitHub Enterprise Server administrators will no longer be left in a state where critical data winds up on read-only nodes.
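Concretely, CCR is driven through Elasticsearch’s REST API: the follower cluster registers the leader as a remote cluster, then creates follower indexes that track the leader’s indexes. Here’s a rough, hypothetical sketch of those two calls; the host names, the remote-cluster name `primary`, the transport port, and the index name are placeholders, and this is not GitHub’s actual implementation.

```python
import requests

# Hypothetical addresses for the replica's local Elasticsearch and the primary's transport endpoint.
REPLICA_ES = "http://replica-node:9200"
PRIMARY_SEED = "primary-node:9300"

def register_remote_cluster(replica_url: str, name: str, seed: str) -> None:
    """Tell the replica's cluster how to reach the leader (the 'remote cluster' in CCR terms)."""
    requests.put(
        f"{replica_url}/_cluster/settings",
        json={"persistent": {"cluster": {"remote": {name: {"seeds": [seed]}}}}},
    ).raise_for_status()

def create_follower_index(replica_url: str, index: str, remote_cluster: str = "primary") -> None:
    """Create a follower index on the replica that tracks the same-named index on the leader."""
    requests.put(
        f"{replica_url}/{index}/_ccr/follow",
        params={"wait_for_active_shards": "1"},
        json={"remote_cluster": remote_cluster, "leader_index": index},
    ).raise_for_status()

# register_remote_cluster(REPLICA_ES, "primary", PRIMARY_SEED)
# create_follower_index(REPLICA_ES, "issues")  # "issues" is an illustrative index name
```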
Under the hood
Elasticsearch has an auto-follow API, but it only applies to indexes created after the policy exists. GitHub Enterprise Server HA installations already have a long-lived set of indexes, so we need a bootstrap step that attaches followers to existing indexes, then enables auto-follow for anything created in the future.
Here’s a sample of what that workflow looks like:
```
def bootstrap_ccr(primary, replica):
    # Fetch the current indexes on each node
    primary_indexes = list_indexes(primary)
    replica_indexes = list_indexes(replica)

    # Filter out the system indexes
    managed = [index for index in primary_indexes if is_managed_ghe_index(index)]

    # For indexes without follower relationships we need to
    # initialize that contract
    for index in managed:
        if index not in replica_indexes:
            ensure_follower_index(replica, leader=primary, index=index)
        else:
            ensure_following(replica, leader=primary, index=index)

    # Finally, set up auto-follow patterns
    # so new indexes are automatically followed
    ensure_auto_follow_policy(
        replica,
        leader=primary,
        patterns=[managed_index_patterns],
        exclude=[system_index_patterns],
    )
```
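The `ensure_auto_follow_policy` helper above is pseudocode. In terms of Elasticsearch’s own API, registering an auto-follow pattern looks roughly like the sketch below; the helper name, pattern name, and index patterns are illustrative, not the ones GitHub Enterprise Server actually uses.

```python
import requests

def put_auto_follow_pattern(replica_url: str, name: str, patterns: list[str],
                            exclude: list[str], remote_cluster: str = "primary") -> None:
    """Register an auto-follow pattern so indexes created on the leader in the
    future automatically get follower indexes on the replica."""
    requests.put(
        f"{replica_url}/_ccr/auto_follow/{name}",
        json={
            "remote_cluster": remote_cluster,
            "leader_index_patterns": patterns,
            # Supported in recent Elasticsearch releases; included here as an assumption.
            "leader_index_exclusion_patterns": exclude,
            # Follower indexes keep the leader index's name.
            "follow_index_pattern": "{{leader_index}}",
        },
    ).raise_for_status()

# put_auto_follow_pattern("http://replica-node:9200", "ghe-managed",
#                         patterns=["*"], exclude=[".*"])  # illustrative patterns only
```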
This is just one of the new workflows we’ve created to enable CCR in GitHub Enterprise Server. We’ve needed to engineer custom workflows for failover, index deletion, and upgrades. Elasticsearch only handles the document replication, and we’re responsible for the rest of the index’s lifecycle.
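For example, promoting a replica during failover means converting its follower indexes back into ordinary, writable indexes. Here’s a minimal sketch of that step using Elasticsearch’s documented pause-follow/close/unfollow/open sequence; it’s illustrative only, not the actual failover workflow shipped in GitHub Enterprise Server.

```python
import requests

def promote_follower_index(replica_url: str, index: str) -> None:
    """Convert a CCR follower index into a regular, writable index during failover.

    Follows Elasticsearch's documented sequence: pause following, close the
    index, unfollow, then reopen it for writes.
    """
    base = f"{replica_url}/{index}"
    requests.post(f"{base}/_ccr/pause_follow").raise_for_status()
    requests.post(f"{base}/_close").raise_for_status()
    requests.post(f"{base}/_ccr/unfollow").raise_for_status()
    requests.post(f"{base}/_open").raise_for_status()

# promote_follower_index("http://replica-node:9200", "issues")  # illustrative index name
```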
How to get started with CCR mode
To get started using the new CCR mode, reach out to support@github.com and let them know you’d like to use the new HA mode for GitHub Enterprise Server. They’ll set up your organization so that you can download the required license.
Once you’ve downloaded your new license, you’ll need to set `ghe-config app.elasticsearch.ccr true`. With that finished, you can run a `config-apply` or upgrade your cluster to 3.19.1, which is the first release to support this new architecture.
When your GitHub Enterprise Server restarts, Elasticsearch will migrate your installation to use the new replication method. This will consolidate all the data onto the primary nodes, break clustering across nodes, and restart replication using CCR. This update may take some time depending on the size of your GitHub Enterprise Server instance.
While the new HA method is optional for now, we’ll be making it our default over the next two years. We want to ensure there’s ample time for GitHub Enterprise administrators to get their feedback in, so now is the time to try it out.
We’re excited for you to start using the new HA mode for a more seamless experience managing GitHub Enterprise Server.
Want to get the most out of search on your High Availability GitHub Enterprise Server deployment? Reach out to support to get set up with our new search architecture!