Our ninth Advent Calendar door reveals: The way to our own Elasticsearch Plugin – Part one. Inpsyder Christian Brückner, technical lead developer of this project, will briefly explain what Elasticsearch is, why we started building our own Elasticsearch plugin, and how we tackled the problem.
Table of Contents:
What is Elasticsearch?
Why build our own Plugin?
Building a Team
1. Know your Tools
2. Review Plugins
3. Review PHP-Packages
The next Meeting
1. Review of existing Plugins
a. ElasticPress
b. SearchPress
c. JetPack Search
d. Algolia for WP
2. Review of existing PHP-packages
a. Elasticsearch PHP (official)
b. Elastica
c. WPES Lib
3. Decisions, decisions, decisions
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
In the past, we relied heavily on existing solutions such as ElasticPress from 10up or Jetpack Search from Automattic on WordPress VIP hosting.
These solutions worked well out of the box, but as soon as we needed to customize searches or change how data is stored in Elasticsearch for our customer projects, we were highly dissatisfied and ended up writing a ton of code that deactivated features or worked around the plugin. They also lacked support for a modern tech stack and recent technologies such as Composer, PHP 7, or Elasticsearch 6 (available since November 2017).
We absolutely had to take the step forward and write our own solution, one that is flexible and easy to extend in the future, without depending on other companies for the continued development of the plugin.
At Inpsyde, we currently have three customer teams that work exclusively on customer projects and a product team that works on products such as MultilingualPress, BackWPup, or PayPal Plus.
To build a company-wide solution for our customers, we decided to form a small team of three developers from different teams, with one project manager as Product Owner to lead communication and prioritize our goals.
This kind of cross-team collaboration happens from time to time so that we can collect the requirements of every team. Our goal is never to build isolated solutions; we want and need everyone on board.
After asking for some time and budget, we finally got a team with some background knowledge of Elasticsearch and of the well-known plugins:
- Sebastian Pajor – Project Manager and Product owner
- Christian Brückner – Technical Lead Developer
- David Naber – Developer
- Cristiano Baptista – Developer
Besides that, we had some sidekicks from all teams who reviewed our concept and later did code reviews.
Before we started, we collected requirements from all developer teams. We presented and prioritized these requirements in our kick-off meeting:
- No magic – the plugin should do nothing out of the box
- Easy to…
- … extend
- … configure
- … create new indices
- … synchronize data
- … debug (logging)
- Multisite support
- Possibility to push a data source (e.g. WP_Post) to multiple indices
- Async handling of data sync via a queue or an external message handler like RabbitMQ
- WP CLI support
- Support for the latest version of Elasticsearch (version 6 at the time)
- Very good unit test coverage
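To make two of these requirements more tangible (one data source feeding several indices, and asynchronous synchronization), here is a minimal sketch in Python. It is not the plugin's implementation: `SyncJob`, `SyncQueue`, `enqueue_post`, and the index names `posts` and `suggest` are hypothetical names chosen for illustration, and the in-memory queue merely stands in for a real queue or a broker such as RabbitMQ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SyncJob:
    """One pending write: target index, document id, document body."""
    index: str
    doc_id: str
    body: dict


@dataclass
class SyncQueue:
    """In-memory stand-in (assumption) for a real queue or message broker."""
    jobs: deque = field(default_factory=deque)

    def push(self, job: SyncJob) -> None:
        self.jobs.append(job)

    def drain(self, writer) -> int:
        """Hand every queued job to `writer`; return the number processed."""
        processed = 0
        while self.jobs:
            writer(self.jobs.popleft())
            processed += 1
        return processed


def enqueue_post(queue: SyncQueue, post: dict, routes: dict) -> None:
    """Queue one job per target index; `routes` maps index name -> transformer."""
    for index, transform in routes.items():
        queue.push(SyncJob(index=index, doc_id=str(post["ID"]), body=transform(post)))


# Hypothetical routing: the same post feeds a full-text index and a lean suggest index.
ROUTES = {
    "posts": lambda p: {"title": p["post_title"], "content": p["post_content"]},
    "suggest": lambda p: {"title": p["post_title"]},
}

queue = SyncQueue()
enqueue_post(queue, {"ID": 42, "post_title": "Hello", "post_content": "World"}, ROUTES)

written = []
# A real writer would call the Elasticsearch API; here we only collect the jobs.
count = queue.drain(written.append)
```

The point of the sketch is the separation: saving a post only enqueues work, and a separate worker decides when and how the data reaches Elasticsearch.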
Furthermore, I decided to create a concept paper in Google Drive to collect all the information we needed and to set up a dedicated Slack channel for faster communication. Until our next meeting, everyone got some homework to do:
Since Elasticsearch is not our main business, we had to define a baseline of knowledge. The goal was to read through the whole documentation, create a glossary, and ask whenever something was unclear. This was important for the next steps, so that everybody knew what we needed to integrate:
- Read the documentation https://www.elastic.co/guide/en/elasticsearch/reference/6.x/index.html
- Write a glossary together
- Write down what we have to support (cluster, nodes, shards, indices, types, mappings, documents, replicas) and describe each in a few sentences.
- Ask about and discuss everything!
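Several of these glossary terms meet in a single place: the body of a create-index request, which declares shards and replicas under `settings` and the field types under `mappings`. The following Python sketch only builds such a body as a plain dictionary. The helper names `build_index_body` and `total_shard_copies` and the example fields are assumptions for illustration; the type name `_doc` follows the Elasticsearch 6 convention of one mapping type per index.

```python
import json


def build_index_body(shards: int, replicas: int, properties: dict) -> dict:
    """Assemble the JSON body of a create-index request (illustrative helper)."""
    return {
        "settings": {
            "number_of_shards": shards,      # primary shards of the index
            "number_of_replicas": replicas,  # copies kept per primary shard
        },
        # Elasticsearch 6 still nests the mapping under a type name.
        "mappings": {
            "_doc": {"properties": properties},
        },
    }


def total_shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus their replicas, i.e. how many shards the cluster holds."""
    return shards * (1 + replicas)


body = build_index_body(
    shards=3,
    replicas=1,
    properties={
        "title": {"type": "text"},       # analyzed for full-text search
        "author": {"type": "keyword"},   # stored as-is for exact filtering
        "post_date": {"type": "date"},
    },
)
print(json.dumps(body, indent=2))
```

With three primary shards and one replica each, the cluster has to place six shards across its nodes, which is exactly the kind of relationship the glossary was meant to make clear to everyone.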
I split the following plugins among the team so that everyone had to write a short audit, review, and conclusion for one of them:
- ElasticPress
- SearchPress
- JetPack Search
- Algolia for WP
Each team member presented their results in the next meeting. This step is very important because we need to know the strengths and weaknesses of the existing plugins and what we can learn from them.
Since we use Composer for dependency management, we had to decide whether we could reuse an existing solution to talk to the Elasticsearch API or whether we had to write the connection to the API from scratch. Therefore … more homework for everyone: each team member had to review one of the following packages:
- Elasticsearch PHP – the official PHP package from Elasticsearch.
- Elastica – a higher level wrapper around the client.
- WPES Lib – a package extracted by Automattic from JetPack Search.
After some time had passed, we held a second meeting to talk about our results and research. Since every review is at least 1.5 pages long, I’ll try to summarize what we found out:
All listed plugins provide “push data to Elastic” and “change default WP_Query” out of the box. This may be helpful for some sites and scenarios, but it becomes a hindrance when you have a lot of custom data and custom rules for searching through that data. Also, all plugins only support WP_Post as a starting point. There is not always out-of-the-box support for pushing WP_User, WP_Comment, or WP_Term into their own Elasticsearch index.
The automatic translation of a WP_Query into an Elasticsearch API call is not always 100% accurate, which can produce wrong search results.
All plugins misuse Elasticsearch as a kind of caching layer in front of MySQL and argue that this increases performance. Yes, performance increases if you have a slow, misconfigured database without caching, but the main purpose is not “caching results”; it is “searching”.
Keep in mind the main difference between an RDBMS and Elasticsearch:
An RDBMS answers the question: which documents exactly match the values given in the query?
A search engine (such as Lucene) answers the question: which documents are most similar to the query, ordered by a relevance score?
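The difference can be illustrated with a toy example in Python. Note the assumptions: the documents are made up, and the score is a plain count of overlapping terms, which is far simpler than the scoring Lucene really uses. It only shows the two kinds of questions side by side.

```python
from collections import Counter

# Made-up documents for illustration only.
DOCS = {
    1: "elasticsearch plugin for wordpress",
    2: "wordpress search plugin",
    3: "how to bake bread",
}


def exact_match(docs: dict, value: str) -> list:
    """RDBMS-style lookup: a row either equals the value or it does not."""
    return [doc_id for doc_id, text in docs.items() if text == value]


def ranked_search(docs: dict, query: str) -> list:
    """Search-engine-style lookup: score documents by term overlap, best first."""
    terms = query.split()
    hits = []
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in terms)  # toy score, not Lucene's
        if score > 0:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


exact = exact_match(DOCS, "search plugin")     # no document equals the query
ranked = ranked_search(DOCS, "search plugin")  # partial matches, ordered by score
```

The exact lookup finds nothing, because no document equals the query string, while the ranked search returns document 2 ahead of document 1 and drops the unrelated document 3.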
Here’s a short overview of our review:
ElasticPress is a well-known and widely used solution. 10up also offers its own hosted Elasticsearch service at https://www.elasticpress.io/.
Once the plugin is activated and credentials are configured, it automatically pushes data into Elasticsearch and replaces the WP_Query.
We have used and customized this plugin a lot in the past. We even opened issues in their repository. And this is exactly why we’re writing our own plugin: there is no “progress” for us anymore. We always end up working around problems.
The plugin is still on PHP 5.2 level with a lot of spaghetti code (e.g. ep_wc_translate_args() has a cyclomatic complexity of 63, an NPath complexity of 26203564800, and a length of 328 lines), has no logging integration, has no (official) support for the latest Elasticsearch version (as of 06/2018), and is very hard to extend.
SearchPress from alleyinteractive is currently not actively maintained. When we reviewed it in the summer, the last commit dated from 03/2018 and was the only commit that year.
The plugin itself uses a lot of globals, is based on PHP 5.3, and has some questionable concepts that can break things, e.g. by changing the mapping. The plugin automatically creates a mapping and pushes/syncs post data to Elasticsearch. In WordPress, it replaces the WP_Query and tries to load results from Elasticsearch.
JetPack Search is also widely used in combination with JetPack and is the recommended, pre-installed solution on WordPress VIP or WordPress VIP Go hosting.
The main problem with this plugin is that it still does not support the latest Elasticsearch version and that you need to install the whole JetPack ecosystem with the “Jetpack Professional plan” just to use the search module. The code is really WordPressy, but it at least allows debugging with well-known tools like Debug Bar and Query Monitor. That said, there are some well-thought-out features and the schema covers a lot of use cases, but we don’t want to rely on the huge JetPack ecosystem or have the plugin automatically replace WP_Query.
The Algolia for WP plugin provides access to https://www.algolia.com/, a hosted search service that is comparable to a hosted Elasticsearch with restricted features. Configuration happens via a backend settings page and requires an Algolia account. Data is pushed to Algolia automatically, but cannot be changed. The service itself also charges a monthly fee. Out of the box, the plugin provides autocomplete for the WordPress search in the frontend and some custom scripts.
We skipped an in-depth review because this plugin would restrict us too much and we would always depend on the external service. Still, the concepts and implementation were good to see and to know for later work.
We decided to stick with the official “elasticsearch php package”, which provides complete support for the API, targets a modern PHP version, and should see continuous development in the future. Since we’re going to abstract the API in our plugin, we’re future-proof and can easily replace the package if we want to.
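The idea of abstracting the API can be sketched as follows. This is not our plugin's code, and it is written in Python rather than PHP purely for illustration. `SearchClient`, `InMemoryClient`, and `PostIndexer` are hypothetical names: the plugin code depends only on a small interface, and an adapter behind that interface wraps whichever package actually talks to Elasticsearch.

```python
from typing import Protocol


class SearchClient(Protocol):
    """The only surface the plugin code is allowed to depend on (hypothetical)."""

    def index(self, index: str, doc_id: str, body: dict) -> None: ...

    def search(self, index: str, query: dict) -> list: ...


class InMemoryClient:
    """Stand-in adapter; a real one would wrap the official client package."""

    def __init__(self) -> None:
        self._store: dict = {}

    def index(self, index: str, doc_id: str, body: dict) -> None:
        self._store.setdefault(index, {})[doc_id] = body

    def search(self, index: str, query: dict) -> list:
        # Toy behaviour: only understands {"match": {field: text}}.
        field_name, text = next(iter(query["match"].items()))
        return [
            doc_id
            for doc_id, body in self._store.get(index, {}).items()
            if text.lower() in str(body.get(field_name, "")).lower()
        ]


class PostIndexer:
    """Plugin-side code: knows the interface, not the package behind it."""

    def __init__(self, client: SearchClient, index: str) -> None:
        self._client = client
        self._index = index

    def save(self, post: dict) -> None:
        self._client.index(self._index, str(post["ID"]), {"title": post["post_title"]})

    def find_by_title(self, text: str) -> list:
        return self._client.search(self._index, {"match": {"title": text}})


indexer = PostIndexer(InMemoryClient(), "posts")
indexer.save({"ID": 7, "post_title": "Hello Elasticsearch"})
found = indexer.find_by_title("elastic")
```

Replacing the underlying package then means writing one new adapter, while `PostIndexer` and everything built on top of it stay untouched. As a side effect, the in-memory adapter makes unit tests possible without a running cluster.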
Here’s a short overview of our review:
Elasticsearch PHP is the official PHP package and supports the complete Elasticsearch API. It is actively developed, requires PHP >= 7, has Composer support, and is a kind of low-level abstraction of the API with easy configuration of the client. It also supports logging via the PSR-3 LoggerInterface https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-3-logger-interface.md out of the box.
Elastica is a higher-level wrapper around the official client. It is actively developed and depends on the official Elasticsearch PHP package.
The package provides a simpler API that is pretty easy to use, but its further development is opinionated and may hinder us by making us rely too much on specific implementations.
WPES Lib from Automattic is a minimalistic approach to talking to Elasticsearch. Sadly, it has no built-in Composer or logging support and does not cover the complete API. It also lacks unit tests and is written in the “old PHP way”, like WordPress itself. Last but not least, the documentation mentions that “probably Elasticsearch 5.x” is supported, a version released in October 2016.
After reviewing all results and drawing our conclusions, we finally made some decisions for our next steps:
- We speak the same language when talking about Elasticsearch.
- We know existing solutions with their strengths and weaknesses.
- We’re going to use the official “elasticsearch-php”-package to talk with the API.
- We need a name for the baby.
- We’re going to define modules of our new plugin to split work.
Stay tuned: in my next blog post, I’ll give you some insights into the implementation and the basic concepts we followed!