Elasticsearch 🤝 Xola A match made in a little home in Domlur, Bangalore

Oh god, why? 😩

Elasticsearch is an important component of Xola yet very few people know how to work with it.

In this presentation you'll learn:

  • What Elasticsearch is and how it differs from standard DBs
  • How it integrates with Xola
  • How queries & indexes work
  • How to triage & debug problems

How it works

  • Elasticsearch is a search engine (not a database) built on Apache Lucene, the core search library that powers it. Lucene is a Java library that you include in your project and invoke through function calls.
  • What Elasticsearch gives you:
    • An HTTP interface with a JSON-based query language over the data that Lucene manages. Elasticsearch's query features are very powerful and probably its best asset.
    • A distributed server built on top of Lucene: primary and secondary servers (data replication), redundancy, etc.
    • Other features like thread pools, queues, a monitoring API and a whole lot of other mumbo-jumbo

What it doesn't have:

  • Security based on username/password. If you want security, you need to write your own HTTP-based security layer or install a plugin
  • A great UI client. Everything is done over HTTP using JSON, so you have to rely on separate apps like Kibana & ES Head. Browsing ES data is very cumbersome because it is a search engine, not a database.
  • A non-HTTP interface. Everything from querying to administration is done over HTTP by sending JSON

How Xola integrates ES

API

  • Uses two libraries:
    • FOSElasticaBundle: a Symfony bundle for integrating a PHP Elasticsearch library
    • XolaElasticsearchProxyBundle: an authentication and authorization layer for Xola that ensures no seller can query another seller's ES data
  • For purchase search, direct API calls to elasticsearch:9200 are made to run search queries

UI

  1. JavaScript composes a query
  2. The query is sent to /api/elasticsearch
  3. XolaElasticsearchProxyBundle processes the query and forwards it to Elasticsearch
  4. The results are returned to JavaScript directly, without any modification

FOSElasticaBundle

FOS = Friends of Symfony. It's the heart and soul of Xola's ES integration. This bundle will do the following:

  • Wrap around Elastica (github.com/ruflin/Elastica), which is PHP's premier ES library
  • Automatically sync data to Elasticsearch onKernelTerminate when an order/transaction is created or updated
  • Allow us to define the schema for our data in YAML and copy it over to Elasticsearch
  • Allow us to easily search for data using Doctrine-like syntax instead of writing JSON
  • Give us commands like fos:elastica:populate, which copies data from MongoDB to Elasticsearch

Commands to know

  • bin/console fos:elastica:reset --index=order - A reset: wipes out all data from the order index and copies the mappings from elasticsearch.yml over to ES.
  • bin/console xola:elastica:populate --index=order - Uses smart filters to populate data from MongoDB into Elasticsearch. This command is written by Xola and overrides the FOS bundle's fos:elastica:populate; don't use the FOS one directly.
  • bin/console xola:elastica:syncUpdated --index=transaction --hours=2 - Syncs transactions that have been updated in the last 2 hours.

XolaElasticsearchProxy

A security-focused bundle for Xola. It will:

  • Make API calls to Elasticsearch from PHP instead of JavaScript
  • Fire one event before the API call and one after it. The consumer (Xola API) can listen to these events and modify the query being sent to ES.
  • Let Xola add a seller filter to the query so that only that seller's records are returned
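
To picture what the seller filter does to a query, here is a rough sketch in shell. It is not the bundle's actual code: the function name `with_seller_filter` and the field name `seller.id` are assumptions made for illustration.

```shell
# with_seller_filter is a hypothetical helper, not part of any Xola bundle.
# $1 = seller ID, $2 = the query clause the UI sent, as JSON.
# It wraps the original clause in a bool query and adds a term filter,
# so only documents belonging to that seller can match.
with_seller_filter() {
  printf '{"query":{"bool":{"must":%s,"filter":[{"term":{"seller.id":"%s"}}]}}}\n' "$2" "$1"
}

# Example: the UI asked for everything, the proxy narrows it to one seller.
with_seller_filter "4f104661536e86b23d000000" '{"match_all":{}}'
```

Whatever the UI asks for ends up under `must`, and the seller restriction sits next to it under `filter`, so the caller cannot remove it.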

The Fundamentals

  • Data in Elasticsearch lives in indexes. An index is roughly what a collection is in MongoDB.
  • Each index has a schema (called a mapping) that defines the type of each field.
  • Each field has a defined datatype and holds a single piece of data. Datatypes include integer, float, text, keyword, date and other specialized datatypes like geo, token count, join etc.
  • There are also multi-fields: fields that are indexed in more than one way. For example, a string might be indexed both as free text for full-text search and as a keyword for exact matches, sorting and aggregations.
  • Documents are composed of key-value pairs, one per field, just like a document in MongoDB
  • Each shard is a single Lucene index. Shards are the building blocks of Elasticsearch and what facilitates its scalability: they are a way to horizontally scale a single index when it starts to get too large.
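
A multi-field is easiest to understand from a mapping. The sketch below assumes a local, unsecured node on localhost:9200 and ES 6.x syntax; the index name `customer_demo`, the sub-field name `raw` and the helper `create_demo_index` are made up for illustration.

```shell
# Assumption: a local, unsecured Elasticsearch 6.x node.
ES_URL="http://localhost:9200"
INDEX="customer_demo"   # hypothetical index, used only for this demo

# customerName is indexed twice: as analyzed text (full-text search)
# and as the keyword sub-field customerName.raw (exact match, sorting, aggregations).
MAPPING=$(cat <<'JSON'
{
  "mappings": {
    "customer_demo": {
      "properties": {
        "customerName": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}
JSON
)

# create_demo_index creates the index with the mapping above.
create_demo_index() {
  curl -s -X PUT -H "Content-Type: application/json" "$ES_URL/$INDEX" -d "$MAPPING"
}
```

Once ES is running, call `create_demo_index`. A match query on `customerName` then searches the analyzed text, while a term query or aggregation on `customerName.raw` works on the exact value.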

Step 1: Setting up an index

A simple PUT call


      curl -X PUT -H "Content-type: application/json" http://localhost:9200/order -d '
      {
        "mappings": {
          "order": {
            "properties": {
              "customerName": {
                "type": "keyword"
              },
              "amount": {
                "type": "float"
              }
            }
          }
        }
      }'

Step 2: Send some data

A simple POST or PUT call


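      # PUT with an explicit ID (index/type/id); POST to /order/order lets ES generate the ID
      curl -X PUT -H "Content-type: application/json" http://localhost:9200/order/order/some_mongo_id -d '{
        "customerName": "Rushi Vishavadia",
        "amount": 100.23
      }'
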
  • FOSElasticaBundle will automatically convert the Order model into JSON and make the HTTP call to ES
  • Which fields are sent to ES is determined by the mapping in elasticsearch.yml
  • Attribute values are fetched using getters named getFieldName()

Step 3: Fetch a document


          curl -X GET localhost:9200/order/order/some_mongo_id
        
  • The "order" string is in there twice because v6.8 of Elasticsearch still supports a concept called "type": the first "order" is the index, the second is the type. Types are deprecated in v7 and removed in v8.
  • The response will be a JSON version of your order along with some ES-specific metadata.
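
The full response wraps the document in metadata (`_index`, `_type`, `_id`, `_version`, `found`) and puts the document itself under `_source`. If you only want the document, ES 6.x also has a `_source` endpoint. A small sketch, assuming a local unsecured node; the helper names are made up for illustration.

```shell
# Assumption: a local, unsecured Elasticsearch 6.x node.
ES_URL="http://localhost:9200"

# doc_url prints the URL of a document. $1 = index (also used as the type), $2 = ID.
doc_url() {
  printf '%s/%s/%s/%s' "$ES_URL" "$1" "$1" "$2"
}

# Full response: metadata plus the document under _source.
fetch_doc() {
  curl -s -X GET "$(doc_url "$1" "$2")"
}

# Only the stored document, without the metadata.
fetch_source() {
  curl -s -X GET "$(doc_url "$1" "$2")/_source"
}
```

For example, `fetch_source order some_mongo_id` prints just the order JSON.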

Doing an Initial Setup

Once you install Elasticsearch, here's what you need to do:


        # Create all four (order, transaction, gift & store_credit) indexes & copy the mappings
        bin/console fos:elastica:reset
        # Send data from MongoDB
        bin/console xola:elastica:populate --seller 4f104661536e86b23d000000
        # OR
        bin/console xola:elastica:populate --createdAt 2020-10-10
        # OR
        bin/console xola:elastica:populate --createdAt 2020-10-10 --seller 4f104661536e86b23d000000
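
After populating, a quick sanity check is to look at the document count of each index and compare it with what you expect from MongoDB. A sketch assuming a local unsecured node and an empty elasticsearch_index_suffix; the helper names are made up for illustration.

```shell
# Assumptions: local unsecured node, no index suffix configured.
ES_URL="http://localhost:9200"
INDEXES="order transaction gift store_credit"

# count_url prints the _count endpoint for an index.
count_url() {
  printf '%s/%s/_count' "$ES_URL" "$1"
}

# print_counts asks ES for the number of documents in each index.
print_counts() {
  for index in $INDEXES; do
    printf '%s: ' "$index"
    curl -s -X GET "$(count_url "$index")"
    printf '\n'
  done
}
```

Run `print_counts` once ES is up. Each line shows the raw `_count` response, whose `count` field is the number of documents in that index.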
        

Tools

  1. Postman: Make pure API calls. Good UI for running multiple large queries with JSON formatting and tabs
  2. Elasticsearch Head: Chrome plugin for viewing indexes & visual browsing. You can also make API calls from there. Available on the Chrome store.
  3. Curl: Only if you like punishment and like writing raw JSON in the CLI 🙄

Step 1: Configuration

Configure once, then leave it alone


          parameters:
              elasticsearch_protocol: "http"
              elasticsearch_host: "localhost"
              elasticsearch_port: 9200
              elasticsearch_username: ""
              elasticsearch_password: ""
              elasticsearch_index_suffix: "" # Have indexes like 'transaction_rushi' instead of just 'transaction'
              elasticsearch_indexes: [ 'transaction%elasticsearch_index_suffix%', 'order%elasticsearch_index_suffix%', 'gift%elasticsearch_index_suffix%', 'store_credit%elasticsearch_index_suffix%' ]
              elasticsearch_api_path: "/api/elasticsearch"

              # Silently log Elastica exceptions
              fos_elastica.client.class: Xola\CommonBundle\Client\ElasticaClient


          # This is the XolaElasticsearchProxyBundle. It serves as a proxy to our ES server
          xola_elasticsearch_proxy:
              roles_skip_auth_filter: [ 'ROLE_ADMIN' ]
              client:
                  protocol: "%elasticsearch_protocol%"
                  host: "%elasticsearch_username%:%elasticsearch_password%@%elasticsearch_host%"
                  port: "%elasticsearch_port%"
                  indexes: "%elasticsearch_indexes%"

          # This configuration is used by the FOSElasticaBundle to sync data between MongoDB & Elasticsearch
          fos_elastica:
              clients:
                  default:
                      connections:
                          -   url: "%elasticsearch_protocol%://%elasticsearch_username%:%elasticsearch_password%@%elasticsearch_host%:%elasticsearch_port%"

              default_manager: mongodb

              indexes:
                  transaction:
                      index_name: 'transaction%elasticsearch_index_suffix%'
                      finder: ~
                      types:
                          transaction:
                              dynamic: strict
                              persistence:
                                  driver: mongodb
                                  model: Xola\PaymentBundle\Document\Transaction
                                  provider:
                                      service: Xola\CommonBundle\Service\XolaElasticaPopulatePagerProvider
                                  listener:
                                      defer: true
                                      logger: true
                                  finder: ~
                              properties:
                                  id: { type: keyword }
                                  amount: { type: float }
                                  realizedAt: { type: date }
                                  earning: { type: float }
        

Step 2: Query structure

  1. The core of every query will have two pieces:
    • Aggregations - The part that averages or sums things. Examples: sum of amount, date histogram
    • Filters - These narrow the documents down to the smallest possible set to run the aggregations on. Example: filter by seller
  2. The hits section of the response shows you all the documents that were matched by the filter section above
  3. The size attribute determines how many results are shown in the `hits` section
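
Here is a sketch of a query with both pieces, written against the transaction mapping (amount, realizedAt). It assumes a local unsecured ES 6.x node and no index suffix; the field name `seller.id`, the aggregation names and the helper `run_query` are assumptions made for illustration.

```shell
# Assumptions: local unsecured ES 6.x node, no index suffix.
ES_URL="http://localhost:9200"

# Filters first (seller + date range), then aggregations over whatever matched.
# "seller.id" is a hypothetical field name; amount and realizedAt are in the mapping.
QUERY=$(cat <<'JSON'
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term":  { "seller.id": "4f104661536e86b23d000000" } },
        { "range": { "realizedAt": { "gte": "2020-10-01", "lt": "2020-11-01" } } }
      ]
    }
  },
  "aggs": {
    "total_amount": { "sum": { "field": "amount" } },
    "per_day": { "date_histogram": { "field": "realizedAt", "interval": "day" } }
  }
}
JSON
)

# run_query sends the body to the transaction index and pretty-prints the response.
run_query() {
  curl -s -X POST -H "Content-Type: application/json" \
    "$ES_URL/transaction/_search?pretty" -d "$QUERY"
}
```

With `"size": 0` the `hits` list comes back empty and only the aggregation results are returned, which is what a report usually wants. The `interval` key is the 6.x spelling; newer versions split it into `calendar_interval` and `fixed_interval`.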

Live code time

Let's walk through a simple real life query

Welcome to a world of pain 😰

Just kidding 😏

Triaging ES Problems

  1. Get the query from the browser console (XHR) or the API, and format it
  2. Put it in Postman and you will see the filter and aggregation parts
  3. Change the size attribute to see if your filters work properly
  4. If your filters don't work, remove filter components until something matches
  5. Debugging aggregations depends on the type of aggregation and is harder to explain here
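
Steps 3 and 4 can be scripted if you prefer the CLI over Postman. The sketch below drops filters one at a time and bumps size so the hits become visible. It assumes a local unsecured node and the transaction index; the filter clauses, the field name `seller.id` and the helper names are made up for illustration.

```shell
# Assumptions: local unsecured node, transaction index, hypothetical filters.
ES_URL="http://localhost:9200"

# One filter clause per line, most trusted first.
FILTERS='{"term":{"seller.id":"4f104661536e86b23d000000"}}
{"range":{"realizedAt":{"gte":"2020-10-01"}}}
{"range":{"amount":{"gt":0}}}'

# build_query N prints a query that uses only the first N filters,
# with size set to 5 so that matching documents show up under hits.
build_query() {
  clauses=$(printf '%s\n' "$FILTERS" | head -n "$1" | paste -sd, -)
  printf '{"size":5,"query":{"bool":{"filter":[%s]}}}\n' "$clauses"
}

# try_filters runs the query with 3, 2 and then 1 filter(s).
try_filters() {
  for n in 3 2 1; do
    echo "--- first $n filter(s)"
    curl -s -X POST -H "Content-Type: application/json" \
      "$ES_URL/transaction/_search?pretty" -d "$(build_query "$n")"
  done
}
```

Run `try_filters` and watch where `hits` stops being empty: the filter that was dropped just before that point is the one to look at.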

Final - Quick Troubleshooting

  1. Check if your ES is up by running curl -i http://localhost:9200
  2. Reset & re-populate your index from scratch
  3. Format your queries, put them in Postman and run them from there. Use the size parameter too
  4. Use the ES Head plugin to manually build queries and verify presence of indices and data
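
The first check can be wrapped in a small script that also prints cluster health. A sketch assuming a local unsecured node; the helper names are made up for illustration.

```shell
# Assumption: a local, unsecured Elasticsearch node.
ES_URL="http://localhost:9200"

# es_is_up succeeds only if the given URL answers with a non-error HTTP status.
# -f turns HTTP errors into a non-zero exit code; --max-time avoids hanging.
es_is_up() {
  curl -sf --max-time 5 "$1" >/dev/null
}

# check_es reports whether ES is reachable and, if so, prints the cluster health.
check_es() {
  if es_is_up "$ES_URL"; then
    echo "ES is up"
    curl -s "$ES_URL/_cluster/health?pretty"
  else
    echo "ES is not reachable at $ES_URL"
    return 1
  fi
}
```

Run `check_es`. The `status` field in the health response is green, yellow or red; yellow is common on a single-node development setup because replica shards have nowhere to go.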

The End

Questions?