=========
Collation
=========
.. default-domain:: mongodb
.. contents:: On this page
:local:
:backlinks: none
:depth: 2
:class: singlecol
.. versionadded:: 1.1
Overview
--------
MongoDB 3.4 introduced support for :manual:`collations
</reference/collation/>`, which provide a set of rules to comply with the
conventions of a particular language when comparing strings.
For example, in Canadian French, the last accent in a given word determines the
sorting order. Consider the following French words:
.. code-block:: none
cote < coté < côte < côté
The sort order using the Canadian French collation would result in the
following:
.. code-block:: none
cote < côte < coté < côté
If collation is unspecified, MongoDB uses simple binary comparison for strings.
As such, the sort order of the words would be:
.. code-block:: none
cote < coté < côte < côté
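
The orderings above can be reproduced locally. The following is a minimal
sketch that assumes the PHP ``intl`` extension is installed. It uses PHP's
``Collator`` class, which is not part of the |php-library|, and it never
contacts a MongoDB server, so the result depends on the ICU data bundled with
your PHP build and may differ from the server's.

.. code-block:: php

   <?php

   // The four sample words, deliberately out of order.
   $words = ['côté', 'coté', 'cote', 'côte'];

   // Byte-wise comparison, similar to the simple binary comparison the
   // server uses when no collation is specified.
   $binary = $words;
   sort($binary, SORT_STRING);

   // Canadian French rules from the local ICU library (intl extension).
   $canadianFrench = $words;
   $collator = new Collator('fr_CA');
   $collator->sort($canadianFrench, Collator::SORT_STRING);

   echo implode(' < ', $binary), PHP_EOL;
   echo implode(' < ', $canadianFrench), PHP_EOL;

If the local ICU data matches the server's, the first line should show the
binary ordering and the second line the Canadian French ordering listed above.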
Usage
-----
You can specify a default collation for collections and indexes when they are
created, or specify a collation for CRUD operations and aggregations. For
operations that support collation, MongoDB uses the collection's default
collation unless the operation specifies a different collation.
Collation Parameters
~~~~~~~~~~~~~~~~~~~~
.. code-block:: php
'collation' => [
'locale' => <string>,
'caseLevel' => <boolean>,
'caseFirst' => <string>,
'strength' => <integer>,
'numericOrdering' => <boolean>,
'alternate' => <string>,
'maxVariable' => <string>,
'normalization' => <boolean>,
'backwards' => <boolean>,
]
The only required parameter is ``locale``, which the server parses as an `ICU
format locale ID <http://userguide.icu-project.org/locale>`_. For example, set
``locale`` to ``en_US`` to represent US English or ``fr_CA`` to represent
Canadian French.
For a complete description of the available parameters, see :manual:`Collation
Document </reference/collation/#collation-document>` in the MongoDB manual.
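
As an illustration of the template above, the following sketch fills in every
parameter with a concrete value. The values are arbitrary examples chosen for
this sketch, not recommendations; consult the manual for the meaning and
allowed values of each parameter.

.. code-block:: php

   <?php

   // Illustrative values only; every key except 'locale' is optional.
   $options = [
       'collation' => [
           'locale' => 'en_US',
           'caseLevel' => false,
           'caseFirst' => 'off',       // 'upper', 'lower', or 'off'
           'strength' => 2,            // 1 through 5
           'numericOrdering' => true,
           'alternate' => 'shifted',   // or 'non-ignorable'
           'maxVariable' => 'punct',   // or 'space'
           'normalization' => false,
           'backwards' => false,
       ],
   ];

   // Print the options so the structure is easy to inspect.
   echo json_encode($options, JSON_PRETTY_PRINT), PHP_EOL;

An array like ``$options`` can be passed as the options argument of any method
that supports collation, merged with other options such as ``sort``.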
Assign a Default Collation to a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following example creates a new collection called ``contacts`` on the
``test`` database and assigns a default collation with the ``fr_CA`` locale.
Specifying a collation when you create the collection ensures that every
operation that runs a query against the ``contacts`` collection uses the
``fr_CA`` collation, unless the operation specifies another collation. Any
indexes on the new collection also inherit the default collation, unless the
creation command specifies another collation.
.. code-block:: php
<?php
$database = (new MongoDB\Client)->test;
$database->createCollection('contacts', [
'collation' => ['locale' => 'fr_CA'],
]);
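
To illustrate the override rule, the following sketch runs two queries against
the ``contacts`` collection created above. The ``name`` field is a hypothetical
field used only for this illustration, and the code assumes a reachable MongoDB
server.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->contacts;

   // No collation option: the server applies the collection's fr_CA default.
   $defaultOrder = $collection->find([], ['sort' => ['name' => 1]]);

   // Explicit collation: overrides the fr_CA default for this query only.
   $englishOrder = $collection->find(
       [],
       [
           'collation' => ['locale' => 'en_US'],
           'sort' => ['name' => 1],
       ]
   );

The override applies only to the operation that specifies it; the collection's
default collation is unchanged.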
Assign a Collation to an Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To specify a collation for an index, use the ``collation`` option when you
create the index.
The following example creates an index on the ``first_name`` field of the
``address_book`` collection, with the ``unique`` parameter enabled and a
collation with ``locale`` set to ``en_US``.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->address_book;
$collection->createIndex(
['first_name' => 1],
[
'collation' => ['locale' => 'en_US'],
'unique' => true,
]
);
To use this index, make sure your queries also specify the same collation. The
following query uses the above index:
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->address_book;
$cursor = $collection->find(
['first_name' => 'Adam'],
[
'collation' => ['locale' => 'en_US'],
]
);
The following queries do **NOT** use the index. The first query uses no
collation, and the second uses a collation with a different ``strength`` value
than the collation on the index.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->address_book;
$cursor1 = $collection->find(['first_name' => 'Adam']);
$cursor2 = $collection->find(
['first_name' => 'Adam'],
[
'collation' => [
'locale' => 'en_US',
'strength' => 2,
],
]
);
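
One way to check whether a query can use the index is to ask the server for its
query plan. The following sketch assumes a library version that provides
``MongoDB\Collection::explain()`` and the ``MongoDB\Operation\Find`` class,
which is newer than the 1.1 release this page otherwise targets, and a
reachable MongoDB server. The shape of the explain output varies between server
versions, so treat the field names as an assumption to verify.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->address_book;

   // Describe the query as an explainable operation instead of running it.
   $find = new MongoDB\Operation\Find(
       $collection->getDatabaseName(),
       $collection->getCollectionName(),
       ['first_name' => 'Adam'],
       ['collation' => ['locale' => 'en_US']]
   );

   // Ask the server for the query plan only.
   $plan = $collection->explain($find, ['verbosity' => 'queryPlanner']);

   // Print the winning plan; its layout depends on the server version.
   echo json_encode($plan['queryPlanner']['winningPlan'], JSON_PRETTY_PRINT), PHP_EOL;

A winning plan that contains an ``IXSCAN`` stage indicates that an index was
selected, while a ``COLLSCAN`` stage indicates a full collection scan.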
Operations that Support Collation
---------------------------------
All reading, updating, and deleting methods support collation. Some examples are
listed below.
``find()`` with ``sort``
~~~~~~~~~~~~~~~~~~~~~~~~
Individual queries can specify a collation to use when matching and sorting
results. The following query and sort operation uses a German collation with the
``locale`` parameter set to ``de``.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->contacts;
$cursor = $collection->find(
['city' => 'New York'],
[
'collation' => ['locale' => 'de'],
'sort' => ['name' => 1],
]
);
``findOneAndUpdate()``
~~~~~~~~~~~~~~~~~~~~~~
A collection called ``names`` contains the following documents:
.. code-block:: javascript
{ "_id" : 1, "first_name" : "Hans" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }
The following :phpmethod:`findOneAndUpdate()
<MongoDB\\Collection::findOneAndUpdate>` operation on the collection does not
specify a collation.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->names;
$document = $collection->findOneAndUpdate(
['first_name' => ['$lt' => 'Gunter']],
['$set' => ['verified' => true]]
);
Because ``Gunter`` is lexically first in the collection, the above operation
returns no results and updates no documents.
Consider the same :phpmethod:`findOneAndUpdate()
<MongoDB\\Collection::findOneAndUpdate>` operation but with a collation
specified, which uses the locale ``de@collation=phonebook``.
.. note::
Some locales have a ``collation=phonebook`` option available for use with
languages which sort proper nouns differently from other words. According to
the ``de@collation=phonebook`` collation, characters with umlauts come before
the same characters without umlauts.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->names;
$document = $collection->findOneAndUpdate(
['first_name' => ['$lt' => 'Gunter']],
['$set' => ['verified' => true]],
[
'collation' => ['locale' => 'de@collation=phonebook'],
'returnDocument' => MongoDB\Operation\FindOneAndUpdate::RETURN_DOCUMENT_AFTER,
]
);
The operation returns the following updated document:
.. code-block:: javascript
{ "_id" : 3, "first_name" : "Günter", "verified" : true }
``findOneAndDelete()``
~~~~~~~~~~~~~~~~~~~~~~
Set the ``numericOrdering`` collation parameter to ``true`` to compare numeric
strings by their numeric values.
The collection ``numbers`` contains the following documents:
.. code-block:: javascript
{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }
The following example matches the first document in which field ``a`` has a
numeric value greater than 100 and deletes it.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->numbers;
$document = $collection->findOneAndDelete(
['a' => ['$gt' => '100']],
[
'collation' => [
'locale' => 'en',
'numericOrdering' => true,
],
]
);
After the above operation, the following documents remain in the collection:
.. code-block:: javascript
{ "_id" : 1, "a" : "16" }
{ "_id" : 2, "a" : "84" }
If you perform the same operation without collation, the server deletes the
first document it finds in which the lexical value of ``a`` is greater than
``"100"``.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->numbers;
$document = $collection->findOneAndDelete(['a' => ['$gt' => '100']]);
After the above operation is executed, the document in which ``a`` was equal to
``"16"`` has been deleted, and the following documents remain in the collection:
.. code-block:: javascript
{ "_id" : 2, "a" : "84" }
{ "_id" : 3, "a" : "179" }
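
The difference between the two comparisons can be sketched locally. The example
below assumes the PHP ``intl`` extension and PHP 7.4 or later; it uses PHP's
``Collator`` class rather than the |php-library| and does not contact a MongoDB
server, so it only approximates what the server does.

.. code-block:: php

   <?php

   // The values of field "a" in the sample documents.
   $values = ['16', '84', '179'];

   // Byte-wise comparison: every value sorts after "100", because the
   // comparison is decided by the first differing character.
   $lexical = array_values(array_filter(
       $values,
       fn (string $value): bool => strcmp($value, '100') > 0
   ));

   // ICU comparison with numeric ordering enabled: digit sequences are
   // compared by their numeric value.
   $collator = new Collator('en');
   $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);

   $numeric = array_values(array_filter(
       $values,
       fn (string $value): bool => $collator->compare($value, '100') > 0
   ));

   print_r($lexical);
   print_r($numeric);

Under byte-wise comparison all three values match the filter, which is why the
server deletes whichever matching document it encounters first. With numeric
ordering only ``"179"`` matches.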
``deleteMany()``
~~~~~~~~~~~~~~~~
You can specify a collation for any read, update, or delete operation in the
|php-library|.
The collection ``recipes`` contains the following documents:
.. code-block:: javascript
{ "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" }
{ "_id" : 2, "dish" : "beef bourguignon", "cuisine" : "French" }
{ "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" }
{ "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" }
{ "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }
Setting the ``strength`` parameter of the collation document to ``1`` or ``2``
causes the server to disregard case in the query filter. The following example
uses a case-insensitive query filter to delete all documents in which the
``cuisine`` field matches ``French``.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->recipes;
$collection->deleteMany(
['cuisine' => 'French'],
[
'collation' => [
'locale' => 'en_US',
'strength' => 1,
],
]
);
After the above operation runs, the documents with ``_id`` values of ``2`` and
``4`` are deleted from the collection.
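
Because a case-insensitive filter can match more documents than expected, it
can help to count the matches before deleting. The following sketch assumes a
library version that provides ``countDocuments()``, a reachable MongoDB server,
and a fresh copy of the sample documents above.

.. code-block:: php

   <?php

   $collection = (new MongoDB\Client)->test->recipes;

   // Reuse the same filter and collation for both operations.
   $filter = ['cuisine' => 'French'];
   $options = ['collation' => ['locale' => 'en_US', 'strength' => 1]];

   // Count the documents the filter matches under this collation.
   $matching = $collection->countDocuments($filter, $options);

   // deleteMany() returns a MongoDB\DeleteResult describing the outcome.
   $result = $collection->deleteMany($filter, $options);

   printf("Matched %d, deleted %d\n", $matching, $result->getDeletedCount());

With the sample documents, both numbers should be ``2``, because ``French``
and ``french`` compare as equal at strength ``1``.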
Aggregation
~~~~~~~~~~~
To use collation with an :phpmethod:`aggregate()
<MongoDB\\Collection::aggregate>` operation, specify a collation in the
aggregation options.
The following aggregation example uses a collection called ``names``. It groups
the documents by the ``first_name`` field, counts the number of documents in
each group, and sorts the results by German phonebook order.
.. code-block:: php
<?php
$collection = (new MongoDB\Client)->test->names;
$cursor = $collection->aggregate(
[
['$group' => ['_id' => '$first_name', 'name_count' => ['$sum' => 1]]],
['$sort' => ['_id' => 1]],
],
[
'collation' => ['locale' => 'de@collation=phonebook'],
]
);