Skip to content

Indexes and Fields

BM25 indexes in ParadeDB provide full-text search with relevance scoring. Creating a BM25 index is similar to creating a default Django index — you specify the fields to index, and you can use expressions or individual fields along with conditional expressions. Additional BM25-specific parameters, such as boosting, stopwords, and field normalization, can also be configured.


  • import from paradedb.indexes module

IndexField

IndexField represents a model field inside a ParadeDB BM25 index with an optional PostgreSQL cast used for ParadeDB tokenizers.

IndexField(field: str | models.F, tokenizer_cast: typing.Optional[str] = None, field_resolver: typing.Optional[typing.Callable] = None)

tokenizer_cast

tokenizer_cast is appended directly to the SQL column reference and is expected to be a postgres paradedb cast, for example:

  • pdb.literal
  • pdb.ngram(1,2) and other

field_resolver

field_resolver resolved the field name with our without expression.


Example

from django.db import models
from paradedb.indexes import Bm25Index, IndexField

class Article(models.Model):
    title = models.CharField(max_length=255)
    body = models.TextField()
    metadata = models.JsonField(default=dict)

    class Meta:
        indexes = [
            Bm25Index(
                IndexField("body", tokenizer_cast="pdb.ngram(1,2)"),
                IndexField("(metadata->>'word_count')", tokenizer_cast="pdb.ngram(1,2)"),
                IndexField(Lower('title'), tokenizer_cast="pdb.ngram(1,2)"),
                name="article_bm25",
            )
        ]

Bm25Index

Bm25Index(
    *expressions,
    fields=(),
    name: str,
    db_tablespace=None,
    opclasses=(),
    condition=None,
    include=None,
    fields_config: typing.Optional[IndexFieldConfig] = None,
    key_field: str = "id",
    with_extra: with_extra: typing.Optional[typing.Dict[str, typing.Any]] = None
)

fields_config additional field configuration for bm25 index. see the index field configuration
To learn more about Index parameters, see the Django Index


IndexFieldConfig

IndexFieldConfig(
    text_fields: typing.Optional[typing.List[TextFieldIndexConfig]] = None,
    json_fields: typing.Optional[typing.List[JSONFieldIndexConfig]] = None,
    numeric_fields: typing.Optional[typing.List[NumericFieldIndexConfig]] = None,
    boolean_fields: typing.Optional[typing.List[BooleanFieldIndexConfig]] = None,
    datetime_fields: typing.Optional[typing.List[DateTimeFieldIndexConfig]] = None,
)

see TextFieldIndexConfig field configuration
see JSONFieldIndexConfig field configuration
see NumericFieldIndexConfig field configuration
see BooleanFieldIndexConfig field configuration
see DateTimeFieldIndexConfig field configuration

To learn more, see the ParadeDB field configuration.


TextFieldIndexConfig

TextFieldIndexConfig(
    field: str,
    fast: bool = True,
    tokenizer: typing.Optional[Tokenizer] = None,
    normalizer: typing.Literal["raw", "lowercase"] = "raw",
    record: typing.Literal["position", "freq", "basic"] = "position",
    indexed: bool = True,
    fieldnorms: bool = True,
    column: typing.Optional[str] = None,
)

Configure a text field for indexing.
If column is provided, field will be treated as an alias.
To learn more, see the ParadeDB TextFieldIndexConfig.

Example

from paradedb.indexes import Bm25Index, IndexFieldConfig, TextFieldIndexConfig

Bm25Index(
    fields=["id", "title", "description"],
    name="idx_name",
    fields_config= IndexFieldConfig(
    text_fields=[
        TextFieldIndexConfig(
            field="title",
            fast=True,
            normalizer="lowercase",
            record="position"
        ),
        TextFieldIndexConfig(
            field="description",
            fast=True,
            normalizer="lowercase",
        )
    ]
))


JSONFieldIndexConfig

JSONFieldIndexConfig(
    field: str,
    fast: bool = True,
    tokenizer: typing.Optional[Tokenizer] = None,
    normalizer: typing.Optional[typing.Literal["raw","lowercase"]] = None,
    record: typing.Literal["position", "freq", "basic"] = "position",
    indexed: bool = True,
    fieldnorms: bool = True,
    column: typing.Optional[str] = None,
    expand_dots: bool = True,
)

Configure a json field for indexing. To learn more, see the ParadeDB JSONFieldIndexConfig.

Example

from paradedb.indexes import Bm25Index, IndexFieldConfig, JSONFieldIndexConfig

Bm25Index(
    fields=["id", "metadata"],
    name="idx_name",
    fields_config=IndexFieldConfig(
    json_fields=[
        JSONFieldIndexConfig(
            field="metadata",
            fast=True,
            expand_dots=True
        )
    ]
)
)


NumericFieldIndexConfig

NumericFieldIndexConfig(
    field: str,
    fast: bool = True,
    indexed: bool = True,
    column: typing.Optional[str] = None,
)

Configure a numeric field for indexing. To learn more, see the ParadeDB NumericFieldIndexConfig.

Example

from paradedb.indexes import Bm25Index, IndexFieldConfig, NumericFieldIndexConfig

Bm25Index(
    fields=["id", "rank"],
    name="idx_name",
    fields_config=IndexFieldConfig(
    numeric_fields=[
        NumericFieldIndexConfig(field="rank", fast=True),
    ]
)
)


BooleanFieldIndexConfig

BooleanFieldIndexConfig(
    field: str,
    fast: bool = True,
    indexed: bool = True,
    column: typing.Optional[str] = None,
)
Configure a boolean field for indexing. To learn more, see the ParadeDB BooleanFieldIndexConfig.

Example

from paradedb.indexes import Bm25Index, IndexFieldConfig, BooleanFieldIndexConfig

Bm25Index(
    fields=["id", "published"],
    name="idx_name",
    fields_config=IndexFieldConfig(
    numeric_fields=[
        BooleanFieldIndexConfig(field="published", fast=True),
    ]
))


DateTimeFieldIndexConfig

DateTimeFieldIndexConfig(
    field: str,
    fast: bool = True,
    indexed: bool = True,
    column: typing.Optional[str] = None,
)
Configure a date or datetime field for indexing. To learn more, see the ParadeDB DateTimeFieldIndexConfig.

Example

from paradedb.indexes import Bm25Index, IndexFieldConfig, DateTimeFieldIndexConfig

Bm25Index(
    fields=["id", "created"],
    name="idx_name",
    fields_config=IndexFieldConfig(
    numeric_fields=[
        DateTimeFieldIndexConfig(field="created", fast=True),
    ]
))


Using Tokenizers with BM25 Index

from paradedb.indexes import Bm25Index, IndexFieldConfig, TextFieldIndexConfig, JSONFieldIndexConfig
from paradedb.tokenizers import WhitespaceTokenizer, NGramTokenizer

Bm25Index(
    fields=["id", "title", "description", "metadata"],
    name="article_bm25_tokenizer_idx",
    fields_config=IndexFieldConfig(
        text_fields=[
            TextFieldIndexConfig(
                field="title",
                fast=True,
                tokenizer=WhitespaceTokenizer(),
                normalizer="lowercase",
                record="position"
            ),
            TextFieldIndexConfig(
                field="description",
                fast=True,
                tokenizer=NGramTokenizer(min_gram=2, max_gram=3),
                normalizer="lowercase",
                record="position"
            )
        ],
        json_fields=[
            JSONFieldIndexConfig(
                field="metadata",
                tokenizer=WhitespaceTokenizer(),
                expand_dots=True
            )
        ]
    )
)

To see more details about each tokenizer, check the class documentation above. To see all tokenizers and options, see Tokenizers.