Indexes and Fields¶
BM25 indexes in ParadeDB provide full-text search with relevance scoring. Creating a BM25 index is similar to creating a default Django index — you specify the fields to index, and you can use expressions or individual fields along with conditional expressions. Additional BM25-specific parameters, such as boosting, stopwords, and field normalization, can also be configured.
- import from
paradedb.indexesmodule
IndexField¶
IndexField represents a model field inside a ParadeDB BM25 index with an optional PostgreSQL cast used for ParadeDB tokenizers.
IndexField(field: str | models.F, tokenizer_cast: typing.Optional[str] = None, field_resolver: typing.Optional[typing.Callable] = None)
tokenizer_cast¶
tokenizer_cast is appended directly to the SQL column reference and is expected to be a postgres paradedb cast, for example:
pdb.literalpdb.ngram(1,2)and other
field_resolver¶
field_resolver resolved the field name with our without expression.
Example
from django.db import models
from paradedb.indexes import Bm25Index, IndexField
class Article(models.Model):
title = models.CharField(max_length=255)
body = models.TextField()
metadata = models.JsonField(default=dict)
class Meta:
indexes = [
Bm25Index(
IndexField("body", tokenizer_cast="pdb.ngram(1,2)"),
IndexField("(metadata->>'word_count')", tokenizer_cast="pdb.ngram(1,2)"),
IndexField(Lower('title'), tokenizer_cast="pdb.ngram(1,2)"),
name="article_bm25",
)
]
Bm25Index¶
Bm25Index(
*expressions,
fields=(),
name: str,
db_tablespace=None,
opclasses=(),
condition=None,
include=None,
fields_config: typing.Optional[IndexFieldConfig] = None,
key_field: str = "id",
with_extra: with_extra: typing.Optional[typing.Dict[str, typing.Any]] = None
)
fields_configadditional field configuration for bm25 index. see the index field configuration
To learn more about Index parameters, see the Django Index
IndexFieldConfig¶
IndexFieldConfig(
text_fields: typing.Optional[typing.List[TextFieldIndexConfig]] = None,
json_fields: typing.Optional[typing.List[JSONFieldIndexConfig]] = None,
numeric_fields: typing.Optional[typing.List[NumericFieldIndexConfig]] = None,
boolean_fields: typing.Optional[typing.List[BooleanFieldIndexConfig]] = None,
datetime_fields: typing.Optional[typing.List[DateTimeFieldIndexConfig]] = None,
)
see
TextFieldIndexConfigfield configuration
seeJSONFieldIndexConfigfield configuration
seeNumericFieldIndexConfigfield configuration
seeBooleanFieldIndexConfigfield configuration
seeDateTimeFieldIndexConfigfield configuration
To learn more, see the ParadeDB field configuration.
TextFieldIndexConfig¶
TextFieldIndexConfig(
field: str,
fast: bool = True,
tokenizer: typing.Optional[Tokenizer] = None,
normalizer: typing.Literal["raw", "lowercase"] = "raw",
record: typing.Literal["position", "freq", "basic"] = "position",
indexed: bool = True,
fieldnorms: bool = True,
column: typing.Optional[str] = None,
)
Configure a text field for indexing.
If column is provided, field will be treated as an alias.
To learn more, see the ParadeDB TextFieldIndexConfig.
Example
from paradedb.indexes import Bm25Index, IndexFieldConfig, TextFieldIndexConfig
Bm25Index(
fields=["id", "title", "description"],
name="idx_name",
fields_config= IndexFieldConfig(
text_fields=[
TextFieldIndexConfig(
field="title",
fast=True,
normalizer="lowercase",
record="position"
),
TextFieldIndexConfig(
field="description",
fast=True,
normalizer="lowercase",
)
]
))
JSONFieldIndexConfig¶
JSONFieldIndexConfig(
field: str,
fast: bool = True,
tokenizer: typing.Optional[Tokenizer] = None,
normalizer: typing.Optional[typing.Literal["raw","lowercase"]] = None,
record: typing.Literal["position", "freq", "basic"] = "position",
indexed: bool = True,
fieldnorms: bool = True,
column: typing.Optional[str] = None,
expand_dots: bool = True,
)
Configure a json field for indexing. To learn more, see the ParadeDB JSONFieldIndexConfig.
Example
from paradedb.indexes import Bm25Index, IndexFieldConfig, JSONFieldIndexConfig
Bm25Index(
fields=["id", "metadata"],
name="idx_name",
fields_config=IndexFieldConfig(
json_fields=[
JSONFieldIndexConfig(
field="metadata",
fast=True,
expand_dots=True
)
]
)
)
NumericFieldIndexConfig¶
NumericFieldIndexConfig(
field: str,
fast: bool = True,
indexed: bool = True,
column: typing.Optional[str] = None,
)
Configure a numeric field for indexing. To learn more, see the ParadeDB NumericFieldIndexConfig.
Example
from paradedb.indexes import Bm25Index, IndexFieldConfig, NumericFieldIndexConfig
Bm25Index(
fields=["id", "rank"],
name="idx_name",
fields_config=IndexFieldConfig(
numeric_fields=[
NumericFieldIndexConfig(field="rank", fast=True),
]
)
)
BooleanFieldIndexConfig¶
BooleanFieldIndexConfig(
field: str,
fast: bool = True,
indexed: bool = True,
column: typing.Optional[str] = None,
)
Example
from paradedb.indexes import Bm25Index, IndexFieldConfig, BooleanFieldIndexConfig
Bm25Index(
fields=["id", "published"],
name="idx_name",
fields_config=IndexFieldConfig(
numeric_fields=[
BooleanFieldIndexConfig(field="published", fast=True),
]
))
DateTimeFieldIndexConfig¶
DateTimeFieldIndexConfig(
field: str,
fast: bool = True,
indexed: bool = True,
column: typing.Optional[str] = None,
)
Example
from paradedb.indexes import Bm25Index, IndexFieldConfig, DateTimeFieldIndexConfig
Bm25Index(
fields=["id", "created"],
name="idx_name",
fields_config=IndexFieldConfig(
numeric_fields=[
DateTimeFieldIndexConfig(field="created", fast=True),
]
))
Using Tokenizers with BM25 Index¶
from paradedb.indexes import Bm25Index, IndexFieldConfig, TextFieldIndexConfig, JSONFieldIndexConfig
from paradedb.tokenizers import WhitespaceTokenizer, NGramTokenizer
Bm25Index(
fields=["id", "title", "description", "metadata"],
name="article_bm25_tokenizer_idx",
fields_config=IndexFieldConfig(
text_fields=[
TextFieldIndexConfig(
field="title",
fast=True,
tokenizer=WhitespaceTokenizer(),
normalizer="lowercase",
record="position"
),
TextFieldIndexConfig(
field="description",
fast=True,
tokenizer=NGramTokenizer(min_gram=2, max_gram=3),
normalizer="lowercase",
record="position"
)
],
json_fields=[
JSONFieldIndexConfig(
field="metadata",
tokenizer=WhitespaceTokenizer(),
expand_dots=True
)
]
)
)
To see more details about each tokenizer, check the class documentation above. To see all tokenizers and options, see Tokenizers.