## Summary

`ckan search-index rebuild` on staging fails for several datasets with:

```
TypeError: Object of type Row is not JSON serializable
```

Without `-i`, the first such dataset aborts the whole rebuild, leaving a partial index. With `-i` the errors are skipped, but those datasets are then missing from Solr and will not appear in search results.
## Traceback

```
  File "/usr/lib/adx/submodules/ckan/ckan/lib/search/__init__.py", line 251, in rebuild
    package_index.update_dict(
  File "/usr/lib/adx/submodules/ckan/ckan/lib/search/index.py", line 105, in update_dict
    self.index_package(pkg_dict, defer_commit)
  File "/usr/lib/adx/submodules/ckan/ckan/lib/search/index.py", line 124, in index_package
    data_dict_json = json.dumps(pkg_dict)
  ...
TypeError: Object of type Row is not JSON serializable
```
A SQLAlchemy `Row` object is leaking into `pkg_dict` before the `json.dumps` call at `ckan/lib/search/index.py:124`. This is almost certainly an extension (a `before_dataset_index` / `before_index` hook, or a package-dict modifier) attaching a raw query result instead of converting it to a plain dict/list first.
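The suspected bug pattern, and its fix, look roughly like the sketch below. `Row` here is a minimal stand-in for a SQLAlchemy `Row`, and both hook bodies are hypothetical, not the actual extension code:

```python
import json


class Row:
    """Minimal stand-in for a SQLAlchemy Row: not JSON serializable,
    but exposes _asdict() like real Rows do (SQLAlchemy 1.4+)."""

    def __init__(self, **fields):
        self._data = fields

    def _asdict(self):
        return dict(self._data)


def buggy_hook(pkg_dict):
    # Suspected pattern: the raw query result attached to pkg_dict,
    # which later blows up inside index_package's json.dumps().
    pkg_dict["extras"] = [Row(key="license", value="CC-BY")]
    return pkg_dict


def fixed_hook(pkg_dict):
    # Fix: convert each Row to a plain dict before it reaches the indexer.
    rows = [Row(key="license", value="CC-BY")]
    pkg_dict["extras"] = [r._asdict() for r in rows]
    return pkg_dict


try:
    json.dumps(buggy_hook({}))
except TypeError as exc:
    print(exc)  # Object of type Row is not JSON serializable
print(json.dumps(fixed_hook({})))
```

On SQLAlchemy versions where `._asdict()` is unavailable, `dict(row._mapping)` (1.4+) or `dict(row)` (1.3) serve the same purpose.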
## Known affected dataset IDs (staging)

- `e6426824-6985-4d33-91e0-a785ce9634b9`
- `a705c2a4-f498-450c-aa41-c28061e11dfa`
- `d55c9e6f-c051-4c2d-95dd-41e21b70a020`
## Separate but related

During the same rebuild, one dataset also fails Solr-side with an immense-term error on its `config` field:

```
Exception writing document id ba3bbd4e59033c155d1755a6d2f52075 ...
Document contains at least one immense term in field="config"
(whose UTF8 encoding is longer than the max length 32766);
bytes can be at most 32766 in length; got 38485.
```

This has a separate root cause: in the Solr schema, `config` is declared as a `StrField`, which is not analyzed/tokenized, so the whole value is indexed as a single term subject to the 32766-byte limit. Either truncate/normalize the value upstream, or change the schema field type to a `TextField` (analyzed fields split the value into tokens, so no single term hits the limit).
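If upstream truncation is the chosen route, the cut must be made on the UTF-8 byte length rather than the character count, since the 32766 limit is in bytes. A minimal sketch; the helper name and its placement (e.g. in an indexing hook) are assumptions:

```python
# Solr's single-term byte limit for non-analyzed StrFields,
# taken from the error message above.
MAX_TERM_BYTES = 32766


def truncate_to_utf8_bytes(value: str, limit: int = MAX_TERM_BYTES) -> str:
    """Return value unchanged if it fits, otherwise cut it so its UTF-8
    encoding is at most `limit` bytes, without splitting a multi-byte
    character at the boundary."""
    encoded = value.encode("utf-8")
    if len(encoded) <= limit:
        return value
    # errors="ignore" drops any partial multi-byte sequence left at the cut.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

A hook could apply this to the `config` value before indexing, though if the giant value is not actually needed for search, dropping the field from the indexed document entirely may be cleaner.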
## Repro

```shell
kubectl exec deployment/ckan -n adr-s -- \
  ckan -c /tmp/production.ini search-index rebuild
```
## Suggested investigation path

- Grep the codebase and submodules for `before_dataset_index` / `before_index` hooks and other functions that mutate `pkg_dict`, and look for ones that assign the result of a SQLAlchemy query directly (i.e., without `._asdict()`, `dict(row)`, or iterating into a list of dicts).
- Load one of the affected datasets via `ckan dataset show <id>` and compare it to an unaffected dataset to narrow down which field holds the `Row`.
- For the `config` field issue, inspect dataset `ba3bbd4e59033c155d1755a6d2f52075` and decide whether the giant value is legitimate.
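To narrow down which field holds the `Row` without eyeballing the whole dict, a small recursive probe can be run over an affected dataset's dict. The helper is illustrative, not part of CKAN:

```python
import json


def find_unserializable(obj, path="pkg_dict"):
    """Return a list of (path, type name) pairs for values that
    json.dumps cannot handle anywhere inside obj."""
    try:
        json.dumps(obj)
        return []  # this subtree is fine
    except TypeError:
        pass
    if isinstance(obj, dict):
        bad = []
        for key, value in obj.items():
            bad += find_unserializable(value, f"{path}[{key!r}]")
        return bad
    if isinstance(obj, (list, tuple)):
        bad = []
        for i, value in enumerate(obj):
            bad += find_unserializable(value, f"{path}[{i}]")
        return bad
    # A leaf value that json.dumps rejects -- the culprit.
    return [(path, type(obj).__name__)]
```

Running it on the parsed output of `ckan dataset show <id>` for one of the affected IDs should print the exact field path holding the `Row`.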