Commit bf32dec

updates (#7)
updates
1 parent 182756a commit bf32dec

7 files changed

+125
-56
lines changed

.ci/generate_fake_data.py

Lines changed: 31 additions & 22 deletions
@@ -4,25 +4,34 @@
 
 fake = Faker()
 
-
-print("generate 5000 fake profiles, write them all into a bson object,"
-      "read back only those with @gmail.com emails")
-
-
-with open("fake_profiles.bson", "wb") as f:
-    for _ in range(500):
-        faked = fake.simple_profile()
-        del faked['birthdate']  # bson doesn't like date wants datetime
-        bson_data = bson.BSON.encode(faked)
-        f.write(bson_data)
-
-found_gmails = 0
-with open("fake_profiles.bson", "rb") as f:
-    stream = BSONInput(fh=f, fast_string_prematch=b"@gmail.com")
-    for doc in stream:
-        assert "@gmail" in doc['mail']  # bson handles the utf8 decoding by default!
-        found_gmails += 1
-
-
-assert found_gmails > 0
-print(f"found {found_gmails} from gmails")
+if __name__ == "__main__":
+    print("generate 5000 fake profiles, write them all into a bson object,"
+          "read back only those with @gmail.com emails")
+
+
+    with open("fake_profiles.bson", "wb") as f:
+        for _ in range(500):
+            faked = fake.simple_profile()
+            del faked['birthdate']  # bson doesn't like date wants datetime
+            bson_data = bson.encode(faked)
+            f.write(bson_data)
+
+    found_gmails = 0
+    with open("fake_profiles.bson", "rb") as f:
+        stream = BSONInput(fh=f, fast_string_prematch=b"@gmail.com")
+        for doc in stream:
+            assert "@gmail" in doc['mail']  # bson handles the utf8 decoding by default!
+            found_gmails += 1
+
+
+    assert found_gmails > 0
+    print(f"found {found_gmails} from gmails")
+
+    found_gmails_raw = 0
+    with open("fake_profiles.bson", "rb") as f:
+        stream = BSONInput(fh=f, fast_string_prematch=b"@gmail.com", decode=False)
+        for raw_bson in stream:
+            assert b"@gmail" in raw_bson  # not even bothering to decode to a dict
+            found_gmails_raw += 1
+    assert found_gmails_raw > 0
+    print(f"found {found_gmails_raw} from gmails without even decoding the BSON")
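The script above writes BSON documents back to back into one file and then filters them by a byte substring before any decoding. The sketch below illustrates why that is possible: every BSON document starts with a little-endian int32 holding its total length, so a reader can slice out each document and test the raw bytes. This is only an illustration under that assumption, not bsonstream's actual implementation; `make_doc` and `iter_raw_docs` are hypothetical helpers introduced here.

```python
import io
import struct


def make_doc(key: str, value: str) -> bytes:
    """Hand-build a one-field BSON document holding a UTF-8 string (hypothetical helper)."""
    val = value.encode("utf-8") + b"\x00"
    # element = type byte 0x02 (string) + cstring key + int32 string length + bytes
    element = b"\x02" + key.encode("utf-8") + b"\x00" + struct.pack("<i", len(val)) + val
    body = element + b"\x00"  # a trailing NUL terminates the document
    return struct.pack("<i", len(body) + 4) + body  # length counts its own 4 bytes


def iter_raw_docs(fh, prematch=None):
    """Yield each raw document from fh, optionally keeping only those containing prematch."""
    while True:
        header = fh.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError("invalid document length")
        rest = fh.read(size - 4)
        if len(rest) != size - 4:
            raise ValueError("truncated document")
        raw = header + rest
        if prematch is None or prematch in raw:
            yield raw


buf = io.BytesIO(make_doc("mail", "a@gmail.com") + make_doc("mail", "b@example.org"))
matches = list(iter_raw_docs(buf, prematch=b"@gmail.com"))
print(len(matches))  # prints 1
```

Because the substring test runs on undecoded bytes, documents that do not match are never parsed, which is the saving the `fast_string_prematch` option is described as providing.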
@@ -1,27 +1,31 @@
-name: Python package
+name: test the module
 
 on:
   push:
-    branches: [ master ]
+    branches: [ master, dev ]
   pull_request:
     branches: [ master ]
 
 jobs:
   simpletest:
     runs-on:
-    - ubuntu-18.04
+    - ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.7, 3.8]
+        python-version: [3.11, 3.12, 3.13, 3.14]
 
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v5
+
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v1
+      uses: actions/setup-python@v6
       with:
         python-version: ${{ matrix.python-version }}
+
+
     - name: run a simple test
       run: |
-        pip install pymongo Faker
+        pip install pymongo faker
         cp .ci/generate_fake_data.py .
-        python generate_fake_data.py
+        python3 generate_fake_data.py
+

.github/workflows/python-wheel.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+name: build the wheel
+
+on:
+  push:
+    branches: [ master, dev ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  wheelbuild:
+    runs-on:
+    - ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v5
+
+
+    - name: build and run a simple test
+      run: |
+        pip install wheel build pymongo faker
+        python3 -m build
+
+    - uses: actions/upload-artifact@v4
+      with:
+        name: wheel-file
+        path: ./dist/*.whl

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,7 @@
+#test data
+fake_profiles.bson
+
+
 #################
 ## Eclipse
 #################

README.md

Lines changed: 21 additions & 3 deletions
@@ -42,7 +42,7 @@ The fast_string_prematch would not bother converting records that do not have "g
 somewhere in the document as plaintext.
 
 ``` python
-from bsonstream import KeyValueBSONInput
+from bsonstream import BSONInput
 from sys import argv
 import gzip
 for file in argv[1:]:
@@ -51,12 +51,30 @@ somewhere in the document as plaintext.
         f = open(file, 'rb')
     else:
         f=gzip.open(file,'rb')
-    stream = KeyValueBSONInput(fh=f, fast_string_prematch=b"github")
+    stream = BSONInput(fh=f, fast_string_prematch=b"github")
     for dict_data in stream:
         ...process dict_data...
 ```
 
 
+or if you are passing data to another tool that can handle raw bson (like bsonsearch), don't even bother decoding the BSON to a dict
+
+``` python
+from bsonstream import BSONInput
+from sys import argv
+import gzip
+for file in argv[1:]:
+    f=None
+    if "gz" not in file:
+        f = open(file, 'rb')
+    else:
+        f=gzip.open(file,'rb')
+    stream = BSONInput(fh=f, fast_string_prematch=b"github")
+    for raw_bson in stream:
+        ...process dict_data...
+```
+
+
 ## Benchmark
 Unfortunately, I cannot make available the test bson file.
 
@@ -85,7 +103,7 @@ With fast string matcher. In this case, documents matching the fast string pate
 ## Dependencies
 
 Required libraries
-* [python-bson]
+* [pymongo]
 
 
 ## Versioning
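Two details in the README's raw-BSON example are worth a second look. The added snippet constructs `BSONInput` without `decode=False`, whereas the test script in this commit passes `decode=False` when it wants undecoded bytes. The `"gz" not in file` check also treats any path that merely contains "gz" as compressed. The sketch below is one possible way to address both, assuming `decode=False` is what makes the stream yield raw bytes; `open_maybe_gzip` and `iter_raw_matches` are hypothetical names, and `iter_raw_matches` needs the bsonstream package installed to run.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(path):
    """Open path for binary reading, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


def iter_raw_matches(path, needle=b"github"):
    """Yield raw BSON bytes for documents containing needle (requires bsonstream)."""
    from bsonstream import BSONInput  # imported lazily so the helper above works without it

    with open_maybe_gzip(path) as f:
        # decode=False is the keyword used by .ci/generate_fake_data.py in this commit
        yield from BSONInput(fh=f, fast_string_prematch=needle, decode=False)


# Demonstrate the file-opening helper on a plain and a gzip-compressed file.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "a.bson")
    packed = os.path.join(tmp, "b.bson.gz")
    with open(plain, "wb") as out:
        out.write(b"payload")
    with gzip.open(packed, "wb") as out:
        out.write(b"payload")
    results = []
    for p in (plain, packed):
        with open_maybe_gzip(p) as fh:
            results.append(fh.read())
print(results)  # prints [b'payload', b'payload']
```

Checking the magic bytes costs one extra open per file but does not depend on how the file is named.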

pyproject.toml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+[project]
+name = "bsonstream"
+version = "0.1.7"
+description = "BSON stream raw data into dict or individual BSON format - python"
+
+requires-python = ">=3.11"
+dependencies = [
+    "pymongo~=4.15.2"
+]
+
+[project.urls]
+GitHub = "https://github.com/bauman/python-bson-streaming"
+
+[build-system]
+requires = ["setuptools>=61"]
+build-backend = "setuptools.build_meta"
+
+
+[tool.cibuildwheel.linux]
+# This command runs for the manylinux containers (based on CentOS).
+archs = ["x86_64"]
+
+
+[tool.cibuildwheel.macos]
+archs = ["x86_64", "universal2", "arm64"]
+
+
+[[tool.cibuildwheel.overrides]]
+select = "*-musllinux*"
+# This command runs for the musllinux containers (based on Alpine Linux).
+archs = ["x86_64"]

setup.py

Lines changed: 0 additions & 23 deletions
This file was deleted.
