3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@ __pycache__/
dist/
.env
.vscode/
.DS_Store
.DS_Store
*.json
53 changes: 52 additions & 1 deletion README.md
@@ -10,6 +10,7 @@ This library provides Python interfaces for interacting with Substack's unoffici
- Get user profile information and subscriptions
- Fetch post content and metadata
- Search for posts within newsletters
- Access paywalled content **that you have written or paid for** with user-provided authentication

## Installation

@@ -65,6 +66,55 @@ metadata = post.get_metadata()
content = post.get_content()
```

### Accessing Paywalled Content with Authentication

To access paywalled content, you need to provide your own session cookies from a logged-in Substack session:

```python
from substack_api import Newsletter, Post, SubstackAuth

# Set up authentication with your cookies
auth = SubstackAuth(cookies_path="path/to/your/cookies.json")

# Use authentication with newsletters
newsletter = Newsletter("https://example.substack.com", auth=auth)
posts = newsletter.get_posts(limit=5) # Can now access paywalled posts

# Use authentication with individual posts
post = Post("https://example.substack.com/p/paywalled-post", auth=auth)
content = post.get_content() # Can now access paywalled content

# Check if a post is paywalled
if post.is_paywalled():
    print("This post requires a subscription")
```

#### Getting Your Cookies

To access paywalled content, you need to export your browser cookies from a logged-in Substack session. The cookies should be in JSON format with the following structure:

```json
[
  {
    "name": "substack.sid",
    "value": "your_session_id",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "substack.lli",
    "value": "your_lli_value",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  ...
]
```

**Important**: Only use your own cookies from your own authenticated session. **This feature is intended for users to access their own subscribed or authored content programmatically.**
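
If you would rather build the cookies file from values copied out of your browser than export it with an extension, something like the following may help. It is a minimal sketch: `write_cookie_file` is a hypothetical helper, not part of `substack_api`, and it assumes the two cookies shown above are the ones you need.

```python
import json
from pathlib import Path


def write_cookie_file(path, session_id, lli_value):
    """Write a cookies file in the list-of-objects layout shown above.

    Hypothetical helper for illustration; not provided by substack_api.
    """
    cookies = [
        {
            "name": name,
            "value": value,
            "domain": ".substack.com",
            "path": "/",
            "secure": True,
        }
        for name, value in (
            ("substack.sid", session_id),
            ("substack.lli", lli_value),
        )
    ]
    target = Path(path)
    # json.dumps emits lowercase true/false, matching the documented format
    target.write_text(json.dumps(cookies, indent=2), encoding="utf-8")
    return target
```

The resulting path can then be passed as `cookies_path` to `SubstackAuth`.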

### Working with Users

```python
@@ -88,8 +138,9 @@ subscriptions = user.get_subscriptions()

- This is an unofficial library and not endorsed by Substack
- APIs may change without notice, potentially breaking functionality
- Some features may only work for public content
- Rate limiting may be enforced by Substack
- **Authentication requires users to provide their own session cookies**
- **Users are responsible for complying with Substack's terms of service when using authentication features**

## Development

163 changes: 163 additions & 0 deletions docs/api-reference/auth.md
@@ -0,0 +1,163 @@
# SubstackAuth

The `SubstackAuth` class handles authentication for accessing paywalled Substack content.

## Class Definition

```python
SubstackAuth(cookies_path: str)
```

### Parameters

- `cookies_path` (str): Path to the JSON file containing session cookies

## Properties

### `authenticated` (bool)
Whether the authentication was successful and cookies were loaded.

### `cookies_path` (str)
Path to the cookies file.

### `session` (requests.Session)
The authenticated requests session object.

## Methods

### `load_cookies() -> bool`

Load cookies from the specified file.

#### Returns

- `bool`: True if cookies were loaded successfully, False otherwise

### `get(url: str, **kwargs) -> requests.Response`

Make an authenticated GET request.

#### Parameters

- `url` (str): The URL to request
- `**kwargs`: Additional arguments passed to requests.get

#### Returns

- `requests.Response`: The response object

### `post(url: str, **kwargs) -> requests.Response`

Make an authenticated POST request.

#### Parameters

- `url` (str): The URL to request
- `**kwargs`: Additional arguments passed to requests.post

#### Returns

- `requests.Response`: The response object

## Example Usage

### Basic Authentication Setup

```python
from substack_api import SubstackAuth

# Initialize with cookies file
auth = SubstackAuth(cookies_path="my_cookies.json")

# Check if authentication succeeded
if auth.authenticated:
    print("Successfully authenticated!")
else:
    print("Authentication failed")
```

### Using with Newsletter and Post Classes

```python
from substack_api import Newsletter, Post, SubstackAuth

# Set up authentication
auth = SubstackAuth(cookies_path="cookies.json")

# Use with Newsletter
newsletter = Newsletter("https://example.substack.com", auth=auth)
posts = newsletter.get_posts(limit=5)

# Use with Post
post = Post("https://example.substack.com/p/paywalled-post", auth=auth)
content = post.get_content()
```

### Manual Authenticated Requests

```python
from substack_api import SubstackAuth

auth = SubstackAuth(cookies_path="cookies.json")

# Make authenticated GET request
response = auth.get("https://example.substack.com/api/v1/posts/123")
data = response.json()

# Make authenticated POST request
response = auth.post(
    "https://example.substack.com/api/v1/some-endpoint",
    json={"key": "value"},
)
```

## Cookie File Format

The cookies file should be in JSON format with the following structure:

```json
[
  {
    "name": "substack.sid",
    "value": "your_session_id",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  {
    "name": "substack.lli",
    "value": "your_lli_value",
    "domain": ".substack.com",
    "path": "/",
    "secure": true
  },
  ...
]
```

## Error Handling

The `SubstackAuth` class handles several error conditions:

- **File not found**: If the cookies file doesn't exist, `authenticated` will be `False`
- **Invalid JSON**: If the cookies file contains invalid JSON, `load_cookies()` returns `False`
- **Missing cookies**: If required cookies are missing, authentication may fail silently

```python
from substack_api import SubstackAuth

try:
    auth = SubstackAuth(cookies_path="cookies.json")
    if not auth.authenticated:
        print("Authentication failed - check your cookies file")
except Exception as e:
    print(f"Error setting up authentication: {e}")
```
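
Because missing cookies may fail silently, it can be useful to inspect the file yourself before handing it to `SubstackAuth`. The sketch below checks the three conditions listed above using only the standard library. `diagnose_cookie_file` and `EXPECTED_NAMES` are hypothetical names, and treating `substack.sid` and `substack.lli` as the required cookies is an assumption based on the example file format, not something the library is known to enforce.

```python
import json
from pathlib import Path

# Assumption: the cookie names from the documented example are the required ones
EXPECTED_NAMES = {"substack.sid", "substack.lli"}


def diagnose_cookie_file(path):
    """Return a description of the first problem found, or None if none is found.

    Hypothetical helper for illustration; not provided by substack_api.
    """
    file = Path(path)
    if not file.is_file():
        return "file not found"
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(data, list):
        return "expected a JSON list of cookie objects"
    # Collect the names of all well-formed cookie entries
    names = {c.get("name") for c in data if isinstance(c, dict)}
    missing = EXPECTED_NAMES - names
    if missing:
        return "missing cookies: " + ", ".join(sorted(missing))
    return None
```

A `None` result only means the file looks structurally sound; an expired session would still need to be detected from the responses you get back.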

## Security Notes

- Keep your cookies file secure and private
- Don't commit cookies files to version control
- Only use your own session cookies
- Cookies may expire and need to be refreshed periodically
- Respect Substack's Terms of Service when using authentication
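
One way to act on the first note is to restrict the cookies file so that only your user account can read it. The sketch below assumes a POSIX system (on Windows, `os.chmod` only toggles the read-only flag), and `restrict_to_owner` is a hypothetical helper, not part of `substack_api`.

```python
import os
import stat


def restrict_to_owner(path):
    """Make the file readable and writable by its owner only (POSIX).

    Returns the resulting permission bits so the caller can verify them.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to mode 0o600
    return stat.S_IMODE(os.stat(path).st_mode)
```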
1 change: 1 addition & 0 deletions docs/api-reference/index.md
@@ -8,6 +8,7 @@ This section provides detailed documentation for all modules and classes in the
- [Newsletter](newsletter.md): Access to Substack publications, posts, and podcasts
- [Post](post.md): Access to individual Substack post content and metadata
- [Category](category.md): Discovery of newsletters by category
- [SubstackAuth](auth.md): Authentication for accessing paywalled content

Each module documentation includes:

14 changes: 12 additions & 2 deletions docs/api-reference/newsletter.md
@@ -5,12 +5,13 @@ The `Newsletter` class provides access to Substack publications.
## Class Definition

```python
Newsletter(url: str)
Newsletter(url: str, auth: Optional[SubstackAuth] = None)
```

### Parameters

- `url` (str): The URL of the Substack newsletter
- `auth` (Optional[SubstackAuth]): Authentication handler for accessing paywalled content

## Methods

@@ -85,7 +86,7 @@ Get authors of the newsletter.
## Example Usage

```python
from substack_api import Newsletter
from substack_api import Newsletter, SubstackAuth

# Create a newsletter object
newsletter = Newsletter("https://example.substack.com")
@@ -117,4 +118,13 @@ for author in authors:
recommendations = newsletter.get_recommendations()
for rec in recommendations:
print(f"Recommended: {rec.url}")

# Use with authentication for paywalled content
auth = SubstackAuth(cookies_path="cookies.json")
authenticated_newsletter = Newsletter("https://example.substack.com", auth=auth)
paywalled_posts = authenticated_newsletter.get_posts(limit=5)
for post in paywalled_posts:
    if post.is_paywalled():
        content = post.get_content()  # Now accessible with auth
        print(f"Paywalled content: {content[:100]}...")
```
36 changes: 28 additions & 8 deletions docs/api-reference/post.md
@@ -5,12 +5,13 @@ The `Post` class provides access to individual Substack posts.
## Class Definition

```python
Post(url: str)
Post(url: str, auth: Optional[SubstackAuth] = None)
```

### Parameters

- `url` (str): The URL of the Substack post
- `auth` (Optional[SubstackAuth]): Authentication handler for accessing paywalled content

## Methods

@@ -48,12 +49,20 @@ Get the HTML content of the post.

#### Returns

- `Optional[str]`: HTML content of the post, or None if not available
- `Optional[str]`: HTML content of the post, or None if not available (e.g., for paywalled content without authentication)

### `is_paywalled() -> bool`

Check if the post is paywalled.

#### Returns

- `bool`: True if the post requires a subscription to access full content
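
The previous version of the example in this file checked the `audience` field of the post metadata directly. As a rough sketch of that check, and without claiming this is how `is_paywalled()` is implemented, the same test can be written as a standalone function. `looks_paywalled` is a hypothetical name.

```python
def looks_paywalled(metadata):
    """Return True when the metadata dict marks the post as paid-only.

    Mirrors the metadata check used in the earlier example; the library's
    own is_paywalled() may use different or additional criteria.
    """
    return metadata.get("audience") == "only_paid"
```

It takes the dictionary returned by `post.get_metadata()`, so it can be applied to metadata you have already fetched.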

## Example Usage

```python
from substack_api import Post
from substack_api import Post, SubstackAuth

# Create a post object
post = Post("https://example.substack.com/p/post-slug")
@@ -63,11 +72,22 @@ metadata = post.get_metadata()
print(f"Title: {metadata['title']}")
print(f"Published: {metadata['post_date']}")

# Get post content
content = post.get_content()
# Check if the post is paywalled
if post.is_paywalled():
    print("This post is paywalled")

    # Set up authentication to access paywalled content
    auth = SubstackAuth(cookies_path="cookies.json")
    authenticated_post = Post("https://example.substack.com/p/post-slug", auth=auth)
    content = authenticated_post.get_content()
else:
    # Public content - no authentication needed
    content = post.get_content()

print(f"Content length: {len(content) if content else 0}")

# Check if the post is paywalled
is_paywalled = metadata.get("audience") == "only_paid"
print(f"Paywalled: {is_paywalled}")
# Alternative: Create post with authentication from the start
auth = SubstackAuth(cookies_path="cookies.json")
authenticated_post = Post("https://example.substack.com/p/paywalled-post", auth=auth)
content = authenticated_post.get_content() # Works for both public and paywalled content
```