74 changes: 73 additions & 1 deletion README.md
@@ -16,6 +16,67 @@ The user provides a collection of documents for topic analysis and the service w
consists of a collection of words that represent a given topic.


All services have been tested with Python 3.6.



## User Guide

See the [user guide](docs/USERGUIDE.md) for a detailed specification of the services and how to use them.



## Running the service locally

### Install prerequisites

```
pip install -r requirements.txt
```


### Setup

Run the following command to generate the gRPC classes for Python:

```
python3.6 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. service_spec/topic_analysis.proto
```



### Running unit tests


```
python3.6 test_topic_analysis_grpc.py
```

### Usage

To start the gRPC server locally:

```
python3.6 topic_analysis_grpc.py
```

Topic analysis typically involves running a dimensionality reduction algorithm, which can take a considerable length of time to complete. For this reason, the service above returns a handle that you later use to query a REST API endpoint for the results. See the user guide for details. You can start that endpoint with the following command:

```
python3.6 analysis_results.py
```
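
The handle-then-poll workflow described above could look roughly like the following on the client side. This is only a sketch: the URL layout (`base_url/handle`), the `status` field and its `finished` value are assumptions made for illustration and are not taken from this repository. The user guide is the reference for the real endpoint.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode its body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_result(handle, base_url, fetch=fetch_json, interval=5.0, max_attempts=60):
    """Poll the results endpoint until the analysis for `handle` is finished.

    The URL layout and the "status" field are hypothetical; adapt them to
    the actual response format of the results endpoint.
    """
    url = "{}/{}".format(base_url.rstrip("/"), handle)  # hypothetical layout
    for attempt in range(max_attempts):
        result = fetch(url)
        if result.get("status") == "finished":  # hypothetical field and value
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before asking again
    raise TimeoutError(
        "analysis {} not finished after {} attempts".format(handle, max_attempts)
    )
```

Passing the fetch function as a parameter keeps the polling loop testable without a running server.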

You can also use a suitable application server for Python Flask applications. A sample configuration file for gunicorn is [config.py](Docker/gunicorn/config.py).

To serve analysis_results.py, run the command below from a folder containing config.py. The configuration file config.py must contain a path pointing to analysis_results.py.

```
gunicorn -c config.py analysis_results:app
```
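
For orientation, a gunicorn configuration file is itself a Python module. The sketch below is not the repository's config.py; it only shows the kind of settings such a file might contain. `chdir`, `bind`, `workers` and `timeout` are standard gunicorn settings, while every value here (path, port, worker count) is a placeholder.

```python
# config.py: a minimal gunicorn configuration sketch (all values are placeholders)

# Directory that contains analysis_results.py; gunicorn changes into it
# before loading the application. Hypothetical path, adjust to your checkout.
chdir = "/path/to/topic-analysis"

# Address and port the server listens on (placeholder port).
bind = "0.0.0.0:8000"

# Number of worker processes handling requests.
workers = 2

# Workers silent for more than this many seconds are killed and restarted.
timeout = 120
```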



## Resources

LSA:
@@ -31,4 +92,15 @@ LDA:
* [Wikipedia entry](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)

LDA2vec:
* Research paper: [Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec](https://arxiv.org/abs/1605.02019)


## Contributors

### Authors

* Eyob Yirdaw

### Maintainers

* Eyob Yirdaw
22 changes: 22 additions & 0 deletions docs/USERGUIDE.md
@@ -0,0 +1,22 @@
[![SingnetLogo](../docs/assets/singnet-logo.jpg?raw=true 'SingularityNET')](https://singularitynet.io/)



Below is the list of topic analysis methods that have been implemented so far.


## Probabilistic Latent Semantic Analysis (PLSA)

The available parameters are:

- **docs**: A collection of documents, supplied either as an array of strings or as a collection of .txt files. At least two documents should be given. Note that a document can consist of a single sentence, so you can extract topics from a single text file by first tokenizing it at the sentence level and treating each sentence as a separate "document".
- **num_topics**: The number of topics to extract. The minimum value is 2.
- **topic_divider**: An integer. If it is zero, num_topics is used. If it is positive, the number of topics is `(number of documents)/topic_divider`.
- **maxiter**: The maximum number of EM (expectation-maximization) iterations allowed. A suitable value depends on the supplied documents and on the value of beta, but 22 can be a good value.
- **beta**: A floating point number in the range (0,1], used as a tempering parameter in the EM algorithm. Values close to 1, including 1 itself, are recommended. A value of 1 means no tempering is used.
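
As an illustration of the parameter rules above, the following sketch splits a single text into sentence-level "documents", checks the stated constraints, and computes how many topics would be requested. It is not the service's own code. The function names are hypothetical, the sentence splitter is deliberately naive, and the use of integer (floor) division for `topic_divider` is an assumption, since the text only gives `(number of documents)/topic_divider`.

```python
import re


def split_into_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def effective_num_topics(docs, num_topics=2, topic_divider=0, beta=1.0):
    """Check the documented constraints and return the topic count to use."""
    if len(docs) < 2:
        raise ValueError("at least two documents should be given")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in the range (0, 1]")
    if topic_divider < 0:
        raise ValueError("topic_divider must be zero or positive")
    if topic_divider == 0:
        # A divider of zero means num_topics is used directly.
        if num_topics < 2:
            raise ValueError("num_topics must be at least 2")
        return num_topics
    # Assumption: the division is rounded down to a whole number of topics.
    return len(docs) // topic_divider


docs = split_into_sentences("Cats purr. Dogs bark! Do birds sing? Yes.")
print(len(docs), effective_num_topics(docs, topic_divider=2))
```

A real pipeline would likely use a proper sentence tokenizer, since the regular expression above mishandles abbreviations and decimal numbers.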


Here are sample JSON files to try with this algorithm, either from the snet client or the dApp: [longer sample](../docs/tests/topic_analysis.json), [shorter sample](../docs/tests/topic_analysis_2.json).
