webparser

Additional pages

The selected pages are the following two:

https://www.goodreads.com/list/show/3.Best_Science_Fiction_Fantasy_Books

https://www.goodreads.com/list/show/425.Weirdest_Books_Ever

The saved pages are located at /src/main/resources/pages/books*.html. A full save of the pages might be needed later.

Required for running

Apache Maven
Java 8 JDK

Build the artifact

git clone https://github.com/kozeljko/webparser.git
cd webparser
mvn clean package (be sure to be in the same directory as the pom.xml file)

This builds two artifacts in the /target folder. Use the "webparser-1.0-SNAPSHOT-jar-with-dependencies.jar" artifact. It is a fat jar that includes all the dependencies it needs to run.

Run the artifact

The methods are run with the following command:

java -jar ./target/webparser-1.0-SNAPSHOT-jar-with-dependencies.jar <method> <fileName|filePath>

Possible methods are:

regex (Runs the regex method; expects a fileName that exists in the .jar itself.)
xpath (Runs the xpath method; expects a fileName that exists in the .jar itself.)
regex-via-path (Runs the regex method; expects a file path to the location of the page file.)
xpath-via-path (Runs the xpath method; expects a file path to the location of the page file.)

Note that both fileName and filePath must respect the limitations of the program. Only pages of type books, jewelry and rtv are allowed. Check the /src/main/resources/pages folder for the list of pages that can be run with a non-path method.
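
The difference between the fileName and filePath variants can be illustrated with a minimal Java 8 sketch. It is not the code from this repository: the class name PageSourceSketch, its helper methods, the "/pages/" classpath root and the rule that a page type is recognised by the start of the file name are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the webparser sources.
class PageSourceSketch {

    // Page types named in the README. Matching them by file-name prefix is an assumption.
    static final List<String> PAGE_TYPES = Arrays.asList("books", "jewelry", "rtv");

    // True when the method name ends in "-via-path", i.e. the argument is a file path.
    static boolean isPathMethod(String method) {
        return method.endsWith("-via-path");
    }

    // Returns the page type that the file name starts with, or null if none matches.
    static String pageType(String fileNameOrPath) {
        String name = Paths.get(fileNameOrPath).getFileName().toString();
        for (String type : PAGE_TYPES) {
            if (name.startsWith(type)) {
                return type;
            }
        }
        return null;
    }

    // Opens the page from disk for path methods, otherwise from the classpath.
    // The "/pages/" resource root is assumed from the src/main/resources/pages folder.
    static InputStream open(String method, String arg) throws IOException {
        if (isPathMethod(method)) {
            return Files.newInputStream(Paths.get(arg));
        }
        InputStream in = PageSourceSketch.class.getResourceAsStream("/pages/" + arg);
        if (in == null) {
            throw new IOException("No bundled page named " + arg);
        }
        return in;
    }

    public static void main(String[] args) {
        // The file names below are made-up examples.
        System.out.println(pageType("books1.html"));
        System.out.println(isPathMethod("xpath-via-path"));
    }
}
```

In this sketch a non-path method can only see pages packaged inside the jar, while a path method reads any file on disk whose name still identifies one of the three page types.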
