[1100] Glassdoor scraper #1109
base: develop
.gitignore
Outdated
# Python #
##########
/ajax/scraper/venv
/ajax/scraper/__pycache__
Is there any time we'd want to include a __pycache__ file in the repository? That is, I think it's okay to make this a global ignore rather than just the one in that directory.
Nope, __pycache__ folders should be ignored entirely. It could be a global ignore.
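As a rough illustration (the exact entries are an assumption, not taken from the PR), dropping the leading path lets a single pattern match the folder at any depth:

```gitignore
# Python #
##########
/ajax/scraper/venv
# No leading slash: matches __pycache__ directories anywhere in the repository
__pycache__/
```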
Some js tests appear to be failing: 2 failing
npm ERR! Darwin 15.5.0
npm ERR! Please include the following file with any support request:

The images found by the scraper are not replacing the old images when saved.
ajax/glassdoor-scraper.php
Outdated
$success = false;
$data = null;

function generate_key($length) {
This looks like the same function as in linkedin-scraper.php...
Can I import functions from another file?
Yup. If you look in the src directory, that's where the PHP functions are, organized into classes. These classes are autoloaded by PHP as they're needed, so PHP doesn't process any code it doesn't need. The autoloader knows where to look for a file from the namespace of the class. The base of the namespace is Sizzle, which corresponds to the src directory itself. I created a stub for you (Scraper.php) to fill out. You can reference the class by its full name, Sizzle\Scraper, or put a use statement at the top of any file where you want to refer to it simply as Scraper.
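To make that concrete, here is a minimal sketch of how the shared helper might look once it moves into the stub. The method name generateKey and its body are assumptions for illustration (the original generate_key body isn't shown in this diff), and random_bytes assumes PHP 7 or later. Both namespaces are placed in one file only so the sketch runs on its own; in the repository the class would sit in src/Scraper.php and the calling code in a separate file.

```php
<?php
// Sketch only: generateKey() and its body are assumptions, not the PR's actual code.

namespace Sizzle {
    // In the repository this class would live in src/Scraper.php (the stub).
    class Scraper
    {
        /**
         * Return a random hexadecimal string of exactly $length characters.
         */
        public static function generateKey($length)
        {
            if ($length < 1) {
                throw new \InvalidArgumentException('Length must be at least 1');
            }
            // Each random byte becomes two hex characters, so round up and trim.
            $bytes = random_bytes((int) ceil($length / 2)); // requires PHP 7+
            return substr(bin2hex($bytes), 0, $length);
        }
    }
}

namespace {
    // In a caller such as ajax/glassdoor-scraper.php, the autoloader would
    // locate the class from its namespace, so no require is needed there.
    use Sizzle\Scraper;

    $key = Scraper::generateKey(16);
    echo strlen($key), "\n"; // prints 16
}
```

With the helper in one class, both linkedin-scraper.php and glassdoor-scraper.php could call Scraper::generateKey() instead of each defining their own copy.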
Version 0.2
1) Still not replacing existing images.
Version 0.1
Fixes #1100 - implements a Glassdoor feature that uses the official Glassdoor API to retrieve the name, website, and logo of the company. Currently, the website is being used for the token's description, but I can try to implement some text search algorithm later on to find and parse company descriptions on the website.