This repository was archived by the owner on Jan 25, 2022. It is now read-only.


@shreydesai
Contributor

Version 0.1

Fixes #1100 - implements a Glassdoor feature that uses the official Glassdoor API to retrieve the name, website, and logo of the company. Currently, the website is being used for the token's description, but I can try to implement some text search algorithm later on to find and parse company descriptions on the website.

.gitignore Outdated
# Python #
##########
/ajax/scraper/venv
/ajax/scraper/__pycache__
Contributor

Is there any time we'd want to include a __pycache__ file in the repository? That is, I think it's okay to make this a global ignore rather than just the one in that directory.

Contributor Author

Nope, __pycache__ folders should be ignored entirely. It could be a global ignore.

@wogsland
Contributor

Some js tests appear to be failing:

2 failing

  1. recruiting-token.js "before all" hook:
    ReferenceError: Date is not defined
    at Object.convert (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/EventInit.js:32:24)
    at MutationEventImpl.EventImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/events/Event-impl.js:8:48)
    at MutationEventImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/events/MutationEvent-impl.js:5:1)
    at Object.setup (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/MutationEvent.js:164:17)
    at Object.createImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/MutationEvent.js:151:10)
    at DocumentImpl.createEvent (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:564:31)
    at DocumentImpl.insertBefore (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Node-impl.js:236:18)
    at DocumentImpl.appendChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Node-impl.js:380:17)
    at DocumentImpl.appendChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:367:18)
    at setChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:266:18)
    at HtmlToDom._parseWithparse5v1 (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:90:7)
    at HtmlToDom.appendHtmlToDocument (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:47:48)
    at setInnerHTML (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:52:27)
    at DocumentImpl.write (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:420:7)
    at Document.write (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/Document.js:307:51)
    at Object.exports.jsdom (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom.js:116:21)
    at processHTML (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom.js:252:26)
    at Object.exports.env.exports.jsdom.env as env
    at Context.<anonymous> (/Library/WebServer/Documents/GiftBox/node_modules/mocha-jsdom/index.js:52:22)

  2. recruiting-token.js "before all" hook:
    ReferenceError: Date is not defined
    at Object.convert (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/EventInit.js:32:24)
    at MutationEventImpl.EventImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/events/Event-impl.js:8:48)
    at MutationEventImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/events/MutationEvent-impl.js:5:1)
    at Object.setup (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/MutationEvent.js:164:17)
    at Object.createImpl (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/MutationEvent.js:151:10)
    at DocumentImpl.createEvent (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:564:31)
    at DocumentImpl.insertBefore (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Node-impl.js:236:18)
    at DocumentImpl.appendChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Node-impl.js:380:17)
    at DocumentImpl.appendChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:367:18)
    at setChild (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:266:18)
    at HtmlToDom._parseWithparse5v1 (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:90:7)
    at HtmlToDom.appendHtmlToDocument (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/browser/htmltodom.js:47:48)
    at setInnerHTML (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:52:27)
    at DocumentImpl.write (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/nodes/Document-impl.js:420:7)
    at Document.write (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom/living/generated/Document.js:307:51)
    at Object.exports.jsdom (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom.js:116:21)
    at processHTML (/Library/WebServer/Documents/GiftBox/node_modules/jsdom/lib/jsdom.js:252:26)
    at Object.exports.env.exports.jsdom.env as env
    at Context.<anonymous> (/Library/WebServer/Documents/GiftBox/node_modules/mocha-jsdom/index.js:52:22)

npm ERR! Darwin 15.5.0
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "run" "test"
npm ERR! node v6.2.2
npm ERR! npm v3.10.3
npm ERR! code ELIFECYCLE
npm ERR! Sizzle.IO@0.0.0 test: cd js && ../node_modules/.bin/mocha --require test/bootstrap.js test
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the Sizzle.IO@0.0.0 test script 'cd js && ../node_modules/.bin/mocha --require test/bootstrap.js test'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the Sizzle.IO package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! cd js && ../node_modules/.bin/mocha --require test/bootstrap.js test
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs Sizzle.IO
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls Sizzle.IO
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR! /Library/WebServer/Documents/GiftBox/npm-debug.log

@wogsland
Contributor

The images found by the scraper are not replacing the old images when saved.

$success = false;
$data = null;

function generate_key($length) {
Contributor

This looks like the same function as in linkedin-scraper.php...

Contributor Author

Can I import functions from another file?

Contributor

Yup. If you look in the src directory, that's where the PHP functions are, organized in classes. These classes are autoloaded by PHP as they're needed, so PHP doesn't process any code it doesn't need. The autoloader knows where to look for a file by the namespace of the class. The base of the namespace is Sizzle, which corresponds to the src directory itself. I created a stub for you (Scraper.php) to fill out. You can reference the class by its full name, Sizzle\Scraper, or put a use statement at the top of any file where you want to refer to it as just Scraper.
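
For illustration, the namespace-to-directory mapping described above could look roughly like the sketch below. It is not the project's actual autoloader (which may well be Composer's PSR-4 loader); the helper name `sizzle_class_path` and the `/var/www/src` base directory are made up for the example, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Sketch only: the project's real autoloader may differ.

/**
 * Map a class name in the Sizzle namespace to a file under $baseDir.
 * Returns null for classes outside that namespace.
 * (Hypothetical helper, not part of the repository.)
 */
function sizzle_class_path(string $class, string $baseDir): ?string
{
    $prefix = 'Sizzle\\';
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    // Drop the base namespace; the remaining segments become directories.
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

// Register the mapping so PHP requires a class file the first time the
// class is referenced, and never loads files for classes it does not use.
spl_autoload_register(function (string $class): void {
    $path = sizzle_class_path($class, __DIR__ . '/src');
    if ($path !== null && is_file($path)) {
        require $path;
    }
});

echo sizzle_class_path('Sizzle\\Scraper', '/var/www/src'), PHP_EOL;
// prints /var/www/src/Scraper.php
echo sizzle_class_path('Sizzle\\Tests\\Ajax\\LinkedInScraperTest', '/var/www/src'), PHP_EOL;
// prints /var/www/src/Tests/Ajax/LinkedInScraperTest.php
```

With a loader like this in place, a file that starts with `use Sizzle\Scraper;` can refer to the class simply as `Scraper`, and the file src/Scraper.php is only read when that class is first used.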

@shreydesai
Contributor Author

Version 0.2

  • Created a Scraper class and used it in glassdoor-scraper.php
  • Combined the LinkedIn and Glassdoor run scripts in run.sh
  • Removed all console.log debugging lines
  • Fixed the npm test failure by removing an unnecessary Sinon import
  • Newly downloaded images now replace the old images when saved

@wogsland
Contributor

1). Still not replacing existing images.
2). Values, videos & social media not being replaced.
3). PHP unit tests failing:

There were 2 failures:

1) Sizzle\Tests\Ajax\LinkedInScraperTest::testAjaxSuccess
Failed asserting that false is true.

/Library/WebServer/Documents/GiftBox/src/Tests/Ajax/LinkedInScraperTest.php:45

2) Sizzle\Tests\Ajax\LinkedInScraperTest::testAjaxFailureURL
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-''
+'null'

/Library/WebServer/Documents/GiftBox/src/Tests/Ajax/LinkedInScraperTest.php:73

FAILURES!
Tests: 199, Assertions: 1424, Failures: 2, Incomplete: 16.
