Figure out how to automatically scrape

The YorkU website is a clusterfuck to scrape, but it would be really awesome if we could do it automatically. I'm not even sure it's completely possible, given the absurd HTML layout and the fact that the URLs don't make any sense.

Accounting - https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7
Biology - https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7

Notice they're the same url! WTF!

Also, I think it's putting session data in the URL instead of in cookies, because these URLs expire after a short while.
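To make the guess above concrete, here is a rough Python sketch that pulls apart one of these URLs. It assumes the path follows a `<app>.woa/<instance>/wo/<session>/<element>` layout, which is only my reading of the URLs in this issue; `split_webobjects_url` and the part names are made up for illustration.

```python
from urllib.parse import urlsplit


def split_webobjects_url(url):
    """Split a URL like the ones above into its apparent parts.

    Assumed layout (a guess, not confirmed by the site):
    .../<app>.woa/<instance>/wo/<session>/<element>
    """
    parts = urlsplit(url).path.strip("/").split("/")
    wo_index = parts.index("wo")  # raises ValueError if the layout differs
    return {
        "app": parts[wo_index - 2],
        "instance": parts[wo_index - 1],
        "session": parts[wo_index + 1],   # looks like a per-visit token
        "element": parts[wo_index + 2],   # looks like a position in the page
    }


url = ("https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/"
       "2Ut0tG0DUArPP653ACehWw/1.1.10.7")
info = split_webobjects_url(url)
print(info["session"], info["element"])
```

If that reading is right, nothing in the URL names the subject at all, which would explain why Accounting and Biology share one address and why a saved link stops working once the session goes away.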

Anyway, the HTML soup can be dealt with. It's the URL structure not making any sense that worries me. The structure we would want is something like

https://www.yorku.ca/courses/2014-15/{Term}/{Subject}

but I guess that would make too much sense.
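Even if the site never gives us URLs like that, we could still use the same shape as our own stable key for whatever we manage to scrape. A tiny sketch, assuming we control the naming: `logical_url` is a hypothetical helper, `"Fall"` is a placeholder term value, and the resulting address is only an identifier on our side, not a page that exists on yorku.ca.

```python
from urllib.parse import quote

# The wished-for structure from this issue, used purely as a naming scheme.
TEMPLATE = "https://www.yorku.ca/courses/{year}/{term}/{subject}"


def logical_url(year, term, subject):
    """Build a stable identifier for one (year, term, subject) listing."""
    return TEMPLATE.format(
        year=quote(year), term=quote(term), subject=quote(subject)
    )


# Pages fetched inside a short-lived session can be stored under this key,
# so the expiring session URL never has to be saved anywhere.
scraped = {}
scraped[logical_url("2014-15", "Fall", "Biology")] = "<html>...</html>"
```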

