Description
The YorkU website is literally a clusterfuck for scraping, but it would be really awesome if we could do it automatically. I'm not even sure it's completely possible, given the absurd HTML layout and the fact that the URLs don't make any sense.
Accounting - https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7
Biology - https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7
Notice they're the same URL! WTF!
Also, I think it's putting something cookie-like (session state) in the URL, because these URLs expire after a short while.
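To make the problem concrete, here's a rough sketch that pulls the two URLs above apart. It assumes the path is laid out as `.../wo/<session_id>/<element_id>`, which is only a guess from looking at the URLs, not something confirmed for the YorkU site. `split_wo_url` is a made-up helper name.

```python
from urllib.parse import urlsplit


def split_wo_url(url):
    """Split a URL into (app_path, session_id, element_id).

    ASSUMPTION: the path looks like .../wo/<session_id>/<element_id>.
    That reading is a guess based on the two example URLs only.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    wo = parts.index("wo")  # raises ValueError if there is no /wo/ segment
    app_path = "/".join(parts[:wo])
    session_id, element_id = parts[wo + 1], parts[wo + 2]
    return app_path, session_id, element_id


accounting = "https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7"
biology = "https://w2prod.sis.yorku.ca/Apps/WebObjects/cdm.woa/20/wo/2Ut0tG0DUArPP653ACehWw/1.1.10.7"

print(split_wo_url(accounting))
# Every piece is identical, so nothing in the URL identifies the subject.
print(split_wo_url(accounting) == split_wo_url(biology))
```

If that guess is right, the subject is picked by whatever state the server holds for the session, so a scraper couldn't just request a URL per subject.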
Anyway, the HTML soup can be dealt with; it's the URL structure not making any sense that worries me. The structure we would want would be something like:
https://www.yorku.ca/courses/2014-15/{Term}/{Subject}
but I guess that would make too much sense.
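For comparison, this is roughly all a scraper would need if that structure existed. The template is the wished-for URL from above, not a real endpoint, and `course_url` plus the term and subject values are made up for illustration.

```python
from urllib.parse import quote

# HYPOTHETICAL: this is the wished-for URL layout, not something the site serves.
TEMPLATE = "https://www.yorku.ca/courses/2014-15/{term}/{subject}"


def course_url(term, subject):
    """Build a course listing URL from a term and a subject name."""
    # Percent-encode each piece so spaces or slashes can't break the path.
    return TEMPLATE.format(term=quote(term, safe=""), subject=quote(subject, safe=""))


# Term and subject values here are placeholders.
for subject in ("Accounting", "Biology"):
    print(course_url("Fall", subject))
```

With stable URLs like these, each subject gets its own address and nothing expires, which is exactly what the current site doesn't give us.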