Tech-savvy time-saver: Loonycorn's NLP Class

Wednesday, July 22, 2020

Looks like they did this before Python 3 became mainstream:

page = urllib2.urlopen(url).read().decode('utf8')

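No sir - first of all, urllib2 won't work with Python 3, where it was split into urllib.request and urllib.error.

Then, depending on the page you're pulling, the content may not be encoded as UTF-8 at all, so a hard-coded utf8 decode can fail or garble the text.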

Better to do this (in Python 3, of course):

from urllib.request import urlopen

req = urlopen( url )

charset = req.info().get_content_charset() or 'utf-8'  # the header may not declare a charset

page = req.read().decode(charset)
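One thing to watch: get_content_charset() returns None when the server doesn't declare a charset, and servers sometimes declare the wrong one. Here is a minimal sketch of a more forgiving version. The helper names decode_body and fetch_page are my own, not from the course, and falling back to UTF-8 is an assumption that suits most modern pages but not all of them.

```python
from urllib.request import urlopen


def decode_body(body, charset, fallback="utf-8"):
    """Decode bytes using the declared charset, falling back when it is missing or wrong."""
    try:
        # charset is None when the Content-Type header declares no charset
        return body.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown codec name; UnicodeDecodeError: mis-declared charset.
        # Decode with the fallback and substitute any undecodable bytes.
        return body.decode(fallback, errors="replace")


def fetch_page(url):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

Usage is just page = fetch_page(url). Keeping the decoding in its own function means it can be checked without touching the network.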
