Wednesday, July 22, 2020

Loonycorn's NLP Class

Looks like they recorded this before Python 3 became mainstream.

page = urllib2.urlopen(url).read().decode('utf8')

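No sir - first of all, urllib2 doesn't exist in Python 3. It was split into urllib.request and urllib.error.

Then, depending on the page you're pulling, hardcoding utf8 may not work: if the server sends the page in some other encoding, decode('utf8') can raise a UnicodeDecodeError.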

Better to do this (in Python 3, of course):

from urllib.request import urlopen

req = urlopen(url)
# get_content_charset() returns None when the server declares no charset
charset = req.info().get_content_charset() or 'utf8'
page = req.read().decode(charset)
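
If you want something a bit more defensive, here is a minimal sketch of the same idea wrapped in functions. The names decode_body and fetch_text are my own (not from the class or from urllib), and falling back to utf-8 with replacement characters is just an assumption about what you'd want when the server's declared charset is missing or wrong.

```python
from urllib.request import urlopen


def decode_body(raw, charset, fallback="utf-8"):
    """Decode raw bytes with the declared charset, or the fallback if none.

    Hypothetical helper: charset is whatever get_content_charset() returned,
    which may be None.
    """
    try:
        return raw.decode(charset or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or the declaration was wrong: decode with
        # the fallback and substitute U+FFFD for undecodable bytes.
        return raw.decode(fallback, errors="replace")


def fetch_text(url, fallback="utf-8"):
    """Fetch url and return its body as a str (hypothetical helper)."""
    with urlopen(url) as resp:
        charset = resp.info().get_content_charset()
        return decode_body(resp.read(), charset, fallback)
```

Then page = fetch_text(url) replaces the three lines above. Whether silently replacing bad bytes is acceptable depends on what you do with the text afterwards; for tokenizing a news article it is usually fine.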
