Thursday, August 17, 2017

How Will You Find the Longest Words Shakespeare Used (Perl)

  1. Get the text from: https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt (saved here as ./shakesp.txt)
  2. vi ./shakesp.txt    # delete the obvious non-Shakespeare material by hand; a quick look through shows that everything up to where the sonnets start can safely go :)
  3. perl -n -000 -e 'print unless /^\s*\d+\s*$|ELECTRONIC|^\s*the\s+end\s*$|project\s+gutenberg|etext/i;' ./shakesp.txt > shakeclean.txt
  4. perl -p -e 's/[^a-zA-Z\047]/\n/g; $_ = lc;' shakeclean.txt | perl -nl -e 'print if length > 13;' | sort | uniq | perl -n -0777 -e '@words = split( "\n" ); @sorted = sort { length $b <=> length $a } @words; print join( "\n", @sorted ), "\n";'
honorificabilitudinitatibus (Love's Labour's Lost)
anthropophaginian
indistinguishable
undistinguishable
incomprehensible
superserviceable
circumscription
disproportioned
distemperatures
distinguishment
enfranchisement
excommunication
extraordinarily
flibbertigibbet
impossibilities
indistinguish'd
interchangeably
interchangement
interrogatories
misconstruction
notwithstanding
particularities
perpendicularly
portotartarossa
praeclarissimus
prognostication
superstitiously
transformations
uncompassionate
uncomprehensive
undistinguished
unreconciliable
accommodations
accomplishment
acknowledgment
administration
affectionately
apprenticehood
carnarvonshire
circumstantial
consanguineous
considerations
conspectuities
constantinople
contemptuously
contumeliously
correspondence
counterfeiting
determinations
discomfortable
discontentedly
disparagements
distemperature
entertainments
fortifications
halfpennyworth
handicraftsmen
imperceiverant
inconveniences
insurrection's
intelligencing
inter'gatories
interpretation
leicestershire
mephostophilus
nebuchadnezzar
northumberland
oscorbidulchos
preposterously
principalities
proportionable
reconciliation
sovereignvours
superscription
transformation
unaccommodated
understandings
unpremeditated
unproportion'd
unquestionable
unthankfulness
voluptuousness
