2020-10-25 13:51:13 +01:00
* P1
** TODO Plot word frequencies
Use gnuplot, with documents in at least three different languages.
We'll fit the resulting frequency data to the Booth and Federowicz equation.
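
The notes do not spell out the equation, so the following is only a rough sketch of what the fitting step might look like. It assumes a power-law rank-frequency model f(r) = k / r^m (a generalised Zipf form), which is an assumption and not necessarily the exact Booth and Federowicz formulation. The function name =fit_power_law= is hypothetical.

```python
import math


def fit_power_law(frequencies):
    """Fit f(r) = k / r**m by least squares on log f = log k - m * log r.

    ASSUMPTION: the power-law model is illustrative; the notes do not
    state the exact equation to be fitted.
    """
    # Rank 1 is the most frequent word, so sort in decreasing order.
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two non-zero frequencies")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return math.exp(intercept), -slope  # (k, m)
```

Fitting in log-log space keeps the sketch dependency-free. In gnuplot itself, the built-in =fit= command could be used on the same model instead.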
2020-10-25 19:58:54 +01:00
** DONE Create a table with information of all documents
CLOSED: [2020-10-25 Sun 19:58]
| filename | type | encoding | language |
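
A minimal sketch of how one row of this table might be produced. It guesses the type from the file extension and the encoding by trial decoding; the candidate encoding list and the names =guess_encoding= and =org_row= are assumptions made for illustration. Language detection is not attempted here, so the caller supplies it.

```python
import mimetypes
from pathlib import Path

# ASSUMPTION: candidate encodings, tried in order. latin-1 accepts any
# byte sequence, so it acts as a last-resort fallback.
CANDIDATE_ENCODINGS = ("ascii", "utf-8", "latin-1")


def guess_encoding(data: bytes) -> str:
    """Return the first candidate encoding that decodes the data cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"  # only reachable if latin-1 is removed from the list


def org_row(path, language="?"):
    """Format one Org table row: filename, type, encoding, language."""
    path = Path(path)
    # The type is guessed from the extension only, not from the content.
    mime, _ = mimetypes.guess_type(path.name)
    encoding = guess_encoding(path.read_bytes())
    return f"| {path.name} | {mime or 'unknown'} | {encoding} | {language} |"
```

Trial decoding cannot tell apart encodings that accept the same bytes, so the result should be read as a best guess.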
2020-10-25 22:14:20 +01:00
** DONE Extract all URLs
CLOSED: [2020-10-25 Sun 22:14]
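
One possible way to do the extraction, sketched with a regular expression. The pattern only covers =http= and =https= URLs and is deliberately loose; both the pattern and the name =extract_urls= are assumptions, not the approach recorded in these notes.

```python
import re

# ASSUMPTION: a URL starts with http:// or https:// and runs until
# whitespace, a quote, an angle bracket, or a closing ) or ].
URL_RE = re.compile(r"""https?://[^\s<>"')\]]+""")


def extract_urls(text):
    """Return every URL found in text, in order of appearance."""
    # Trailing sentence punctuation is stripped from each match.
    return [m.group(0).rstrip(".,;:!?") for m in URL_RE.finditer(text)]
```

Duplicates are kept on purpose, so the result can also be used to count how often each URL occurs.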
2020-10-25 23:40:20 +01:00
** DONE Write to a file all word occurrences and frequencies
CLOSED: [2020-10-25 Sun 23:40]
Sorted by frequency in decreasing order.
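
A short sketch of this step, assuming words are runs of word characters compared case-insensitively and that the output is one tab-separated =word count= pair per line. The tokenisation rule, the output format, and the function names are all assumptions.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted by decreasing count."""
    # ASSUMPTION: \w+ tokenisation; this also matches digits and underscores.
    words = re.findall(r"\w+", text.lower())
    # most_common() orders by count, highest first.
    return Counter(words).most_common()


def write_frequencies(text, out_path):
    """Write one tab-separated 'word<TAB>count' line per distinct word."""
    with open(out_path, "w", encoding="utf-8") as out:
        for word, count in word_frequencies(text):
            out.write(f"{word}\t{count}\n")
```

The counts written here are the same rank-ordered frequencies that the plotting task needs as input.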