Differences
This shows you the differences between two versions of the page.
| install_cqp_mac [2006/12/08 22:13] – emiliano | install_cqp_mac [2006/12/08 22:32] (current) – emiliano | ||
|---|---|---|---|
| Line 101: | Line 101: | ||
| ===== Installing a corpus ===== | ===== Installing a corpus ===== | ||
| - | If you receive a corpus that is already encoded with cwb-encode (like the demo corpora), you will most probably | + | If you receive a corpus that is already encoded with '' |
| - | * Rename data/ to something more interesting (" | + | * Rename | ||
| - | * Move the renamed data/ directory into /corpora. | + | * Move the renamed |
| - | * Move the content of registry/ into / | + | * Move the content of '' |
| - | Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in your /corpora directory: | + | Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in your '' | ||
| < | < | ||
| Line 124: | Line 124: | ||
| </ | </ | ||
| - | Now browse into the / | + | Now browse into the '' |
| < | < | ||
| Line 137: | Line 137: | ||
| </ | </ | ||
| - | Do not touch anything, except the line defining " | + | Do not touch anything, except the line defining " |
| - | In our case, we replace " | + | |
| - | If you go back to the terminal, you will now be able to type the command cqp and use the installed corpus: | + | * Replace '' |
| + | * In our case, we replace '' | ||
| + | * Save the file. | ||
| + | |||
| + | If you go back to the terminal, you will now be able to type the command | ||
| < | < | ||
| Line 156: | Line 159: | ||
| 4369: und resounded through the < | 4369: und resounded through the < | ||
| | | ||
| - | [...] | + | [...] |
| </ | </ | ||
| + | |||
| ===== Encoding a corpus ===== | ===== Encoding a corpus ===== | ||
| Line 167: | Line 171: | ||
| Make sure your corpus is formatted one token per line, as indicated in the " | Make sure your corpus is formatted one token per line, as indicated in the " | ||
| - | If your corpus consists of more than one file, it is advisable to put all the files together in just one gzipped archive | + | If your corpus consists of more than one file, it is advisable to put all the files together in just one gzipped archive, e.g. using something like: |
| - | gzip -c *.txt > newcorpus.gz). | + | |
| + | | ||
| ==== Import " | ==== Import " | ||
| - | Create a directory / | + | Create a directory |
| - | Browse to the directory where your corpus | + | Browse to the directory where your corpus-files are stored. |
| <code bash> | <code bash> | ||
| Line 180: | Line 185: | ||
| </ | </ | ||
| - | Issue the cwb-encode command, remembering that your encoded data will " | + | Issue the '' |
| - | You will also have to define the "-P" | + | You will also have to define the '' |
| - | characteristics of newcorpus. We are using a simple example: | + | |
| <code bash> | <code bash> | ||
| Line 193: | Line 197: | ||
| ==== Index " | ==== Index " | ||
| - | If you gotten this far, then you're almost done. You are just missing the indexes for cqp to be able to use the imported data. You will need to issue | + | If you' |
| - | just one command: cwb-makeall -V NEWCORPUS. Beware: type " | + | |
| + | You are just missing the indexes for '' | ||
| + | |||
| + | You will need to issue just one command: | ||
| + | |||
| + | Beware: type " | ||
| <code bash> | <code bash> | ||