Differences
This shows you the differences between two versions of the page.
install_cqp_mac [2006/12/08 17:59] – created eros | install_cqp_mac [2006/12/08 22:32] (current) – emiliano | ||
---|---|---|---|
Line 18: | Line 18: | ||
===== Installation ===== | ===== Installation ===== | ||
- | To install, copy the appropriate archive for Mac OS X into your Desktop folder. Expand the archive (double-click it in the Finder, or use the commands gunzip and tar on it). | + | To install, copy the appropriate archive for Mac OS X into your Desktop folder. Expand the archive (double-click it in the Finder, or use the commands |
- | You will then have a new directory on your Desktop named "cwb-< | + | You will then have a new directory on your Desktop named '' |
- | subdirectories, | + | |
< | < | ||
Line 54: | Line 53: | ||
</ | </ | ||
- | Using Terminal.app, | + | Using Terminal.app, |
- | your system, i.e. into /usr/local/ (make sure you don't overwrite any existing | + | 
- | directories or files, just add to them. WARNING: if you are not familiar with the | + | 
- | UNIX environment underlying your Mac OS X system, do not try this!): | + | 
< | < | ||
Line 70: | Line 66: | ||
</ | </ | ||
- | Now you will be able to type all of CWB's commands on your terminal, including | + | Now you will be able to type all of CWB's commands on your terminal, including the man pages for '' |
- | the man pages for cqp and cwb-encode. | + | |
- | To find a corpus, CQP uses an environment variable $CORPUS_REGISTRY. This has to | + | To find a corpus, CQP uses an environment variable |
- | point to a directory registry/ where the corpora on your system are defined. | + | In theory, |
- | In theory, registry/ could be located anywhere, but in my experience it is | + | |
- | better to create the following directory tree in / (root): | + | |
< | < | ||
Line 86: | Line 79: | ||
Then you must set your environment, | Then you must set your environment, | ||
- | ((If you try putting your registry/ directory elsewhere, it will work smoothly until you try to use cwb-encode with a new corpus... at that point, you will be told by cqp that / | + | ((If you try putting your '' |
+ | |||
+ | This is what happened to me after putting my registry in '' | ||
< | < | ||
Line 95: | Line 90: | ||
After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn' | After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn' | ||
- | * if you use the TCSH shell:\\ setenv CORPUS_REGISTRY "/ | + | * if you use the TCSH shell: |
- | * if you use the BASH shell:\\ export CORPUS_REGISTRY="/ | + | |
+ | | ||
+ | |||
+ | * if you use the BASH shell: | ||
+ | |||
+ | | ||
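As a minimal sketch of the BASH variant: the registry path is cut off in this diff, so the default path below is only a placeholder (an assumption, not the author's value), and `REGISTRY_DIR` is a hypothetical helper variable. The snippet exports `CORPUS_REGISTRY` and reports whether it points at an existing directory, so a wrong path is noticed before starting cqp.

```shell
#!/bin/bash
# Placeholder path (assumption): the real registry location is truncated in this diff.
# Override it by setting the hypothetical REGISTRY_DIR variable before running.
registry_dir="${REGISTRY_DIR:-/corpora/registry}"

# Make the registry location visible to cqp and the cwb-* tools.
export CORPUS_REGISTRY="$registry_dir"

# Report whether the directory exists, so a wrong path is caught early.
if [ -d "$CORPUS_REGISTRY" ]; then
    registry_status="found"
else
    registry_status="missing"
fi
echo "CORPUS_REGISTRY=$CORPUS_REGISTRY ($registry_status)"
```

To keep the setting across Terminal sessions, the `export` line would typically go into your shell start-up file (for example `~/.bash_profile`).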
===== Installing a corpus ===== | ===== Installing a corpus ===== | ||
- | If you receive a corpus that is already encoded with cwb-encode (like the demo corpora), you will most probably | + | If you receive a corpus that is already encoded with '' |
- | * Rename data/ to something more interesting (" | + | * Rename |
- | * Move the renamed data/ directory into /corpora. | + | * Move the renamed |
- | * Move the content of registry/ into / | + | * Move the content of '' |
- | Suppose you are installing the DICKENS demo corpus and now have the following layout in your /corpora directory: | + | Suppose you are installing the DICKENS demo corpus and now have the following layout in your '' |
< | < | ||
Line 123: | Line 124: | ||
</ | </ | ||
- | Now browse into the / | + | Now browse into the '' |
< | < | ||
Line 136: | Line 137: | ||
</ | </ | ||
- | Do not touch anything, except the line defining " | + | Do not touch anything, except the line defining " |
- | In our case, we replace " | + | |
- | If you go back to the terminal, you will now be able to type the command cqp and use the installed corpus: | + | * Replace '' |
+ | * In our case, we replace '' | ||
+ | * Save the file. | ||
+ | |||
+ | If you go back to the terminal, you will now be able to type the command | ||
< | < | ||
Line 155: | Line 159: | ||
4369: und resounded through the < | 4369: und resounded through the < | ||
| | ||
- | [...] | + | [...] |
</ | </ | ||
+ | |||
===== Encoding a corpus ===== | ===== Encoding a corpus ===== | ||
Line 166: | Line 171: | ||
Make sure your corpus is formatted one token per line, as indicated in the " | Make sure your corpus is formatted one token per line, as indicated in the " | ||
- | If your corpus consists of more than one file, it is advisable to put all the files together in a single gzipped archive | + | If your corpus consists of more than one file, it is advisable to put all the files together in a single gzipped archive, e.g. using something like: |
- | gzip -c *.txt > newcorpus.gz). | + | |
+ | | ||
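The `gzip -c *.txt > newcorpus.gz` idea can be tried safely on made-up data first. The sketch below works in a temporary directory; the two sample files and their contents are invented for illustration. It shows that decompressing the single archive gives back the contents of all files, one after another.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing on your system is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Two tiny made-up corpus files, one token per line.
printf 'Hello\nworld\n' > part1.txt
printf 'Good\nmorning\n' > part2.txt

# Compress every .txt file into one archive; -c writes to standard output.
gzip -c *.txt > newcorpus.gz

# Decompressing yields the files' contents back to back, in glob order.
token_count="$(gunzip -c newcorpus.gz | wc -l | tr -d ' ')"
echo "tokens in archive: $token_count"
```

Checking the line count of the decompressed stream against your original files is a quick way to confirm nothing was left out before encoding.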
==== Import " | ==== Import " | ||
- | Create a directory / | + | Create a directory |
- | Browse to the directory where your corpus | + | Browse to the directory where your corpus files are stored. |
<code bash> | <code bash> | ||
Line 179: | Line 185: | ||
</ | </ | ||
- | Issue the cwb-encode command, remembering that your encoded data will " | + | Issue the '' |
- | You will also have to define the "-P" | + | You will also have to define the '' |
- | characteristics of newcorpus. We are using a simple example: | + | |
<code bash> | <code bash> | ||
Line 192: | Line 197: | ||
==== Index " | ==== Index " | ||
- | If you've gotten this far, then you're almost done. You are just missing the indexes for cqp to be able to use the imported data. You will need to issue | + | If you' |
- | just one command: cwb-makeall -V NEWCORPUS. Beware: type " | + | |
+ | You are just missing the indexes for '' | ||
+ | |||
+ | You will need to issue just one command: | ||
+ | |||
+ | Beware: type " | ||
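The warning above is cut off in this diff. Assuming it refers to giving the corpus name in uppercase on the command line, a cautious sketch is to derive the uppercase ID from the lowercase name and only call `cwb-makeall` when it is actually installed. The variable names are hypothetical; `newcorpus` is the example name used on this page.

```shell
#!/bin/bash
# Example corpus name, matching the "newcorpus" example used on this page.
corpus_name="newcorpus"

# Assumption: the command line expects the corpus ID in uppercase.
corpus_id="$(printf '%s' "$corpus_name" | tr '[:lower:]' '[:upper:]')"

# Only run the indexer if CWB is installed; otherwise just show the command.
if command -v cwb-makeall >/dev/null 2>&1; then
    cwb-makeall -V "$corpus_id"
else
    echo "would run: cwb-makeall -V $corpus_id"
fi
```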
<code bash> | <code bash> |