install_cqp_mac

Differences

This shows you the differences between two versions of the page.

install_cqp_mac [2006/12/08 17:59] – created eros
install_cqp_mac [2006/12/08 22:32] (current) emiliano
Line 18: Line 18:
 ===== Installation ===== ===== Installation =====
  
-To install, copy the appropriate archive for Mac OS X into you Desktop folder. Expand the archive (double-click on the Finder, or use the commands gunzip and tar on it.
+To install, copy the appropriate archive for Mac OS X into you Desktop folder. Expand the archive (double-click on the Finder, or use the commands ''gunzip'' and ''tar'' on it).
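For readers who prefer the command line to the Finder, here is a minimal sketch of the ''gunzip'' and ''tar'' route. The archive name ''cwb-demo.tar.gz'' is a made-up stand-in (the script builds it itself so that it can run anywhere); substitute the file you actually downloaded.

```shell
set -e
# Work in a throwaway directory so nothing on the Desktop is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in archive (hypothetical name and contents).
mkdir -p cwb-demo/bin
echo "placeholder" > cwb-demo/bin/README
tar -cf cwb-demo.tar cwb-demo
gzip cwb-demo.tar            # produces cwb-demo.tar.gz
rm -r cwb-demo

# The two steps named in the text: decompress, then unpack.
gunzip cwb-demo.tar.gz       # leaves cwb-demo.tar
tar -xf cwb-demo.tar         # recreates the cwb-demo/ directory

ls cwb-demo
```

Running ''tar -xzf'' on the ''.tar.gz'' file does both steps at once.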
  
-You will then have a new directory on your Desktop named "cwb-<version number>". If you browse that folder you will see that it contains a number of
-subdirectories, each one having some subdirectories and binary files:
+You will then have a new directory on your Desktop named ''cwb-<version number>''. If you browse that folder you will see that it contains a number of subdirectories, each one having some subdirectories and binary files:
  
 <code> <code>
Line 54: Line 53:
 </code> </code>
  
-Using Terminal.app, move all these files into the corresponding directories in
-your system, i.e. into /usr/local/ (make sure you don't substitute any existing
-directories or files, just add them, WARNING: if you are not familiar with the
-UNIX environment that is in you MAC OS X system, do not try this!!!):
+Using Terminal.app, move all these files into the corresponding directories in your system, i.e. into ''/usr/local/'' (make sure you don't substitute any existing directories or files, just add them, WARNING: if you are not familiar with the UNIX environment that is in you MAC OS X system, do not try this!!!):
  
 <code> <code>
Line 70: Line 66:
 </code> </code>
  
-Now you will be able to type all of CWB's commands on your terminal, including
-the man pages for cqp and cwb-encode.
+Now you will be able to type all of CWB's commands on your terminal, including the man pages for ''cqp'' and ''cwb-encode''.
  
-To find a corpus, CQP uses an environment variable $CORPUS_REGISTRY. This has to
-point to a directory registry/ where the corpora on your system are defined.
-In theory, registry/ could be located anywhere, but in my experience it is
-better to create the following directory tree in / (root):
+To find a corpus, CQP uses an environment variable ''$CORPUS_REGISTRY''. This has to point to a directory ''registry/'' where the corpora on your system are defined.
+In theory, ''registry/'' could be located anywhere, but in my experience it is better to create the following directory tree in / (root):
  
 <code> <code>
Line 86: Line 79:
  
 Then you must set your environment, using one of the following commands: Then you must set your environment, using one of the following commands:
-((If you try putting your registry/ directory elsewhere, it will work smoothly until you try to use cwb-encode with a new corpus... at that point, you will be told by cqp that /corpora/c1/registry are needed – this is probably a bug. This is what happened to me after putting my registry in ~/corpora/registry and trying to encode a corpus with cwb-encode:
+((If you try putting your ''registry/'' directory elsewhere, it will work smoothly until you try to use ''cwb-encode'' with a new corpus... at that point, you will be told by cqp that ''/corpora/c1/registry'' is needed – this is probably a bug.
+
+This is what happened to me after putting my registry in ''~/corpora/registry'' and trying to encode a corpus with ''cwb-encode'':
  
 <code> <code>
Line 95: Line 90:
 After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn't have these problems.)) After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn't have these problems.))
  
-  * if you use the TCSH shell:\\ setenv CORPUS_REGISTRY "/corpora/c1/registry"
-  * if you use the BASH shell:\\ export CORPUS_REGISTRY="/corpora/c1/registry"
+  * if you use the TCSH shell:
+
+    setenv CORPUS_REGISTRY "/corpora/c1/registry"
+
+  * if you use the BASH shell:
+
+    export CORPUS_REGISTRY="/corpora/c1/registry"
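As a small sanity check for the BASH case, the sketch below sets the variable and then tests whether the directory it points to exists. The path is the one used on this page; the variable name ''registry'' is only an illustration.

```shell
# Path recommended on this page; change it if your registry lives elsewhere.
registry="/corpora/c1/registry"
export CORPUS_REGISTRY="$registry"

# Report whether the directory is really there before starting cqp.
if [ -d "$CORPUS_REGISTRY" ]; then
    echo "registry found: $CORPUS_REGISTRY"
else
    echo "warning: $CORPUS_REGISTRY does not exist yet" >&2
fi
```

The ''export'' only lasts for the current Terminal session. To keep the setting, one option is to add the same ''export'' line to ''~/.bash_profile'', which bash reads when it starts as a login shell.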
  
 ===== Installing a corpus ===== ===== Installing a corpus =====
  
-If you receive a corpus that is already encoded with cwb-encode (like the demo corpora), you will most probably receive an archive containing a data/ directory and a registry/ directory.
+If you receive a corpus that is already encoded with ''cwb-encode'' (like the demo corpora), you will most probably have an archive containing a ''data/'' directory and a ''registry/'' directory.
  
-  * Rename data/ to some thing more interesting ("dickens", "german-law", whatever makes you happy).
-  * Move the renamed data/ directory into /corpora.
-  * Move the content of registry/ into /corpora/c1/registry (just one text file, containing the information that cqp needs to use the corpus).
+  * Rename ''data/'' to some thing more interesting ("dickens", "german-law", whatever makes you happy).
+  * Move the renamed ''data/'' directory into ''/corpora''.
+  * Move the content of ''registry/'' into ''/corpora/c1/registry'' (just one text file, containing the information that ''cqp'' needs to use the corpus).
  
-Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in you /corpora directory:
+Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in you ''/corpora'' directory:
  
 <code> <code>
Line 123: Line 124:
 </code> </code>
  
-Now browse into the /corpora/c1/registry directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):
+Now browse into the ''/corpora/c1/registry'' directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):
  
 <code> <code>
Line 136: Line 137:
 </code> </code>
  
-Do not touch anything, except the line defining "HOME". Replace "data" with the path to the new directory containing the data in /corpora.
-In our case, we replace "data" with "/corpora/dickens/" . Save the file.
+Do not touch anything, except the line defining "HOME".
+
+  * Replace ''data'' with the path to the new directory containing the data in ''/corpora''.
+  * In our case, we replace ''data'' with ''/corpora/dickens/''.
+  * Save the file.

-If you go back to the terminal, you will now be able to type the command cqp and use the installed corpus:
+If you go back to the terminal, you will now be able to type the command ''cqp'' and use the installed corpus:
  
 <code> <code>
Line 155: Line 159:
      4369: und resounded through the <house> like thunder . Every roo      4369: und resounded through the <house> like thunder . Every roo
      5087:  so did every bell in the <house> . This might have lasted      5087:  so did every bell in the <house> . This might have lasted
-[...]
+     [...]
 </code> </code>
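The installation steps in this section (rename ''data/'', move it under ''/corpora'', move the registry file, fix the "HOME" line) can be sketched as one shell session. Everything runs inside a temporary sandbox directory, and the two-line registry file is a made-up stand-in, not the real DICKENS registry entry. On a real system the root would be ''/corpora''.

```shell
set -e
# Throwaway sandbox standing in for the real filesystem root.
sandbox=$(mktemp -d)
corpora="$sandbox/corpora"
mkdir -p "$corpora/c1/registry"

# Stand-in for an unpacked corpus archive (hypothetical contents).
mkdir -p "$sandbox/unpacked/data" "$sandbox/unpacked/registry"
printf 'NAME "demo"\nHOME data\n' > "$sandbox/unpacked/registry/dickens"

# Rename data/ while moving it, then move the registry file into place.
mv "$sandbox/unpacked/data" "$corpora/dickens"
mv "$sandbox/unpacked/registry/dickens" "$corpora/c1/registry/dickens"

# Rewrite only the HOME line. Writing to a temporary file first avoids
# the in-place editing flag, which differs between BSD and GNU sed.
regfile="$corpora/c1/registry/dickens"
sed "s|^HOME .*|HOME $corpora/dickens|" "$regfile" > "$regfile.tmp"
mv "$regfile.tmp" "$regfile"

grep '^HOME' "$regfile"
```

Editing the file by hand in a text editor, as described above, gives the same result. The ''sed'' line is only a convenience when there are several corpora to install.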
 +
  
 ===== Encoding a corpus ===== ===== Encoding a corpus =====
Line 166: Line 171:
 Make sure your corpus is formatted one token per line, as indicated in the "corpus encoding tutorial", eventually with additional columns for positional attributes (POS, LEMMA, etc.). Make sure your corpus is formatted one token per line, as indicated in the "corpus encoding tutorial", eventually with additional columns for positional attributes (POS, LEMMA, etc.).
  
-If your corpus counts more than one file, it is advisable that you put all the files together in just one gzipped archive (e.g. using something like
-gzip -c *.txt > newcorpus.gz).
+If your corpus counts more than one file, it is advisable that you put all the files together in just one gzipped archive, e.g. using something like:
+
+    gzip -c *.txt > newcorpus.gz
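It may not be obvious that ''gzip -c'' over several files gives a usable archive. It writes one compressed member per input file, and decompressing the result returns the files' contents joined together in the order the shell expanded ''*.txt''. The sketch below checks this on two tiny made-up files (''part1.txt'' and ''part2.txt'' are illustrative names).

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Two tiny one-token-per-line files standing in for a real corpus.
printf 'The\nhouse\n' > part1.txt
printf 'was\nquiet\n' > part2.txt

# Compress both files into a single archive, as in the text.
gzip -c *.txt > newcorpus.gz

# Decompress and compare with a plain concatenation of the inputs.
gunzip -c newcorpus.gz > roundtrip.txt
cat part1.txt part2.txt > expected.txt
cmp roundtrip.txt expected.txt && echo "round trip OK"
```

The file order inside the archive follows the shell's sorting of the file names, so name the files accordingly if the order of texts matters to you.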
  
 ==== Import "newcorpus" ==== ==== Import "newcorpus" ====
  
-Create a directory /corpora/newcorpus (substitute "newcorpus" with your corpus's name...).
+Create a directory ''/corpora/newcorpus'' (substitute "newcorpus" with your corpus's name...).
  
-Browse to the directory where your corpus is stored.
+Browse to the directory where your corpus-files are stored.
  
 <code bash> <code bash>
Line 179: Line 185:
 </code> </code>
  
-Issue the cwb-encode command, remembering that your encoded data will "live" in /corpora/newcorpus, and that the registry file for newcorpus will have to be saved under /corpora/c1/registry .
+Issue the ''cwb-encode'' command, remembering that your encoded data will "live" in ''/corpora/newcorpus'', and that the registry file for newcorpus will have to be saved as ''/corpora/c1/registry/newcorpus''.
  
-You will also have to define the "-P" and "-S" flags according to the
-characteristics of newcorpus. We are using a simple example:
+You will also have to define the ''-P'' and ''-S'' flags according to the characteristics of newcorpus. We are using a simple example:
  
 <code bash> <code bash>
Line 192: Line 197:
 ==== Index "newcorpus" ==== ==== Index "newcorpus" ====
  
-If you gotten this far, then you're almost done. You are just missing the indexes for cqp to be able to use the imported data. You will need to issue
-just one command: cwb-makeall -V NEWCORPUS. Beware: type "newcorpus" in uppercase... I had errors with typing lowercase.
+If you've gotten this far, then you're almost done.
+
+You are just missing the indexes for ''cqp'' to be able to use the imported data.
+
+You will need to issue just one command: ''cwb-makeall -V NEWCORPUS''.
+
+Beware: type "newcorpus" in uppercase... that is how ''cwb'' likes it... I had errors with typing it lowercase.
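To avoid mistyping the uppercase name, you can let the shell derive it from the lowercase one. This is only a sketch: the variable names are illustrative, and the script just prints the command from the text instead of running it.

```shell
# Lowercase name, as used for the data directory and the registry file.
corpus_name="newcorpus"

# Uppercase form, as expected on the cwb-makeall command line.
corpus_id=$(printf '%s' "$corpus_name" | tr '[:lower:]' '[:upper:]')

# Show the command that would be issued.
echo "cwb-makeall -V $corpus_id"
```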
  
 <code bash> <code bash>
  • install_cqp_mac.1165597144.txt.gz
  • Last modified: 2006/12/08 17:59
  • by eros