install_cqp_mac

Differences

This shows you the differences between two versions of the page.

install_cqp_mac [2006/12/08 22:12] emilianoinstall_cqp_mac [2006/12/08 22:32] (current) emiliano
Line 79: Line 79:
  
 Then you must set your environment, using one of the following commands: Then you must set your environment, using one of the following commands:
-((If you try putting your ''registry/'' directory elsewhere, it will work smoothly until you try to use ''cwb-encode'' with a new corpus... at that point, you will be told by cqp that ''/corpora/c1/registry'' is needed – this is probably a bug. This is what happened to me after putting my registry in ~/corpora/registry and trying to encode a corpus with cwb-encode:+((If you try putting your ''registry/'' directory elsewhere, it will work smoothly until you try to use ''cwb-encode'' with a new corpus... at that point, you will be told by cqp that ''/corpora/c1/registry'' is needed – this is probably a bug. 
 + 
 +This is what happened to me after putting my registry in ''~/corpora/registry'' and trying to encode a corpus with ''cwb-encode'':
  
 <code> <code>
Line 88: Line 90:
 After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn't have these problems.)) After that message, I had to move everything to / (root). If you follow the instructions above, you shouldn't have these problems.))
  
-  * if you use the TCSH shell:\\ setenv CORPUS_REGISTRY "/corpora/c1/registry" +  * if you use the TCSH shell: 
-  * if you use the BASH shell:\\ export CORPUS_REGISTRY="/corpora/c1/registry"+ 
 +    setenv CORPUS_REGISTRY "/corpora/c1/registry" 
 + 
 +  * if you use the BASH shell: 
 + 
 +    export CORPUS_REGISTRY="/corpora/c1/registry" 
  
 ===== Installing a corpus ===== ===== Installing a corpus =====
  
-If you receive a corpus that is already encoded with cwb-encode (like the demo corpora), you will most probably receive an archive containing a data/ directory and a registry/ directory.+If you receive a corpus that is already encoded with ''cwb-encode'' (like the demo corpora), you will most probably have an archive containing a ''data/'' directory and a ''registry/'' directory.
  
-  * Rename data/ to something more interesting ("dickens", "german-law", whatever makes you happy).  +  * Rename ''data/'' to something more interesting ("dickens", "german-law", whatever makes you happy).  
-  * Move the renamed data/ directory into /corpora. +  * Move the renamed ''data/'' directory into ''/corpora''
-  * Move the content of registry/ into /corpora/c1/registry (just one text file, containing the information that cqp needs to use the corpus).+  * Move the content of ''registry/'' into ''/corpora/c1/registry'' (just one text file, containing the information that ''cqp'' needs to use the corpus).
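
The three steps above can be sketched as a small shell function. This is only an illustration: the function name ''install_corpus'' and its arguments are hypothetical, and it assumes the unpacked archive contains exactly a ''data/'' and a ''registry/'' directory, as described above.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of CWB): install a pre-encoded corpus.
# install_corpus SRC NAME ROOT
#   SRC  - directory holding the unpacked data/ and registry/ directories
#   NAME - new name for the data directory, e.g. dickens
#   ROOT - corpora root; this page uses /corpora
install_corpus() {
    src=$1
    name=$2
    root=$3
    # Refuse to overwrite a corpus directory that already exists.
    if [ -e "$root/$name" ]; then
        echo "install_corpus: $root/$name already exists" >&2
        return 1
    fi
    mkdir -p "$root/c1/registry" || return 1
    # Rename data/ and move it under the corpora root in one step.
    mv "$src/data" "$root/$name" || return 1
    # Move the registry file(s) into the shared registry directory.
    mv "$src/registry"/* "$root/c1/registry/" || return 1
}
```

For the DICKENS example this might be called as ''install_corpus ~/Downloads/dickens-demo dickens /corpora'', where the download path is made up for illustration.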
  
-Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in your /corpora directory:+Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in your ''/corpora'' directory:
  
 <code> <code>
Line 116: Line 124:
 </code> </code>
  
-Now browse into the /corpora/c1/registry directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):+Now browse into the ''/corpora/c1/registry'' directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):
  
 <code> <code>
Line 129: Line 137:
 </code> </code>
  
-Do not touch anything, except the line defining "HOME". Replace "data" with the path to the new directory containing the data in /corpora. +Do not touch anything, except the line defining "HOME".
-In our case, we replace "data" with "/corpora/dickens/ . Save the file.+
  
-If you go back to the terminal, you will now be able to type the command cqp and use the installed corpus:+  * Replace ''data'' with the path to the new directory containing the data in ''/corpora''
 +  * In our case, we replace ''data'' with ''/corpora/dickens/''
 +  * Save the file. 
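
If you prefer not to edit the registry file by hand, the same change can be sketched in the shell. The helper name ''set_registry_home'' is hypothetical, and the sketch assumes the line in question begins with the word HOME followed by whitespace, and that the new path contains no ''|'' or ''&'' characters.

```shell
#!/bin/sh
# Sketch (hypothetical helper): point the HOME line of a registry file
# at the directory that now holds the corpus data.
# set_registry_home FILE NEWHOME
set_registry_home() {
    file=$1
    newhome=$2
    tmpfile="$file.tmp.$$"
    # Rewrite only the line starting with HOME; every other line is copied as-is.
    # Writing to a temporary file avoids the differences between BSD and GNU "sed -i".
    sed "s|^HOME[[:space:]].*|HOME $newhome|" "$file" > "$tmpfile" || return 1
    mv "$tmpfile" "$file"
}
```

For the example above, the call would be ''set_registry_home /corpora/c1/registry/dickens /corpora/dickens/''.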
 + 
 +If you go back to the terminal, you will now be able to type the command ''cqp'' and use the installed corpus:
  
 <code> <code>
Line 148: Line 159:
      4369: und resounded through the <house> like thunder . Every roo      4369: und resounded through the <house> like thunder . Every roo
      5087:  so did every bell in the <house> . This might have lasted      5087:  so did every bell in the <house> . This might have lasted
-[...]+     [...]
 </code> </code>
 +
  
 ===== Encoding a corpus ===== ===== Encoding a corpus =====
Line 159: Line 171:
 Make sure your corpus is formatted one token per line, as indicated in the "corpus encoding tutorial", optionally with additional columns for positional attributes (POS, LEMMA, etc.). Make sure your corpus is formatted one token per line, as indicated in the "corpus encoding tutorial", optionally with additional columns for positional attributes (POS, LEMMA, etc.).
  
-If your corpus counts more than one file, it is advisable that you put all the files together in just one gzipped archive (e.g. using something like  +If your corpus counts more than one file, it is advisable that you put all the files together in just one gzipped archive, e.g. using something like: 
-gzip -c *.txt > newcorpus.gz).+ 
 +    gzip -c *.txt > newcorpus.gz
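
A slightly more defensive variant of the same command is sketched below. The function name ''pack_corpus'' is hypothetical; it wraps the ''gzip -c'' call shown above and then uses ''gzip -t'' to check that the resulting archive is readable.

```shell
#!/bin/sh
# Sketch (hypothetical helper): pack all .txt files of a directory into one
# gzip archive, then verify the archive.
# pack_corpus DIR OUT
pack_corpus() {
    dir=$1
    out=$2
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && gzip -c ./*.txt ) > "$out" || return 1
    # gzip -t tests the integrity of the compressed file.
    gzip -t "$out"
}
```

Running ''gzip -dc newcorpus.gz | head'' afterwards is a quick way to confirm that the content is still one token per line.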
  
 ==== Import "newcorpus" ==== ==== Import "newcorpus" ====
  
-Create a directory /corpora/newcorpus (substitute "newcorpus" with your corpus's name...).+Create a directory ''/corpora/newcorpus'' (substitute "newcorpus" with your corpus's name...).
  
-Browse to the directory where your corpus is stored.+Browse to the directory where your corpus files are stored.
  
 <code bash> <code bash>
Line 172: Line 185:
 </code> </code>
  
-Issue the cwb-encode command, remembering that your encoded data will "live" in /corpora/newcorpus, and that the registry file for newcorpus will have to be saved under /corpora/c1/registry .+Issue the ''cwb-encode'' command, remembering that your encoded data will "live" in ''/corpora/newcorpus'', and that the registry file for newcorpus will have to be saved as ''/corpora/c1/registry/newcorpus''.
  
-You will also have to define the "-P" and "-S" flags according to the  +You will also have to define the ''-P'' and ''-S'' flags according to the characteristics of newcorpus. We are using a simple example:
-characteristics of newcorpus. We are using a simple example:+
  
 <code bash> <code bash>
Line 185: Line 197:
 ==== Index "newcorpus" ==== ==== Index "newcorpus" ====
  
-If you gotten this far, then you're almost done. You are just missing the indexes for cqp to be able to use the imported data. You will need to issue +If you've gotten this far, then you're almost done. 
-just one command: cwb-makeall -V NEWCORPUS. Beware: type "newcorpus" in uppercase... I had errors with typing lowercase.+ 
 +You are just missing the indexes for ''cqp'' to be able to use the imported data. 
 + 
 +You will need to issue just one command: ''cwb-makeall -V NEWCORPUS''. 
 + 
 +Beware: type "newcorpus" in uppercase... that is how ''cwb'' likes it... I had errors with typing it lowercase.
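
To avoid the lowercase mistake mentioned above, the uppercase name can be derived from the directory name rather than typed by hand. A minimal sketch, assuming the helper name ''corpus_id'' (hypothetical) and a corpus name made of plain ASCII letters:

```shell
#!/bin/sh
# Sketch (hypothetical helper): turn a corpus name into the uppercase form
# expected on the cwb-makeall command line.
corpus_id() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Intended use (requires CWB to be installed, so it is left as a comment):
#   cwb-makeall -V "$(corpus_id newcorpus)"
```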
  
 <code bash> <code bash>
  • install_cqp_mac.1165612350.txt.gz
  • Last modified: 2006/12/08 22:12
  • by emiliano