Differences

This shows you the differences between two versions of the page.

--- install_cqp_mac [2006/12/08 22:13] – emiliano
+++ install_cqp_mac [2006/12/08 22:32] (current) – emiliano
@@ Line 101: / Line 101: @@
 ===== Installing a corpus =====
-If you receive a corpus that is already encoded with cwb-encode (like the demo corpora), you will most probably receive an archive containing a data/ directory and a registry/ directory.
+If you receive a corpus that is already encoded with ''cwb-encode'' (like the demo corpora), you will most probably have an archive containing a ''data/'' directory and a ''registry/'' directory.
-  * Rename data/ to some thing more interesting ("dickens", "german-law", whatever makes you happy).
+  * Rename ''data/'' to something more interesting ("dickens", "german-law", whatever makes you happy).
-  * Move the renamed data/ directory into /corpora.
+  * Move the renamed ''data/'' directory into ''/corpora''.
-  * Move the content of registry/ into /corpora/c1/registry (just one text file, containing the information that cqp needs to use the corpus).
+  * Move the content of ''registry/'' into ''/corpora/c1/registry'' (just one text file, containing the information that ''cqp'' needs to use the corpus).
-Let's say you are installing the DICKENS demo corpus. Let's say you now have the following situation in you /corpora directory:
+Let's say you are installing the DICKENS demo corpus, and that you now have the following situation in your ''/corpora'' directory:
 <code>
@@ Line 124: / Line 124: @@
 </code>
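The three steps above (rename, move the data, move the registry file) can be sketched as shell commands. This is only a sketch: it runs inside a throwaway directory, the variable ''ROOT'' stands in for ''/'', and the unpacked archive is simulated with placeholder files, so none of the names below come from a real corpus distribution.

```shell
# Sketch only: everything happens under a scratch directory.
# ROOT is a hypothetical stand-in for "/"; remove it to act on the real /corpora.
ROOT=$(mktemp -d "${TMPDIR:-/tmp}/cwb-sketch.XXXXXX")
mkdir -p "$ROOT/corpora/c1/registry"

# Simulate an unpacked archive containing data/ and registry/
# (the file contents here are placeholders, not real corpus data).
mkdir -p "$ROOT/unpacked/data" "$ROOT/unpacked/registry"
touch "$ROOT/unpacked/data/placeholder-data-file"
printf 'HOME data\n' > "$ROOT/unpacked/registry/dickens"

# 1. Rename data/ to something more telling.
mv "$ROOT/unpacked/data" "$ROOT/unpacked/dickens"
# 2. Move the renamed directory into /corpora.
mv "$ROOT/unpacked/dickens" "$ROOT/corpora/"
# 3. Move the single registry file into /corpora/c1/registry.
mv "$ROOT/unpacked/registry/"* "$ROOT/corpora/c1/registry/"

# Show the resulting layout.
ls -R "$ROOT/corpora"
```

On a real installation the same three ''mv'' commands apply without the ''ROOT'' prefix, and may need ''sudo'' depending on who owns ''/corpora''.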
-Now browse into the /corpora/c1/registry directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):
+Now browse into the ''/corpora/c1/registry'' directory and open the "dickens" file you just moved into it. The file's contents will include the following lines (just an example, it includes much more...):
 <code>
@@ Line 137: / Line 137: @@
 </code>
-Do not touch anything, except the line defining "HOME". Replace "data" with the path to the new directory containing the data in /corpora.
+Do not touch anything, except the line defining "HOME".
-In our case, we replace "data" with "/corpora/dickens/ . Save the file.
-If you go back to the terminal, you will now be able to type the command cqp and use the installed corpus:
+  * Replace ''data'' with the path to the new directory containing the data in ''/corpora''.
+  * In our case, we replace ''data'' with ''/corpora/dickens/''.
+  * Save the file.
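If you prefer not to open an editor, the change to the ''HOME'' line can also be scripted. The sketch below works on a scratch copy rather than on the real ''/corpora/c1/registry/dickens''; the two-line file it creates is an illustrative stand-in for a registry file, not the full format, and the variable names are made up for this example.

```shell
# Sketch only: operate on a scratch file, not the real registry entry.
WORK=$(mktemp -d "${TMPDIR:-/tmp}/cwb-registry.XXXXXX")
REG="$WORK/dickens"

# Illustrative stand-in for a registry file; a real one has many more lines.
printf 'ID dickens\nHOME data\n' > "$REG"

# Rewrite only the HOME line, leaving every other line untouched.
# Writing to a new file and moving it back avoids "sed -i",
# whose syntax differs between macOS and GNU sed.
sed 's|^HOME .*|HOME /corpora/dickens/|' "$REG" > "$REG.new" && mv "$REG.new" "$REG"

# Print the line that was changed.
grep '^HOME' "$REG"
```

To apply this to the real file, point ''REG'' at ''/corpora/c1/registry/dickens'' and keep a backup copy first.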
+If you go back to the terminal, you will now be able to type the command ''cqp'' and use the installed corpus:
 <code>
@@ Line 156: / Line 159: @@
 : und resounded through the <house> like thunder . Every roo
 :  so did every bell in the <house> . This might have lasted
-[...]
+     [...]
 </code>
 ===== Encoding a corpus =====
@@ Line 167: / Line 171: @@
 Make sure your corpus is formatted one token per line, as indicated in the "corpus encoding tutorial", optionally with additional columns for positional attributes (POS, LEMMA, etc.).
-If your corpus counts more than one file, it is advisable that you put all the files together in just one gzipped archive (e.g. using something like
+If your corpus consists of more than one file, it is advisable to put all the files together in a single gzipped archive, e.g. using something like:
-gzip -c *.txt > newcorpus.gz).
+    gzip -c *.txt > newcorpus.gz
 ==== Import "newcorpus" ====
-Create a directory /corpora/newcorpus (substitute "newcorpus" with your corpus's name...).
+Create a directory ''/corpora/newcorpus'' (substitute "newcorpus" with your corpus's name...).
-Browse to the directory where your corpus is stored.
+Browse to the directory where your corpus files are stored.
 <code bash>
@@ Line 180: / Line 185: @@
 </code>
-Issue the cwb-encode command, remembering that your encoded data will "live" in /corpora/newcorpus, and that the registry file for newcorpus will have to be saved under /corpora/c1/registry .
+Issue the ''cwb-encode'' command, remembering that your encoded data will "live" in ''/corpora/newcorpus'', and that the registry file for newcorpus will have to be saved as ''/corpora/c1/registry/newcorpus''.
-You will also have to define the "-P" and "-S" flags according to the
+You will also have to define the ''-P'' and ''-S'' flags according to the characteristics of newcorpus. We are using a simple example:
-characteristics of newcorpus. We are using a simple example:
 <code bash>
@@ Line 193: / Line 197: @@
 ==== Index "newcorpus" ====
-If you gotten this far, then you're almost done. You are just missing the indexes for cqp to be able to use the imported data. You will need to issue
+If you've gotten this far, then you're almost done.
-just one command: cwb-makeall -V NEWCORPUS. Beware: type "newcorpus" in uppercase... I had errors with typing lowercase.
+You are just missing the indexes for ''cqp'' to be able to use the imported data.
+You will need to issue just one command: ''cwb-makeall -V NEWCORPUS''.
+Beware: type "newcorpus" in uppercase; that is how ''cwb'' likes it, and I got errors when typing it in lowercase.
 <code bash>