====== Looking at KWIC lines with CQP ======

by Marco Baroni

===== The First Session =====

Invoke CQP (press the enter/return key after this and all other commands):

<code>cqp -e</code>

or:

<code>cqp -eC</code>

Exit from CQP:

<code>exit;</code>

(If you did that, please start CQP again!)

List the available corpora:

<code>show corpora;</code>

While in CQP, keep in mind that some things work as they do in a Unix terminal: in particular, you can recall previous commands with the up arrow key, and you navigate the kwic results with more/less-like keys (space to move to the next page, q to quit, etc.).

Select a corpus (remember the semicolon at the end of each command), e.g.:

<code>BNCV4;
REPUBBLICA;</code>

(Notice that tab completion works for corpus names.)

Let's stick to the BNC for now.

A quick way to find out how many tokens there are in a corpus:

<code>info;</code>

Simple kwic:

<code>"food";
"food" %c;
"good" "food";</code>

If you have problems seeing accented characters (such as vowels with umlauts in German or with accents in Italian and Spanish), try:

<code>set Pager more;</code>

You can move through the kwic results as with a standard Unix pager: **space** to see the next page, **b** to go back one page, **q** to exit the kwic display.

Whenever **q** does not work, use **ctrl+c** to interrupt any command.

To see the number of hits for your last query:

<code>size Last;</code>

If you have too many results, it is a good idea to take a look at a random sample...
First, "save" the query into a variable:

<code>A = "often";</code>

Then, "reduce" A to the desired number of randomly selected contexts, e.g.:

<code>reduce A to 20;</code>

Finally, take a look at these contexts:

<code>cat A;</code>

Change the context size:

<code>set Context 60;
set Context 5 words;
set Context s;
set Context 3 s;
set Context default;</code>

Other visualization options:

<code>show +pos;
show +lemma;
show -pos -lemma;
show -cpos;
set PrintStructures text_domain;
set PrintStructures "";</code>

===== Exploiting morpho-syntactic annotation =====

Queries using morpho-syntactic annotation (if you have been experimenting with show and set, now is a good moment to go back to a normal-looking kwic display):

<code>[word = "obsessive"] [pos = "NN.*"];
[word = "obsessive" %c] [pos = "NN.*"];
[word = "cause"];
[lemma = "cause"];
[lemma = "cause" & pos = "VV.*"];</code>

<file>
Practice time:
- look for candidate N+N compounds in Italian with "donna" as head (at the lemma level)
- select a random sample of 100 hits
</file>

No need to try the following now, but here is how you can extract a frequency list for a collocate found in a "flexible" context (from the BNC, in this specific case):

<code>[lemma = "cause" & pos = "VV.*"][pos="AT0"]?[pos="AJ.*"]*[pos="NN.*"];
count by lemma on matchend;</code>

Counting over a flexible context like this one is something you **cannot** do with cwb-scan-corpus.
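To make clearer what ''count by lemma on matchend'' tallies, here is a minimal Python sketch of the same idea outside CQP. It only illustrates the counting step: the hits below are invented examples (not real BNC concordance lines), and the name ''count_lemma_on_matchend'' is hypothetical rather than part of any CWB tool.

```python
from collections import Counter

# Hypothetical query hits: each hit is a list of (word, lemma, pos) tokens.
# The data is made up for illustration; it does not come from the BNC.
hits = [
    [("caused", "cause", "VVD"), ("a", "a", "AT0"),
     ("serious", "serious", "AJ0"), ("problem", "problem", "NN1")],
    [("causes", "cause", "VVZ"), ("problems", "problem", "NN2")],
    [("causing", "cause", "VVG"), ("the", "the", "AT0"),
     ("damage", "damage", "NN1")],
]


def count_lemma_on_matchend(hits):
    """Count the lemma of the last token (the matchend position) of each hit."""
    return Counter(hit[-1][1] for hit in hits)


freq = count_lemma_on_matchend(hits)

# Print the frequency list, most frequent lemma first.
for lemma, n in freq.most_common():
    print(n, lemma)
```

Because the optional article and adjectives make the hits vary in length, the noun is always picked as the last token of the hit rather than at a fixed offset from the verb.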
===== Regular Expressions =====

Nouns ending in "izzazione" (in la Repubblica):

<code>A = [lemma = ".*izzazione" & pos = "NOUN"];</code>

===== Structural attributes =====

The lemma "opportunist" as used by women and by men in the BNC:

<code>[lemma="opportunist"] :: match.text_author_sex="Female";
[lemma="opportunist"] :: match.text_author_sex="Male";</code>

===== Saving results =====

Save the results to an output file:

<code>cat Last > "myconc.txt";</code>

or, if you saved the results in a variable:

<code>cat A > "myconc.txt";</code>

<file>
Practice:
- save 100 random concordance lines for the donna+NOUN pattern in an external text file, with extended context (e.g., a 3-sentence window)
</file>

===== Useful links =====

==== Stefan Evert's CQP tutorial ====

[[http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/html]] \\
[[http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPTutorial/cqp-tutorial.pdf]]

==== Some Web-based interfaces ====

Serge Sharoff's Internet Corpora: [[http://corpus.leeds.ac.uk/internet.html]]

CucWeb: [[http://ramsesii.upf.es/cgi-bin/cucweb/search-form.pl?lang=en_US]]

SSLMITDev: [[http://sslmitdev-online.sslmit.unibo.it/corpora/corpora.php]]

The Word Sketch Engine uses a syntax that is almost identical to that of CQP!

tutorial_cqp.txt Last modified: 2008/04/24 12:43 by eros