Do you find it easy to search for example sentences on the web? I sometimes do not. Search engines like Google look for documents: large pieces of text that contain the keywords of a query.
But queries on words alone do not always work well! DepFinder is a search engine that gives you more power, but it is also a bit more difficult to master than a regular search engine.
What is DepFinder
Well... It is a search engine. It is available at http://lotus.kuee.kyoto-u.ac.jp/depfinder/search. I try to keep it online as much as possible, but there could be some downtime. The total number of available sentences can vary from time to time as well.
And no sources yet, sorry.
Usually, search engines let you search only by words. DepFinder is different. It lets you use:
- words
- dependencies
- grammar
- part of speech information
The first one is the usual part; the other three probably are not. Let's see how exactly DepFinder differs from general search engines.
What can DepFinder do
Here is a collection of interesting queries.
Usage of onomatopoeia
It is difficult to find good examples for adverbs, and especially for onomatopoeia. But that is not a problem for DepFinder.
Checking if a word can be used in a certain situation
I used this query to find out whether it is possible to say "to think about a problem from X sides" in Japanese using the word 面.
Usage of grammar
- @〜飛ぶ*->度に,@度に->〜飛ぶ* - "verb 度に verb"
- +〜物->度に,@度に->〜飛ぶ* - "noun 度に verb"
What can you do from morning till evening
DepFinder by Example
This section introduces the query language, from simple constructs to its full power.
Dependency Query
One of the most famous Japanese plants is sakura, or 桜. The motion of its petals as they fall from a tree is usually described poetically with the verb 舞い落ちる. Let's see in what situations this "phrase" is used: 桜→舞い落ちる. The search results should contain a list of sentences like:
- 桜の舞い落ちる速度、秒速5センチメートルなんだって。
- 桜舞い落ちる季節も近いですねえ〜
- 宇都宮B級グルメと桜舞い落ちる神社。
- 時は桜の舞い落ちる4月。
- そして桜は静かに舞い落ちる。
Each of the sentences contains both 桜 and 舞い落ちる. But there is more: in each sentence there is a dependency relation between these two words. For a detailed explanation, please refer to the Wikipedia article on dependency grammar. In simple terms, however, a dependency relation is formed between the words that have the strongest connection in a sentence.
In this case, 桜 is called the child and 舞い落ちる the parent. There can be other words between a child and a parent, as in the sentence "そして桜は静かに舞い落ちる". It is possible to swap the two words in the query, creating another one: 舞い落ちる→桜. In the search results of this query, 舞い落ちる becomes a child of 桜.
Several dependencies
It is possible to specify several dependencies at the same time: 綺麗な→咲く→桜. They are processed as if they were on the same level, as sibling children of the last element.
Use the arrow symbol (→ or ->) to specify a dependency between two or more words.
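To make the child/parent idea concrete, here is a minimal sketch of how a dependency query could be checked against one parsed sentence. It is not DepFinder's implementation: the `Bunsetsu` class, its `head` field, and the hand-assigned parent indices are assumptions made for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass
class Bunsetsu:
    surface: str  # the unit as written in the sentence
    head: str     # its content word (hypothetical field, filled in by hand)
    parent: int   # index of the parent unit, or -1 for the root


def dependency_pairs(query):
    """Turn 'A→B→C' into [(A, C), (B, C)]: every earlier word is a child of the last."""
    *children, parent = re.split(r"→|->", query.replace(" ", ""))
    return [(child, parent) for child in children]


def matches(sentence, query):
    """True if every (child, parent) pair of the query is an edge in the sentence."""
    def has_edge(child, parent):
        return any(
            b.head == child and b.parent >= 0 and sentence[b.parent].head == parent
            for b in sentence
        )

    pairs = dependency_pairs(query)
    return bool(pairs) and all(has_edge(c, p) for c, p in pairs)


# Hand-made parse of "そして桜は静かに舞い落ちる。" (an assumption, not parser output).
sentence = [
    Bunsetsu("そして", "そして", 3),
    Bunsetsu("桜は", "桜", 3),
    Bunsetsu("静かに", "静か", 3),
    Bunsetsu("舞い落ちる。", "舞い落ちる", -1),
]

print(matches(sentence, "桜→舞い落ちる"))   # child 桜, parent 舞い落ちる
print(matches(sentence, "舞い落ちる→桜"))   # swapped roles
```

The words between the child and the parent (静かに here) do not matter, because the check follows the parent index rather than word order.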
Multiple Inclusions
DepFinder takes its raw data from the Internet, so search results can contain sentences that are not "clean". Also, by default, DepFinder tries to match a query as many times as possible. For example, let's search just for sakura: 桜. The results will contain sentences with multiple sakuras in them, which is not very useful. Let's prepend an @ symbol to the query: @桜. DepFinder prefers sentences in which such a query matches only once. Effectively, this lets you control whether a word should occur once or possibly more than once in a sentence.
One remark: most useful queries currently require prepending the @ symbol to almost all query parts. Future revisions of DepFinder will probably reverse the current behavior of the @ symbol: duplicates will be avoided by default and allowed only when requested.
Use the @ symbol to prefer single inclusions of a query.
Grammatical Form Query
Let's return to our sakura. Before its petals fall, it surely has to bloom: @桜→咲く. Note that every sentence contains only the basic form of 咲く. Let's try another form: @桜→咲いてほしい. This time it's only 咲いてほしい! DepFinder matches the exact form of a query by default.
To find any grammatical form of a word, add * after the word: @桜→咲いた*. This query contains the past form of 咲く, but because of the *, DepFinder matches any form of 咲く.
The star can also be used in forms containing more than one grammatical part: @桜→咲いています*. In such cases it modifies only the last grammatical part. In this example that part is ます, and its possible forms include ました and ません.
DepFinder finds the exact grammatical form by default. Append * to find any grammatical form.
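The difference between exact and starred matching can be sketched as follows. The `BASE_FORM` table is a tiny hand-written stand-in for a real morphological analyzer, and the function covers only the simple single-word case; it does not model the rule that the star affects only the last grammatical part of a longer form.

```python
# Hypothetical hand-written table: surface form -> dictionary form.
BASE_FORM = {
    "咲く": "咲く",
    "咲いた": "咲く",
    "咲かない": "咲く",
    "散る": "散る",
    "散った": "散る",
}


def form_matches(query_word, sentence_word):
    """Exact comparison by default; with a trailing * compare dictionary forms."""
    if query_word.endswith("*"):
        base = BASE_FORM.get(query_word[:-1])
        return base is not None and base == BASE_FORM.get(sentence_word)
    return query_word == sentence_word


print(form_matches("咲く", "咲いた"))     # exact form required
print(form_matches("咲いた*", "咲かない"))  # any form of the same verb
```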
Part of Speech Query
Are you already bored of sakura? I am. Let's find something else that can bloom. We will do that by giving DepFinder the following query: @~桜が→咲く*. The new symbol, tilde (~), finds sentences containing words that have the same part of speech as the word prefixed by the tilde. In this query 桜 is a noun, so in English the query becomes "find a noun with が that has 咲く in any form as a parent".
Of course, this query works with other parts of speech as well: @〜綺麗な→家, @〜強く→吹く*, @〜ゆっくりの→俺. Because DepFinder keeps the grammatical form of queries, part of speech queries can be useful for searching for a word with some specific grammar.
Use ~A to find sentences that contain words of the same part of speech as A.
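As a rough illustration of "same part of speech, same attached grammar", the sketch below compares a template unit such as 桜が with candidate units. The `Unit` tuple and its hand-assigned part of speech labels are assumptions; a real system would get them from a morphological analyzer.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    content: str   # content word, e.g. 桜
    particle: str  # attached grammatical word, e.g. が ("" if none)
    pos: str       # part of speech label, assigned by hand here


def matches_same_pos(template, candidate):
    """A ~ query keeps the part of speech and the attached grammar, not the word itself."""
    return template.pos == candidate.pos and template.particle == candidate.particle


template = Unit("桜", "が", "noun")  # the ~桜が part of the query

print(matches_same_pos(template, Unit("梅", "が", "noun")))  # another noun with が
print(matches_same_pos(template, Unit("梅", "を", "noun")))  # noun, but different particle
```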
Compound Query
The queries described above are primitive. They can be combined to search for even more complex things. By separating two queries with a comma you get a single compound query: @聞く、@動物. General search systems like Google use spaces as word separators, but DepFinder uses the comma for this purpose. Additionally, all spaces in a query are ignored.
A compound query matches sentences that contain at least one of its parts, but it prefers to match as many parts as possible.
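The "at least one part, but preferably more" behavior can be pictured as a simple ranking. This is only a sketch: plain substring search stands in for real query matching, the sample sentences are made up, and accepting both `,` and `、` as separators is an assumption based on the examples in this text.

```python
import re


def split_compound(query):
    """Drop all whitespace, split on commas, and strip a leading @ from each part."""
    compact = re.sub(r"\s", "", query)
    return [part.lstrip("@") for part in re.split(r"[,、]", compact) if part]


def rank(query, sentences):
    """Keep sentences matching at least one part; more matched parts come first."""
    parts = split_compound(query)
    scored = [(sum(part in s for part in parts), s) for s in sentences]
    hits = [(n, s) for n, s in scored if n > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable sort keeps the input order on ties
    return [s for _, s in hits]


# Made-up sample sentences, for illustration only.
sentences = ["音楽を聞く", "空が青い", "動物の声を聞く"]

for s in rank("@聞く、 @動物", sentences):
    print(s)
```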
Query Part Modifiers
There are three query part modifiers: @, + and -. The first one was explained earlier. The other two have their usual meaning in search systems.
Plus
The plus modifier (+) makes the search engine always match the query part marked by the plus. Let's compare the number of hits of two queries: 聞く、動物 and +聞く、+動物. The first one essentially searches for either 聞く or 動物 in a sentence, whereas the second one searches only for sentences containing both at the same time. This explains the difference in the number of hits.
Minus
The minus modifier (-) makes the search engine find sentences that do not match the query part marked by the minus. For example, let's find an action of sakura other than blooming: @桜が→~咲く,-咲く.
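The effect of + and - can be sketched as a filter over sentences. As before, substring search is a stand-in for real matching, the function names are hypothetical, and the sketch assumes each query part carries at most one modifier.

```python
import re


def parse_parts(query):
    """Split a compound query into (modifier, text) pairs; modifier is '' if absent."""
    compact = re.sub(r"\s", "", query)
    pairs = []
    for part in re.split(r"[,、]", compact):
        if not part:
            continue
        if part[0] in "@+-":
            pairs.append((part[0], part[1:]))
        else:
            pairs.append(("", part))
    return pairs


def accepts(query, sentence):
    """+ parts must be present, - parts must be absent, and something must match."""
    parts = parse_parts(query)
    if any(mod == "+" and text not in sentence for mod, text in parts):
        return False
    if any(mod == "-" and text in sentence for mod, text in parts):
        return False
    return any(mod != "-" and text in sentence for mod, text in parts)


print(accepts("聞く、動物", "音楽を聞く"))    # one of the two parts is enough
print(accepts("+聞く、+動物", "音楽を聞く"))  # both parts are required
print(accepts("桜,-咲く", "桜が咲く"))       # excluded by the minus part
```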
Contact Information:
E-mail: arseny <:an email sign:> nlp.ist.i.kyoto-u.ac.jp
Twitter (Mostly in Russian): @eiennohito
Twitter (Mostly in Japanese): @to_aruchan
Details
TODO: Write more clearly.
Priority/Precedence
The query operations have the following precedence or order of resolution:
- grammatical query
- part of speech query
- dependency query
- query part modifiers
- compound query separators
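One way to read this precedence list is as the nesting order of a parser: commas split first (they bind loosest), then the modifier is taken off each part, then arrows split the part into words, and finally ~ and * are read off each word (they bind tightest). The sketch below follows that reading. The `Word` and `Part` classes are hypothetical, and treating both ~ and 〜 as the tilde is an assumption based on the example queries in this text.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    same_pos: bool  # prefixed with ~ (part of speech query)
    any_form: bool  # suffixed with * (grammatical form query)


@dataclass
class Part:
    modifier: str  # "@", "+", "-" or ""
    chain: list    # Words linked by arrows (dependency query)


def parse_word(raw):
    """Read the tightest-binding markers: trailing * and leading ~."""
    any_form = raw.endswith("*")
    if any_form:
        raw = raw[:-1]
    same_pos = raw[:1] in ("~", "〜")
    if same_pos:
        raw = raw[1:]
    return Word(raw, same_pos, any_form)


def parse_query(query):
    """Parse from the loosest-binding construct to the tightest."""
    compact = re.sub(r"\s", "", query)           # spaces are ignored
    parts = []
    for raw in re.split(r"[,、]", compact):       # compound query separators
        if not raw:
            continue
        modifier = ""
        if raw[0] in "@+-":                       # query part modifier
            modifier, raw = raw[0], raw[1:]
        words = re.split(r"→|->", raw)            # dependency arrows
        parts.append(Part(modifier, [parse_word(w) for w in words]))
    return parts


for part in parse_query("@~桜が→咲く*, -咲く"):
    print(part)
```

Under this reading the modifier applies to the whole dependency chain of its part, not only to the first word.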
Query part scope
Every query part should be a bunsetsu. Basically, a bunsetsu usually has one content word together with all the grammatical words attached to it.
Examples of bunsetsu separation of sentences:
- 毎朝|日が|登る - particles are attached to the content words
- 登りたくなってきたのに - that's a single bunsetsu, yes
- 京都大学は|大きい - compound nouns (京都大学) are treated as a single bunsetsu
Queries like 桜が咲く will not work because they contain two bunsetsu; you need to split them into either a compound query or a dependency query.