Pgbigm-hackers Digest, Vol 2, Issue 3 (Pgbigm-hackers) - pg_bigm

On Sat, Jun 22, 2013 at 12:00 PM,
<pgbig****@lists*****> wrote:
> Send Pgbigm-hackers mailing list submissions to
>         pgbig****@lists*****
>
> To subscribe or unsubscribe using a Web browser, visit
>         http://lists.sourceforge.jp/mailman/listinfo/pgbigm-hackers
> or, via email, send a message with subject or body 'help' to
>         pgbig****@lists*****
>
> You can reach the person managing the list at
>         pgbig****@lists*****
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Pgbigm-hackers Digest, Vol XX, Issue XX"
>
>
> Today's Topics:
>
>    1. Re: Understand pg_bigm.gin_key_limit (fujii****@nttda*****)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 21 Jun 2013 12:01:39 +0900
> From: <fujii****@nttda*****>
> Subject: Re: [Pgbigm-hackers] Understand pg_bigm.gin_key_limit
> To: <pgbig****@lists*****>
> Message-ID:
>         <42C5B****@MBX-M*****>
>
> Content-Type: text/plain; charset="iso-2022-jp"
>
>
>
>> -----Original Message-----
>> From: pgbig****@lists*****
>> [mailto:pgbig****@lists*****] On Behalf Of
>> Amit Langote
>> Sent: Thursday, June 20, 2013 2:32 PM
>> To: pgbig****@lists*****
>> Subject: [Pgbigm-hackers] Understand pg_bigm.gin_key_limit
>>
>> Hello,
>>
>> Can someone explain this parameter?
>>
>> pg_bigm.gin_key_limit:
>>
>> ...
>> ...
>> DefineCustomIntVariable("pg_bigm.gin_key_limit",
>>                         "Sets the maximum number of bi-gram keys allowed to "
>>                         "use for GIN index search.",
>>                         "Zero means no limit.",
>> ...
>> ...
>
> The document explains clearly what this parameter is, but it's written
> only in Japanese... Could you translate it and read it? ;P
> http://pgbigm.sourceforge.jp/pg_bigm.html#parametares
>
> This parameter specifies the maximum number of 2-gram character strings
> (bigrams) that we use to look up results in the GIN index. By default, we
> use all the 2-gram character strings generated from the search keyword.
>
> Imagine the case where you execute the following query.
>
>     SELECT * FROM yourtbl WHERE col LIKE '%POSTGRES%';
>
> In this case, pg_bigm generates the following seven 2-gram character
> strings. By default, all of them are used to fetch the search result from
> the GIN index.
>
>     PO, OS, ST, TG, GR, RE, ES
>
> But one problem with a GIN index scan is that the more character strings we
> use, the larger the performance overhead of the scan becomes. To avoid this
> problem, gin_key_limit allows a user to limit the number of 2-gram character
> strings that are used for the GIN index scan. IOW, by using a subset of the
> 2-gram character strings instead of all of them, we might be able to improve
> the search performance.
>
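
For illustration, the explanation above can be sketched in a few lines of Python. This is a simplified model, not pg_bigm's C implementation: the function names `bigrams` and `limit_keys` are hypothetical, the extraction ignores wildcard and escape handling, and keeping the first N keys is an assumption, since the thread does not say which subset pg_bigm chooses.

```python
def bigrams(keyword):
    """Return the adjacent 2-character substrings of keyword, in order."""
    return [keyword[i:i + 2] for i in range(len(keyword) - 1)]


def limit_keys(keys, key_limit):
    """Keep at most key_limit keys; 0 means no limit, as in the parameter text.

    Taking the first key_limit keys is an assumption made for this sketch.
    """
    if key_limit == 0:
        return list(keys)
    return list(keys[:key_limit])


keys = bigrams("POSTGRES")
print(keys)                 # the seven bigrams listed in the quoted mail
print(limit_keys(keys, 3))  # a three-key subset for the index scan
```

With a limit of 3, the index scan would look up three keys instead of seven, which is where the possible speedup comes from.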

Thank you. I was missing the point that it is the number of "search
keyword" bigrams that we are talking about.

Anyway, later in the doc it is mentioned that using a lower value might
result in many false positives and require more processing in the
recheck. What method exactly is used in the recheck? Is it a non-bigram
method of comparing? How does the recheck help in getting rid of such
false positives? I hope "false positives" is the right term.
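
To make the question concrete, here is a minimal Python sketch of how false positives could arise and be filtered out. It assumes that the index scan returns every row containing all of the selected bigrams, and that the recheck simply re-evaluates the original substring condition against each candidate row; both are assumptions for illustration, and the thread does not confirm how pg_bigm does this. The names `index_candidates` and `recheck` and the sample rows are hypothetical.

```python
def row_bigrams(text):
    """Set of adjacent 2-character substrings found in a row value."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def index_candidates(rows, keyword, key_limit=0):
    """Rows containing every selected keyword bigram (a stand-in for the index scan)."""
    keys = [keyword[i:i + 2] for i in range(len(keyword) - 1)]
    if key_limit > 0:
        keys = keys[:key_limit]  # assumed: keep the first key_limit keys
    return [row for row in rows if all(k in row_bigrams(row) for k in keys)]


def recheck(candidates, keyword):
    """Assumed recheck: test the real condition (substring match) on each candidate."""
    return [row for row in candidates if keyword in row]


rows = ["POSTGRES rocks", "GRES POSTG", "POSTAL", "COMPOST", "MySQL"]

all_keys = index_candidates(rows, "POSTGRES")               # uses all seven bigrams
few_keys = index_candidates(rows, "POSTGRES", key_limit=2)  # uses only PO and OS

print(len(all_keys), len(few_keys))   # fewer keys leave more candidates
print(recheck(few_keys, "POSTGRES"))  # only the true match survives
```

In this sketch "GRES POSTG" contains all seven bigrams without containing the keyword, so some false positives remain even with no limit; a lower limit only increases the number of candidates the recheck has to discard.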

--
Amit Langote
