source: yaz/trunk/fuentes/doc/yaz-icu-man.xml @ 265

Last change on this file since 265 was 265, checked in by mabarracus, 4 years ago

Add new source code 5.15.2

File size: 7.6 KB
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
[
     <!ENTITY % local SYSTEM "local.ent">
     %local;
     <!ENTITY % entities SYSTEM "entities.ent">
     %entities;
     <!ENTITY % idcommon SYSTEM "common/common.ent">
     %idcommon;
]>
<refentry id="yaz-icu">
 <refentryinfo>
  <productname>YAZ</productname>
  <productnumber>&version;</productnumber>
  <orgname>Index Data</orgname>
 </refentryinfo>

 <refmeta>
  <refentrytitle>yaz-icu</refentrytitle>
  <manvolnum>1</manvolnum>
  <refmiscinfo class="manual">Commands</refmiscinfo>
 </refmeta>

 <refnamediv>
  <refname>yaz-icu</refname>
  <refpurpose>YAZ ICU utility</refpurpose>
 </refnamediv>

 <refsynopsisdiv>
  <cmdsynopsis>
   <command>yaz-icu</command>
   <arg>-c <replaceable>config</replaceable></arg>
   <arg>-p <replaceable>type</replaceable></arg>
   <arg>-s</arg>
   <arg>-x</arg>
   <arg choice="opt">infile</arg>
  </cmdsynopsis>
 </refsynopsisdiv>

 <refsect1><title>DESCRIPTION</title>
  <para>
   <command>yaz-icu</command> is a utility that demonstrates
   the ICU chain module of YAZ (<filename>yaz/icu.h</filename>).
  </para>
  <para>
    The utility can be used in two ways. In the first, it reads text,
    analyzes it according to an XML configuration for ICU, and shows
    the result. This mode is triggered by option <literal>-c</literal>,
    which specifies the configuration to be used. The text is read from
    standard input, or from a file if <literal>infile</literal> is specified.
  </para>
  <para>
    In the second, the utility shows information about the ICU system.
    This mode is triggered by option <literal>-p</literal>.
  </para>
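  <para>
   As a rough sketch of the two modes, assuming a configuration file
   named <filename>chain.xml</filename> and an input file named
   <filename>text</filename> (both names are only illustrative),
   the invocations might look like this:
   <screen>
    yaz-icu -c chain.xml text
    yaz-icu -c chain.xml &lt; text
    yaz-icu -p l
   </screen>
   The first two commands analyze the same text, read from a named
   file and from standard input respectively. The third performs no
   analysis and only lists the locales known to ICU.
  </para>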
 </refsect1>

 <refsect1><title>OPTIONS</title>
  <variablelist>
   <varlistentry>
    <term>-c <replaceable>config</replaceable></term>
    <listitem><para>
      Specifies the file containing the ICU chain configuration,
      which is XML based.
     </para></listitem>
   </varlistentry>

   <varlistentry>
    <term>-p <replaceable>type</replaceable></term>
    <listitem><para>
      Specifies extra information to be printed about the ICU system.
      If <replaceable>type</replaceable> is <literal>c</literal>,
      then ICU converters are printed.
      If <replaceable>type</replaceable> is <literal>l</literal>,
      then available locales are printed.
      If <replaceable>type</replaceable> is <literal>t</literal>,
      then available transliterators are printed.
     </para></listitem>
   </varlistentry>

   <varlistentry>
    <term>-s</term>
    <listitem><para>
      Specifies that output should include the sort key as well. Note that
      sort keys differ between ICU versions.
     </para></listitem>
   </varlistentry>

   <varlistentry>
    <term>-x</term>
    <listitem><para>
      Specifies that output should be XML based rather than
      "text" based.
     </para></listitem>
   </varlistentry>

  </variablelist>
 </refsect1>
 <refsect1><title>ICU chain configuration</title>
  <para>
   The ICU chain configuration specifies one or more rules to convert
   text data into tokens. The configuration format is XML based.
  </para>
  <para>
   The top-level element must be named <literal>icu_chain</literal>.
   The <literal>icu_chain</literal> element has one required attribute,
   <literal>locale</literal>, which specifies the ICU locale to be used
   in the conversion steps.
  </para>
  <para>
   The <literal>icu_chain</literal> element must include elements where
   each element specifies a conversion step. The conversion is performed
   in the order in which the conversion steps are specified.
   Each conversion element takes one attribute, <literal>rule</literal>,
   which serves as the argument to the conversion step.
  </para>
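  <para>
   A minimal sketch of such a configuration, assuming English text
   and using two of the conversion elements described below, might be:
   <screen><![CDATA[
<icu_chain locale="en">
  <tokenize rule="w"/>
  <casemap rule="l"/>
</icu_chain>
]]>
   </screen>
   The steps run from top to bottom: the text is first broken into
   words, and each resulting token is then converted to lower case.
   Swapping the two elements would lowercase the whole text before
   tokenizing it.
  </para>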
  <para>
   The following conversion elements are available:

   <variablelist>
    <varlistentry>
     <term>casemap</term>
     <listitem><para>
       Converts case. The rule specifies how:

       <variablelist>
        <varlistentry>
         <term>l</term>
         <listitem>
          <para>Lower case using ICU function u_strToLower.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>u</term>
         <listitem>
          <para>Upper case using ICU function u_strToUpper.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>t</term>
         <listitem>
          <para>Title case using ICU function u_strToTitle.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>f</term>
         <listitem>
          <para>Fold case using ICU function u_strFoldCase.</para>
         </listitem>
        </varlistentry>

       </variablelist>
      </para></listitem>
    </varlistentry>

    <varlistentry>
     <term>display</term>
     <listitem><para>
       This is a meta step which specifies that a term/token is to
       be displayed. An application retrieves this term
       using function icu_chain_token_display (<filename>yaz/icu.h</filename>).
      </para></listitem>
    </varlistentry>

    <varlistentry>
     <term>transform</term>
     <listitem><para>
       Specifies an ICU transform rule using a transliterator
       identifier.
       The rule attribute is the transliterator identifier.
       See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
       more information.
      </para></listitem>
    </varlistentry>

    <varlistentry>
     <term>transliterate</term>
     <listitem><para>
       Specifies a rule-based transliterator.
       The rule attribute is the custom transformation rule to be used.
       See <ulink url="&url.icu.transform;">ICU Transforms</ulink> for
       more information.
      </para></listitem>
    </varlistentry>

    <varlistentry>
     <term>tokenize</term>
     <listitem><para>
       Breaks (tokenizes) a string into components using
       ICU functions such as ubrk_open and ubrk_setText. The rule is
       one of:
       <variablelist>
        <varlistentry>
         <term>l</term>
         <listitem>
          <para>Line. ICU: UBRK_LINE.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>s</term>
         <listitem>
          <para>Sentence. ICU: UBRK_SENTENCE.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>w</term>
         <listitem>
          <para>Word. ICU: UBRK_WORD.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>c</term>
         <listitem>
          <para>Character. ICU: UBRK_CHARACTER.</para>
         </listitem>
        </varlistentry>

        <varlistentry>
         <term>t</term>
         <listitem>
          <para>Title. ICU: UBRK_TITLE.</para>
         </listitem>
        </varlistentry>

       </variablelist>
      </para></listitem>
    </varlistentry>

    <varlistentry>
     <term>join</term>
     <listitem>
      <para>
       Joins tokens into one string. The rule attribute is the joining
       string, which may be empty. The join conversion element was added
       in YAZ 4.2.49.
      </para>
     </listitem>
    </varlistentry>
   </variablelist>

  </para>
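  <para>
   To illustrate how several of these elements might be combined, the
   following sketch strips accents, tokenizes into words, lowercases
   the tokens and joins them again with a single space. The first
   transform rule is a commonly used ICU compound transliterator
   identifier and is an assumption here, not something taken from the
   YAZ sources; the resulting output has not been verified against
   any particular YAZ or ICU version.
   <screen><![CDATA[
<icu_chain locale="en">
  <transform rule="NFD; [:Nonspacing Mark:] Remove; NFC"/>
  <tokenize rule="w"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <casemap rule="l"/>
  <join rule=" "/>
</icu_chain>
]]>
   </screen>
   Because join requires YAZ 4.2.49 or later, a chain like this one
   is not expected to load on older versions.
  </para>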
 </refsect1>
 <refsect1><title>EXAMPLES</title>
  <para>
   The following command analyzes text in file <filename>text</filename>
   using ICU chain configuration <filename>chain.xml</filename>:
   <screen>
    cat text | yaz-icu -c chain.xml
   </screen>
   The file <filename>chain.xml</filename> might look as follows:
    <screen><![CDATA[
<icu_chain locale="en">
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="w"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transliterate rule="xy > z;"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>
]]>
   </screen>
  </para>
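  <para>
   Options <literal>-s</literal> and <literal>-x</literal> can be
   combined with <literal>-c</literal>. As a sketch, again assuming
   the illustrative file names <filename>chain.xml</filename> and
   <filename>text</filename>, the following command would request
   XML based output that also includes sort keys:
   <screen>
    yaz-icu -c chain.xml -s -x text
   </screen>
   The output is not reproduced here, since the sort keys it contains
   differ between ICU versions.
  </para>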
 </refsect1>
 <refsect1><title>SEE ALSO</title>
  <para>
   <citerefentry>
    <refentrytitle>yaz</refentrytitle>
    <manvolnum>7</manvolnum>
   </citerefentry>
  </para>
  <para>
   <ulink url="&url.icu;">ICU Home</ulink>
  </para>
  <para>
   <ulink url="&url.icu.transform;">ICU Transforms</ulink>
  </para>
 </refsect1>
</refentry>

<!-- Keep this comment at the end of the file
Local variables:
mode: nxml
nxml-child-indent: 1
End:
-->