Spanish TN code switch #423
Signed-off-by: Mai Anh <palasek182@gmail.com>
for more information, see https://pre-commit.ci
| @@ -0,0 +1,3 @@ | |||
| Apt. Apartamento | |||
do we need this guy twice?
| @@ -0,0 +1,3 @@ | |||
| Apt. Apartamento | |||
| Apt. Apartamento | |||
| Apt Apartamento | |||
let's also add Dept, Dept., Depto and Depto. mapping to Departamento
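A minimal sketch of the requested rows, assuming the unit whitelist is a two-column abbreviation/expansion table like the Apt rows above. The names `UNIT_ABBREVIATIONS`, `to_tsv` and `expand_unit` are hypothetical, and a plain dictionary stands in for the TSV file the grammar would actually load.

```python
# Hypothetical abbreviation table; in the PR these would be rows in a TSV file.
UNIT_ABBREVIATIONS = {
    "Apt.": "Apartamento",
    "Apt": "Apartamento",
    "Dept.": "Departamento",
    "Dept": "Departamento",
    "Depto.": "Departamento",
    "Depto": "Departamento",
}


def to_tsv(table):
    """Render the table as tab-separated rows, one abbreviation per line."""
    return "\n".join(f"{abbr}\t{full}" for abbr, full in table.items())


def expand_unit(token):
    """Return the spelled-out unit, or the token unchanged if it is unknown."""
    return UNIT_ABBREVIATIONS.get(token, token)
```

Listing each key once also addresses the duplicate `Apt.` row raised in the earlier comment.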
| @@ -0,0 +1,10 @@ | |||
| 0 cero | |||
to reduce duplication, can we instead use the tsv files in es/data/numbers? digits and zero would do. you would have to transform un to uno in this case
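One way to picture the reuse, as a sketch only. It assumes the existing digit and zero files hold `word<TAB>digit` rows and spell 1 as "un"; the inline strings below are stand-ins for those files, not copies of them, and `load_digit_words` is a hypothetical helper.

```python
import csv
import io

# Stand-ins for the existing digit and zero TSV files. The column order
# (word, then digit) and the "un" entry are assumptions about their content.
DIGIT_TSV = (
    "un\t1\ndos\t2\ntres\t3\ncuatro\t4\ncinco\t5\n"
    "seis\t6\nsiete\t7\nocho\t8\nnueve\t9\n"
)
ZERO_TSV = "cero\t0\n"


def load_digit_words(*tsv_texts):
    """Build a digit -> word map from word<TAB>digit rows, turning "un" into "uno"."""
    words = {}
    for text in tsv_texts:
        for word, digit in csv.reader(io.StringIO(text), delimiter="\t"):
            words[digit] = "uno" if word == "un" else word
    return words


DIGIT_WORDS = load_digit_words(DIGIT_TSV, ZERO_TSV)
```

With this shape the new ten-row file is no longer needed, since the table is derived from the two existing files.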
| 180 psi~ciento ochenta p s i | ||
| 2 + 2 - 1 = 3~dos más dos menos uno es igual a tres No newline at end of file | ||
| 2 + 2 - 1 = 3~dos más dos menos uno es igual a tres | ||
| Mi dirección es 1234 Maple St., Springfield, IL 62704~Mi dirección es mil doscientos treinta y cuatro Maple Street, Springfield, Illinois seis dos siete cero cuatro |
let's make these a separate test file as structured for English addresses
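A sketch of how a dedicated address file could be read, assuming it keeps the same `written~spoken` line format as the cases above. `parse_test_cases` is a hypothetical helper, and the layout of the English address tests is not reproduced here.

```python
def parse_test_cases(text):
    """Split "written~spoken" lines into (written, spoken) pairs, skipping blanks."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        written, spoken = line.split("~", 1)
        cases.append((written, spoken))
    return cases


# One address case taken from the diff above.
ADDRESS_CASES = parse_test_cases(
    "Mi dirección es 1234 Maple St., Springfield, IL 62704~"
    "Mi dirección es mil doscientos treinta y cuatro Maple Street, "
    "Springfield, Illinois seis dos siete cero cuatro\n"
)
```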
| out = fst | ||
| out = out @ pynini.cdrewrite(pynini.cross("veintiún", "veintiuno"), "", "", NEMO_SIGMA) | ||
| out = out @ pynini.cdrewrite(pynini.cross("treintún", "treinta y uno"), "", "", NEMO_SIGMA) | ||
| out = out @ pynini.cdrewrite(pynini.cross(" y ún", " y uno"), "", "", NEMO_SIGMA) |
let's also add out = out @ pynini.cdrewrite(pynini.cross(" y un", " y uno"), "", "", NEMO_SIGMA)
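The intended effect of these rewrites, including the suggested unaccented rule, can be sketched with plain regular expressions. This is a stand-in for the `pynini.cdrewrite` chain, not the PR's code; `restore_uno` is a hypothetical name. The trailing word boundary is this sketch's own guard, and whether the FST rule needs a similar right context depends on what the upstream graph emits.

```python
import re

# Each pair mirrors one rewrite from the diff, plus the suggested " y un" rule.
# The trailing \b keeps already-complete forms such as " y uno" untouched.
_APOCOPE_RULES = [
    (re.compile(r"veintiún\b"), "veintiuno"),
    (re.compile(r"treintún\b"), "treinta y uno"),
    (re.compile(r" y [uú]n\b"), " y uno"),
]


def restore_uno(text):
    """Replace apocopated "un"/"ún" endings with the full form "uno"."""
    for pattern, replacement in _APOCOPE_RULES:
        text = pattern.sub(replacement, text)
    return text
```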
| + pynutil.insert("\" } preserve_order: true") | ||
| ) | ||
| address_us_es_inner = AddressUSSurfaceFst(cardinal, deterministic=deterministic).graph |
this is a really nice way to integrate it
| ) | ||
| zip_five = ( | ||
| graph_zip_digit |
pynini.closure(graph_zip_digit + insert_space, 4) + graph_zip_digit?
| pynini.accep(",") + pynini.accep(NEMO_SPACE) + pynini.invert(pynini.string_map(states)), 0, 1 | ||
| ) | ||
| zip_code = pynini.compose(NEMO_DIGIT**5, zip_five) |
do we need this or is zip_five enough?
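For reference, the behaviour under discussion sketched in plain Python rather than pynini: a ZIP code is read digit by digit, and only exactly five digits are accepted. `read_zip` and the inline `DIGIT_WORDS` table are hypothetical. The explicit length check plays the role of the `NEMO_DIGIT**5` composition, which would matter if `zip_five` were written with `pynini.closure(..., 4)`, since a single bound to `closure` is a lower bound only.

```python
# Digit words as they appear in the expected outputs of the test cases above.
DIGIT_WORDS = {
    "0": "cero", "1": "uno", "2": "dos", "3": "tres", "4": "cuatro",
    "5": "cinco", "6": "seis", "7": "siete", "8": "ocho", "9": "nueve",
}


def read_zip(zip_code):
    """Verbalize a five-digit ZIP code one digit at a time."""
    if len(zip_code) != 5 or not all(ch in DIGIT_WORDS for ch in zip_code):
        raise ValueError(f"expected exactly five digits, got {zip_code!r}")
    return " ".join(DIGIT_WORDS[ch] for ch in zip_code)
```

Called on the ZIP codes from the test cases, `read_zip("62704")` gives "seis dos siete cero cuatro".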
| 2 + 2 - 1 = 3~dos más dos menos uno es igual a tres | ||
| Mi dirección es 1234 Maple St., Springfield, IL 62704~Mi dirección es mil doscientos treinta y cuatro Maple Street, Springfield, Illinois seis dos siete cero cuatro | ||
| La oficina está ubicada en 567 Main St., Ste. 200, Dallas, TX 75201~La oficina está ubicada en quinientos sesenta y siete Main Street, Suite doscientos, Dallas, Texas siete cinco dos cero uno | ||
| Por favor envía el paquete a 890 Oak Ave., Apt. 5B, Brooklyn, NY 11201~Por favor envía el paquete a ochocientos noventa Oak Avenue, Apartamento 5B, Brooklyn, New York uno uno dos cero uno |
typo here propagating to code. normalized form should be Por favor envía el paquete a ochocientos noventa Oak Avenue, Apartamento cinco B, Brooklyn, New York uno uno dos cero uno
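A sketch of the verbalization the corrected test expects for the unit identifier, in plain Python. `read_unit` is a hypothetical helper. Reading multi-digit units one digit at a time is an assumption of this sketch; the source only shows the single-digit case `5B`, and the grammar may prefer cardinals as it does for the suite number.

```python
import re

DIGIT_WORDS = {
    "0": "cero", "1": "uno", "2": "dos", "3": "tres", "4": "cuatro",
    "5": "cinco", "6": "seis", "7": "siete", "8": "ocho", "9": "nueve",
}


def read_unit(unit):
    """Spell out the digits of a unit such as "5B" and keep its letters."""
    match = re.fullmatch(r"([0-9]+)([A-Z]*)", unit)
    if match is None:
        # Not digits followed by optional uppercase letters: leave unchanged.
        return unit
    digits, letters = match.groups()
    # Assumption: digits are read individually, letters are passed through.
    return " ".join([DIGIT_WORDS[d] for d in digits] + list(letters))
```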
| comma_sp = pynini.accep(",") + pynini.accep(NEMO_SPACE) | ||
| suite = graph_suite_designator + pynini.closure(NEMO_SPACE, 0, 1) + suite_num | ||
| apt = graph_apt_designator + pynini.closure(NEMO_DIGIT | NEMO_UPPER, 1, 4) |
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Before your PR is "Ready for review"
Pre checks:
git commit -s to sign.
pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
pytest and Sparrowhawk here.
__init__.py for every folder and subfolder, including data folder which has .TSV files?
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
Copyright 2015 and onwards Google, Inc.. See an example here.
(try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.