spanish tn code switch #423

Open
folivoramanh wants to merge 2 commits into NVIDIA:main from folivoramanh:es_tn_v1

Conversation

@folivoramanh
Collaborator

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

folivoramanh and others added 2 commits May 18, 2026 10:41
Signed-off-by: Mai Anh <palasek182@gmail.com>
@@ -0,0 +1,3 @@
Apt. Apartamento
Collaborator

Do we need this entry twice?

@@ -0,0 +1,3 @@
Apt. Apartamento
Apt. Apartamento
Apt Apartamento
Collaborator

Let's also add Dept, Dept., Depto, and Depto., each mapping to Departamento.
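
A rough sketch of how the abbreviation table might look once both comments on this file are addressed, assuming the data file is a two-column, tab-separated TSV (abbreviation, then expansion). The four Departamento rows come from the suggestion above; the helper name `load_abbreviations` and the `TSV_TEXT` contents are hypothetical and only illustrate how a repeated row could be caught.

```python
import io

# Hypothetical contents of the abbreviation TSV (tab-separated), for illustration only.
TSV_TEXT = (
    "Apt.\tApartamento\n"
    "Apt\tApartamento\n"
    "Dept\tDepartamento\n"
    "Dept.\tDepartamento\n"
    "Depto\tDepartamento\n"
    "Depto.\tDepartamento\n"
)


def load_abbreviations(handle):
    """Read abbreviation/expansion pairs and reject repeated abbreviations."""
    mapping = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        abbreviation, expansion = line.split("\t")
        if abbreviation in mapping:
            raise ValueError(f"line {line_no}: duplicate entry for {abbreviation!r}")
        mapping[abbreviation] = expansion
    return mapping


abbreviations = load_abbreviations(io.StringIO(TSV_TEXT))
print(abbreviations["Depto."])
```

A check like this could run as a unit test over the data folder, so a duplicated row is flagged before review.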

@@ -0,0 +1,10 @@
0 cero
Collaborator

To reduce duplication, can we instead use the TSV files in es/data/numbers? The digit and zero files would do. You would have to transform un to uno in this case.
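
A minimal sketch of the suggested reuse, assuming the shared digit table lists 1 as "un" (as the comment implies). The rows below are stand-ins for the real files under es/data/numbers, whose exact contents and column order are not shown in this PR; the names `SHARED_ZERO`, `SHARED_DIGITS`, and `build_digit_map` are hypothetical.

```python
# Stand-in rows; the real TSV files may use a different column order.
SHARED_ZERO = [("0", "cero")]
SHARED_DIGITS = [
    ("1", "un"), ("2", "dos"), ("3", "tres"), ("4", "cuatro"), ("5", "cinco"),
    ("6", "seis"), ("7", "siete"), ("8", "ocho"), ("9", "nueve"),
]


def build_digit_map(zero_rows, digit_rows):
    """Merge the shared zero and digit rows, replacing apocopated 'un' with 'uno'."""
    mapping = dict(zero_rows)
    mapping.update(digit_rows)
    # Digits read one at a time need the full form "uno", not "un".
    return {digit: ("uno" if word == "un" else word) for digit, word in mapping.items()}


digit_map = build_digit_map(SHARED_ZERO, SHARED_DIGITS)
print(" ".join(digit_map[d] for d in "62704"))  # seis dos siete cero cuatro
```

This keeps a single source of truth for the digit words and confines the un/uno difference to one transformation step.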

180 psi~ciento ochenta p s i
2 + 2 - 1 = 3~dos más dos menos uno es igual a tres
\ No newline at end of file
2 + 2 - 1 = 3~dos más dos menos uno es igual a tres
Mi dirección es 1234 Maple St., Springfield, IL 62704~Mi dirección es mil doscientos treinta y cuatro Maple Street, Springfield, Illinois seis dos siete cero cuatro
Collaborator

Let's move these into a separate test file, structured like the one for English addresses.
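
For reference, the test cases in this diff use a written~normalized line format. Below is a small sketch of how a dedicated address test file could be read, assuming that format. The function name `parse_cases` and the inlined sample contents are hypothetical, and the existing test suite may already provide an equivalent helper.

```python
def parse_cases(text):
    """Split 'written~normalized' lines into (written, normalized) pairs."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        written, normalized = line.split("~", 1)
        cases.append((written, normalized))
    return cases


# Hypothetical contents of a separate address test file, taken from this diff.
ADDRESS_CASES = parse_cases(
    "Mi dirección es 1234 Maple St., Springfield, IL 62704~"
    "Mi dirección es mil doscientos treinta y cuatro Maple Street, "
    "Springfield, Illinois seis dos siete cero cuatro\n"
)

for written, normalized in ADDRESS_CASES:
    print(written, "->", normalized)
```

The resulting pairs could then be fed to a parametrized pytest case, keeping address inputs apart from the general sentence-level cases.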

out = fst
out = out @ pynini.cdrewrite(pynini.cross("veintiún", "veintiuno"), "", "", NEMO_SIGMA)
out = out @ pynini.cdrewrite(pynini.cross("treintún", "treinta y uno"), "", "", NEMO_SIGMA)
out = out @ pynini.cdrewrite(pynini.cross(" y ún", " y uno"), "", "", NEMO_SIGMA)
Collaborator

Let's also add out = out @ pynini.cdrewrite(pynini.cross(" y un", " y uno"), "", "", NEMO_SIGMA)
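
One detail that may be worth checking with this rule: with empty left and right contexts, a rewrite of " y un" would presumably also match the first part of " y uno", which the preceding rule has just produced. The sketch below uses Python's `re` module as a stand-in for the pynini rule (it is not the grammar code) to show the effect and one possible guard, a right-hand word boundary. The function names are hypothetical.

```python
import re


def rewrite_unbounded(text):
    """Replace every ' y un', even when it is the start of ' y uno'."""
    return text.replace(" y un", " y uno")


def rewrite_bounded(text):
    """Replace ' y un' only when 'un' is a whole word (followed by whitespace or end)."""
    return re.sub(r" y un(?=\s|$)", " y uno", text)


print(rewrite_unbounded("treinta y uno"))  # treinta y unoo
print(rewrite_bounded("treinta y uno"))    # treinta y uno
print(rewrite_bounded("treinta y un"))     # treinta y uno
```

In the grammar itself, a comparable guard would likely be a non-empty right context passed to pynini.cdrewrite; the exact form depends on how the surrounding graph marks word ends.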

+ pynutil.insert("\" } preserve_order: true")
)

address_us_es_inner = AddressUSSurfaceFst(cardinal, deterministic=deterministic).graph
Collaborator

This is a really nice way to integrate it.

)

zip_five = (
graph_zip_digit
Collaborator

Could this be pynini.closure(graph_zip_digit + insert_space, 4) + graph_zip_digit?
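
The shape being suggested is four "digit plus space" groups followed by a final digit. As far as I recall from the pynini documentation, closure(x, 4) means four or more repetitions and closure(x, 4, 4) means exactly four, so the bound may be worth double-checking. The plain-Python sketch below is not the grammar code; it only illustrates the intended digit-by-digit reading for exactly five digits. `DIGIT_WORDS` and `read_zip` are hypothetical names.

```python
import re

# Spanish digit words used for reading a ZIP code one digit at a time.
DIGIT_WORDS = {
    "0": "cero", "1": "uno", "2": "dos", "3": "tres", "4": "cuatro",
    "5": "cinco", "6": "seis", "7": "siete", "8": "ocho", "9": "nueve",
}


def read_zip(zip_code):
    """Verbalize a five-digit ZIP code digit by digit, separated by spaces."""
    if not re.fullmatch(r"[0-9]{5}", zip_code):
        raise ValueError(f"expected exactly five digits, got {zip_code!r}")
    return " ".join(DIGIT_WORDS[digit] for digit in zip_code)


print(read_zip("75201"))  # siete cinco dos cero uno
```

The explicit five-digit check plays the role of the length restriction; without it, longer digit strings would be read the same way.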

pynini.accep(",") + pynini.accep(NEMO_SPACE) + pynini.invert(pynini.string_map(states)), 0, 1
)

zip_code = pynini.compose(NEMO_DIGIT**5, zip_five)
Collaborator

Do we need this, or is zip_five enough?

2 + 2 - 1 = 3~dos más dos menos uno es igual a tres
Mi dirección es 1234 Maple St., Springfield, IL 62704~Mi dirección es mil doscientos treinta y cuatro Maple Street, Springfield, Illinois seis dos siete cero cuatro
La oficina está ubicada en 567 Main St., Ste. 200, Dallas, TX 75201~La oficina está ubicada en quinientos sesenta y siete Main Street, Suite doscientos, Dallas, Texas siete cinco dos cero uno
Por favor envía el paquete a 890 Oak Ave., Apt. 5B, Brooklyn, NY 11201~Por favor envía el paquete a ochocientos noventa Oak Avenue, Apartamento 5B, Brooklyn, New York uno uno dos cero uno
Collaborator

Typo here, propagating to the code. The normalized form should be: Por favor envía el paquete a ochocientos noventa Oak Avenue, Apartamento cinco B, Brooklyn, New York uno uno dos cero uno
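
A rough sketch of the behavior this correction implies: the numeric part of the unit is verbalized and any trailing letters are kept as separate tokens. It assumes the unit is a run of digits optionally followed by uppercase letters, and it uses a toy lookup in place of the real cardinal grammar. `TOY_CARDINALS` and `verbalize_unit` are hypothetical names.

```python
import re

# Toy lookup standing in for the cardinal grammar; only the values used here.
TOY_CARDINALS = {"5": "cinco", "200": "doscientos"}


def verbalize_unit(unit, cardinals=TOY_CARDINALS):
    """Verbalize a unit such as '5B': number as a cardinal, letters kept as-is."""
    match = re.fullmatch(r"([0-9]+)([A-Z]*)", unit)
    if match is None:
        raise ValueError(f"unsupported unit format: {unit!r}")
    number, letters = match.groups()
    words = [cardinals[number]]
    words.extend(letters)  # each letter becomes its own token
    return " ".join(words)


print("Apartamento", verbalize_unit("5B"))  # Apartamento cinco B
print("Suite", verbalize_unit("200"))       # Suite doscientos
```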


comma_sp = pynini.accep(",") + pynini.accep(NEMO_SPACE)
suite = graph_suite_designator + pynini.closure(NEMO_SPACE, 0, 1) + suite_num
apt = graph_apt_designator + pynini.closure(NEMO_DIGIT | NEMO_UPPER, 1, 4)
Collaborator

See the comment on the corresponding test case.
