Author Wołk, Krzysztof, author

Title Machine learning in translation corpora processing / Krzysztof Wolk
Published Milton : Chapman and Hall/CRC, 2019


Description 1 online resource (281 pages)
Contents Cover; Title Page; Copyright Page; Acknowledgements; Preface; Table of Contents; Abbreviations and Definitions; Overview; 1: Introduction; 1.1 Background and context; 1.1.1 The concept of cohesion; 1.2 Machine translation (MT); 1.2.1 History of statistical machine translation (SMT); 1.2.2 Statistical machine translation approach; 1.2.3 SMT applications and research trends; 2: Statistical Machine Translation and Comparable Corpora; 2.1 Overview of SMT; 2.2 Textual components and corpora; 2.2.1 Words; 2.2.2 Sentences; 2.2.3 Corpora; 2.3 Moses tool environment for SMT; 2.3.1 Tuning for quality
2.3.2 Operation sequence model (OSM); 2.3.3 Minimum error rate training tool; 2.4 Aspects of SMT processing; 2.4.1 Tokenization; 2.4.2 Compounding; 2.4.3 Language models; Out of vocabulary words; N-gram smoothing methods; 2.4.4 Translation models; Noisy channel model; IBM models; Phrase-based models; 2.4.5 Lexicalized reordering; Word alignment; 2.4.6 Domain text adaptation; Interpolation; Adaptation of parallel corpora; 2.5 Evaluation of SMT quality; 2.5.1 Current evaluation metrics; BLEU overview; Other SMT metrics; 2.5.1.3 HMEANT metric; Evaluation using HMEANT; HMEANT calculation; 2.5.2 Statistical significance test; 3: State of the Art; 3.1 Current methods and results in spoken language translation; 3.2 Recent methods in comparable corpora exploration; 3.2.1 Native Yalign method; 3.2.2 A* algorithm for alignment; 3.2.3 Needleman-Wunsch algorithm; 3.2.4 Other alignment methods; 4: Author's Solutions to PL-EN Corpora Processing Problems; 4.1 Parallel data mining improvements; 4.2 Multi-threaded, tuned and GPU-accelerated Yalign
4.2.1 Needleman-Wunsch algorithm with GPU optimization; 4.2.2 Comparison of alignment methods; 4.3 Tuning of Yalign method; 4.4 Minor improvements in mining for Wikipedia exploration; 4.5 Parallel data mining using other methods; 4.5.1 The pipeline of tools; 4.5.2 Analogy-based method; 4.6 SMT metric enhancements; 4.6.1 Enhancements to the BLEU metric; 4.6.2 Evaluation using enhanced BLEU metric; 4.7 Alignment and filtering of corpora; 4.7.1 Corpora used for alignment experiments; 4.7.2 Filtering and alignment algorithm; 4.7.3 Filtering results; 4.7.4 Alignment evaluation results
4.8 Baseline system training; 4.9 Description of experiments; 4.9.1 Text alignment processing; 4.9.2 Machine translation experiments; TED lectures translation; Word stems and SVO word order; Lemmatization; Translation and translation parameter adaptation experiments; Subtitles and EuroParl translation; Medical texts translation; Pruning experiments; 4.9.3 Evaluation of obtained comparable corpora; Native Yalign method; Improved Yalign method; Parallel data mining using tool pipeline
Summary This book reviews ways to improve statistical machine speech translation between Polish and English. Research has been conducted mostly on dictionary-based, rule-based, and syntax-based machine translation techniques. Most popular methodologies and tools are not well suited to the Polish language and therefore require adaptation, and language resources are lacking in parallel and monolingual data. The main objective of this volume is to develop an automatic and robust Polish-to-English translation system to meet specific translation requirements and to develop bilingual textual resources by mining comparable corpora.
Bibliography Includes bibliographical references and index
Notes Krzysztof Wolk holds a PhD Eng. degree in Computer Science and is a graduate of the Polish-Japanese Academy of Information Technology. He is currently an associate professor in the Chair of Multimedia at the same university. His research is mostly related to natural language processing and machine learning based on statistical methods, neural networks, and deep learning. He is also interested in IT and its challenges and engages in interdisciplinary projects, particularly those related to HCI, UX, medicine, and psychology. In addition, he has worked as a lecturer at the Warsaw School of Photography & Graphic Design and as an IT trainer. His teaching specialties are primarily deep learning, machine learning, natural language processing, computational linguistics, multimedia, HCI, UX, mobile applications, HTML 5, Adobe applications, and server products from Apple and Microsoft. He teaches classes at the Faculty of Computer Science and the New Media Art Department of the Polish-Japanese Academy of Information Technology, and he formerly taught classes and gave lectures at the Warsaw School of Photography & Graphic Design.
Print version record
Subject Polish language -- Machine translating
English language -- Machine translating.
Machine translating.
COMPUTERS -- General.
COMPUTERS -- Machine Theory.
MATHEMATICS -- Arithmetic.
Form Electronic book
ISBN 9780429590771