Natural language watermarking via morphosyntactic alterations



Meral H. M., Sankur B., Oezsoy A. S., Guengoer T., Sevinc E.

COMPUTER SPEECH AND LANGUAGE, vol. 23, pp. 107-125, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 23
  • Publication Date: 2009
  • DOI: 10.1016/j.csl.2008.04.001
  • Journal Name: COMPUTER SPEECH AND LANGUAGE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp. 107-125
  • Keywords: Natural language watermarking, Tree bank, Agglutinative, Morphosyntax, Text payload, INFORMATION
  • Affiliated with Yıldız Teknik Üniversitesi: No

Abstract

We develop a morphosyntax-based natural language watermarking scheme. In this scheme, a text is first transformed into a syntactic tree diagram where the hierarchies and the functional dependencies are made explicit. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under control of Wordnet and Dictionary to avoid semantic drops. A certain level of security is provided via key-controlled randomization of morphosyntactic tools and the insertion of void watermark. The security aspects and payload aspects are evaluated statistically while the imperceptibility is measured using edit-hit counts based on human judgments. It is observed that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking and the free word order property of a language, like Turkish, is an extra bonus. © 2009 Elsevier Ltd. All rights reserved.
