Natural language watermarking via morphosyntactic alterations



Meral H. M., Sankur B., Oezsoy A. S., Guengoer T., Sevinc E.

COMPUTER SPEECH AND LANGUAGE, vol.23, pp.107-125, 2009 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 23
  • Publication Date: 2009
  • Doi Number: 10.1016/j.csl.2008.04.001
  • Journal Name: COMPUTER SPEECH AND LANGUAGE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.107-125
  • Keywords: Natural language watermarking, Tree bank, Agglutinative, Morphosyntax, Text payload, INFORMATION
  • Yıldız Technical University Affiliated: No

Abstract

We develop a morphosyntax-based natural language watermarking scheme. In this scheme, a text is first transformed into a syntactic tree diagram in which the hierarchies and the functional dependencies are made explicit. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under the control of WordNet and a dictionary to avoid semantic drops. A certain level of security is provided via key-controlled randomization of the morphosyntactic tools and the insertion of void watermarks. The security and payload aspects are evaluated statistically, while imperceptibility is measured using edit-hit counts based on human judgments. It is observed that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking, and that the free word order property of a language such as Turkish is an extra bonus.

(C) 2009 Elsevier Ltd. All rights reserved.
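
The key-controlled embedding idea in the abstract can be illustrated with a small toy sketch. It is not the authors' system: the paper works on syntax trees with WordNet and dictionary checks, whereas this sketch swaps between hypothetical pairs of interchangeable English words (`TOOLS`). The names `embed`, `extract`, and `void_rate` are assumptions made for illustration. Treating a "void watermark" as a key-selected sentence that is left without payload is also an assumption.

```python
import hashlib
import random

# Hypothetical toy "tools": each pair holds two interchangeable forms,
# where index 0 encodes bit 0 and index 1 encodes bit 1.
TOOLS = [("because", "since"), ("although", "though"), ("begin", "start")]

def _keyed_rng(key, index):
    """Derive a per-sentence random generator from the secret key."""
    digest = hashlib.sha256(f"{key}:{index}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest, "big"))

def _plan(key, index, tokens, void_rate):
    """Return (token_position, tool) for a sentence, or None if it carries no bit."""
    rng = _keyed_rng(key, index)
    if rng.random() < void_rate:
        return None  # key-selected void slot: sentence is left untouched
    tools = TOOLS[:]
    rng.shuffle(tools)  # key-controlled order in which tools are tried
    for tool in tools:
        for pos, tok in enumerate(tokens):
            if tok.lower() in tool:
                return pos, tool
    return None  # no tool applies to this sentence

def embed(sentences, bits, key, void_rate=0.25):
    """Write the payload bits into the sentences, one bit per usable sentence."""
    out, queue = [], list(bits)
    for i, sentence in enumerate(sentences):
        tokens = sentence.split()
        slot = _plan(key, i, tokens, void_rate) if queue else None
        if slot is not None:
            pos, tool = slot
            tokens[pos] = tool[queue.pop(0)]
        out.append(" ".join(tokens))
    if queue:
        raise ValueError("text too short for payload")
    return out

def extract(sentences, n_bits, key, void_rate=0.25):
    """Read n_bits back by replaying the same key-controlled choices."""
    bits = []
    for i, sentence in enumerate(sentences):
        if len(bits) == n_bits:
            break
        tokens = sentence.split()
        slot = _plan(key, i, tokens, void_rate)
        if slot is not None:
            pos, tool = slot
            bits.append(tool.index(tokens[pos].lower()))
    return bits
```

Because the choice of tool and of void slots depends only on the key and the sentence index, a reader holding the same key can replay those choices and recover the bits. A real system would also need the meaning-preservation checks that the abstract attributes to WordNet and the dictionary.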