Natural language watermarking via morphosyntactic alterations


Meral H. M., Sankur B., Oezsoy A. S., Guengoer T., Sevinc E.

COMPUTER SPEECH AND LANGUAGE, vol.23, pp.107-125, 2009 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 23
  • Publication Date: 2009
  • DOI Number: 10.1016/j.csl.2008.04.001
  • Title of Journal: COMPUTER SPEECH AND LANGUAGE
  • Page Numbers: pp.107-125

Abstract

We develop a morphosyntax-based natural language watermarking scheme. In this scheme, a text is first transformed into a syntactic tree diagram in which the hierarchies and the functional dependencies are made explicit. The watermarking software then operates on the sentences in syntax tree format and executes binary changes under the control of WordNet and a dictionary to avoid semantic drops. A certain level of security is provided via key-controlled randomization of the morphosyntactic tools and the insertion of void watermarks. The security and payload aspects are evaluated statistically, while imperceptibility is measured using edit-hit counts based on human judgments. It is observed that agglutinative languages are somewhat more amenable to morphosyntax-based natural language watermarking, and that the free word order property of a language such as Turkish is an extra bonus.
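The key-controlled part of the scheme can be illustrated with a toy sketch. It is not the authors' implementation: the tool names, the `void_rate` parameter, and the representation of a sentence as a dictionary of two-state features are all assumptions made for illustration, and the syntax-tree parsing and the WordNet and dictionary checks are not modeled at all. The sketch only shows how a secret key could decide which binary alteration applies to each sentence and which sentences are left as void (carrying no payload bit), so that embedder and extractor agree without sharing anything but the key.

```python
import random

# Hypothetical tool names; each "tool" stands for a reversible,
# two-state sentence feature (state 0 or state 1).
TOOLS = ["active_passive", "adverb_position", "conjunct_order"]


def plan(key, n_sentences, void_rate=0.25):
    """Derive from the secret key which tool applies to each sentence.

    None marks a void slot that carries no payload bit.
    void_rate is an illustrative parameter, not taken from the paper.
    """
    rng = random.Random(key)
    return [
        None if rng.random() < void_rate else rng.choice(TOOLS)
        for _ in range(n_sentences)
    ]


def embed(sentences, bits, key):
    """Write payload bits into the key-selected feature of each sentence.

    sentences: list of dicts mapping tool name -> current state (0 or 1).
    Returns new dicts; the input list is left unchanged.
    """
    remaining = list(bits)
    marked = []
    for sent, tool in zip(sentences, plan(key, len(sentences))):
        sent = dict(sent)
        if tool is not None and remaining:
            sent[tool] = remaining.pop(0)
        marked.append(sent)
    if remaining:
        raise ValueError("not enough payload-carrying sentences")
    return marked


def extract(sentences, n_bits, key):
    """Read n_bits back by replaying the same key-derived plan."""
    bits = []
    for sent, tool in zip(sentences, plan(key, len(sentences))):
        if tool is not None and len(bits) < n_bits:
            bits.append(sent[tool])
    return bits
```

Because both sides rebuild the same plan from the key, calling `extract(embed(sentences, payload, key), len(payload), key)` returns the payload, while an observer without the key does not know which feature of which sentence was used.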

(C) 2009 Elsevier Ltd. All rights reserved.