A Solution to the Problem of the Maximal Number of Symbols for Biomolecular Computer

The authors present a solution to the problem of generating the maximum possible number of symbols for a biomolecular computer that uses the restriction enzyme BbvI and ligase as the hardware, and transition molecules built of double-stranded DNA as the software. The presented solution answers, in algorithmic form, the open question of the maximal number of symbols for a biomolecular computer that makes use of the restriction enzyme BbvI.


Introduction
The beginnings of research into the possibilities of applying biomolecules to control biological systems, and also to construct computers, are to be found in theoretical works of the 1960s (Feynman 1961). Then, in the 1980s, Charles Bennett (1982, Bennett and Landauer 1985) pointed to the potential of biomolecules for constructing energy-efficient nanodevices. However, the world had to wait until the mid-1990s for the first practical experiments realizing simple calculations with the use of biochemical reactions, when Leonard Adleman (1994) solved the Hamiltonian path problem in a graph using exclusively biomolecules for this purpose. Successive research revealed the possibility of spontaneous formation of multidimensional structures built from biomolecules, which were made with the use of the conception of self-assembly (Whitesides et al. 1991, Seeman 2001, Gopinath et al. 2016). The multidimensional DNA structures made it possible to realize fractals, e.g., ones of the Sierpiński triangle type (Rothemund 2004), which revealed a great potential in calculations based on self-assembly. In 2006, Paul Rothemund (2006) made use of self-assembling DNA molecules to obtain different multidimensional biomolecular structures. Properly prepared DNA molecules also made it possible to carry out a theoretical simulation of a Turing machine (Rothemund 1995). Earlier, in 2001, a practically working non-deterministic finite automaton based on such DNA molecules, the restriction enzyme FokI and DNA ligase was presented (Benenson et al. 2001). In successive research, it was proved experimentally that such an automaton can work without the use of the ligase enzyme (Benenson et al. 2003, Chen et al. 2007), and its complexity, understood as the number of states, was extended in practical experiments using numerous restriction enzymes.
It is worth adding that laboratory experiments were carried out successfully in which this biomolecular system was applied to medical diagnosis and treatment (Benenson et al. 2004) and also to simple logical inference (Ran 2009). Other works which dealt with the possibilities of applying DNA molecules took up the challenge of increasing not only the number of states of such an automaton (Unold et al. 2004), but also the number of symbols possible for an automaton built from DNA (Soreni et al. 2005). Moreover, the notion of a biomolecular automaton, informally characterized in the papers of Rothemund (1995), Benenson et al. (2001) and Soreni et al. (2005), was presented in a formal way (as a mathematical model called a tailor automaton in a new theory of tailor automata) in the paper of Waldmajer et al. (2019).
In the above-mentioned work, Soreni and co-workers (Soreni et al. 2005) put forward a 3-state 3-symbol biomolecular automaton which used the restriction enzyme BbvI, and considered the problem of determining the maximal number of symbols for the constructed biomolecular automaton. On the basis of the conducted assessment they pointed out that it is possible to construct 40 symbols, each of which is composed of 6 pairs of nucleotides. However, in their work, they listed merely 37 such symbols, including one which was erroneously determined. Consequently, they opened the following issue (p.  (2) proposing the idea of an algorithm that makes it possible to generate 40 symbols for a biomolecular automaton using the restriction enzyme BbvI, and (3) formulating two general problems in the sphere of generating symbols for biomolecular automata which use one restriction enzyme (among which a biomolecular automaton using the restriction enzyme BbvI is a particular case) and more than one restriction enzyme.
The second section presents the construction and working of a 3-state 3-symbol biomolecular automaton using the restriction enzyme BbvI, as presented by Soreni and co-workers (Soreni et al. 2005). The third section presents the conception of an algorithm generating the maximal number of symbols for a biomolecular automaton using the restriction enzyme BbvI, together with a discussion of various undesired situations which may occur in the course of working of a biomolecular automaton that makes use of one restriction enzyme (in particular the restriction enzyme BbvI). The last section formulates two general problems of generating the maximal number of symbols for a certain class of biomolecular automata using one or more than one restriction enzyme.

Biomolecular finite automaton and the idea of its actions
In this section, we present the 3-state 3-symbol biomolecular finite DNA automaton (see Fig. 1), which was introduced by Soreni and co-workers (Soreni et al. 2005). The automaton uses the restriction enzyme BbvI, the ligase enzyme and double-stranded DNA fragments (an input molecule, a set of transition molecules and a set of detection molecules). The double-stranded DNA fragments include the adenine, cytosine, guanine, and thymine bases marked as A, C, G and T, respectively.
- The task of the BbvI restriction enzyme is to cut the double-stranded DNA after recognizing a specific sequence (see Fig. 2A) in the double-stranded DNA. The BbvI restriction enzyme cuts the double-stranded DNA after the 8th nucleotide in the DNA strand in the 5'-3' direction and after the 12th nucleotide in the DNA strand in the 3'-5' direction, counting from the recognized specific sequence (see Fig. 2B).
- The task of the ligase enzyme is to ligate two double-stranded DNAs having complementary sticky ends (see Fig. 4A and 4B), where a sticky end is a single-stranded DNA at the end of a double-stranded DNA. In the given sense, the sticky end 'TTTA' of one double-stranded DNA (see Fig. 4A) is complementary to the sticky end 'AAAT' of the other double-stranded DNA (see Fig. 4B). The result of their ligation is one double-stranded DNA (see Fig. 4C).
Both the restriction enzyme BbvI and the ligase enzyme play the key role in the action of a biomolecular automaton, determining, respectively: the operation of cutting of a fragment of the double-stranded DNA and the operation of ligating of two fragments of double-stranded DNAs.
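The cutting rule can be illustrated with a minimal sketch. It assumes that the sequence recognized by BbvI is GCAGC (the text refers only to Fig. 2A for it) and uses the offsets 8 and 12 quoted above; the function names are hypothetical, and the example strand is taken from one of the figure fragments reproduced in this text.

```python
# Assumption: BbvI recognizes GCAGC; the offsets 8 and 12 are those quoted in the text.
SITE = "GCAGC"
TOP_OFFSET = 8       # cut after the 8th nucleotide past the site, 5'-3' strand
BOTTOM_OFFSET = 12   # cut after the 12th nucleotide past the site, 3'-5' strand
SIGMA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Position-wise complement of a strand (no reversal)."""
    return "".join(SIGMA[n] for n in strand)


def bbvi_cut(top):
    """Cut a double-stranded fragment described by its 5'-3' strand.

    Returns (left, sticky_end, right): `left` is the 5'-3' strand of the part
    that keeps the recognition site, `sticky_end` is the 4-nucleotide overhang,
    and `right` is the 5'-3' strand of the released part (overhang included).
    Returns None when there is no site or the fragment is too short.
    """
    start = top.find(SITE)
    if start < 0:
        return None
    site_end = start + len(SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(top):
        return None
    return top[:top_cut], top[top_cut:bottom_cut], top[top_cut:]


fragment = "GCAGCTTAAATCTGGCTTGCGATGAGTGATGTCGC"
left, sticky_end, right = bbvi_cut(fragment)
print(left, sticky_end, right)
```

Under these assumptions the overhang is the four nucleotides between the two cut positions, and `complement("TTTA")` gives `"AAAT"`, the pair of sticky ends used as the ligation example above.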
The input molecule (see Fig. 3) is a double-stranded DNA fragment in which it is possible to distinguish the following three basic parts: the input word x consisting of the symbols a, b and c (x = acb), the terminal symbol and the base sequence. At both ends of the input molecule there are additional base pairs, whose presence is determined by the properties related to the action of the restriction enzyme.
To construct an input word of the 3-state 3-symbol deterministic finite automaton, the following three symbols: a, b and c (see Fig. 5) were used. These symbols were coded by means of six base pairs. Besides the aforementioned symbols, the additional terminal symbol t was introduced. This symbol is coded by means of the same number of base pairs as the symbols a, b and c. This symbol was used to acquire an output molecule which is used to determine whether the automaton has finished acting in the required state and has accepted the input word x.
The base sequence consists of a certain number of base pairs, contains a specific sequence recognizable by the BbvI restriction enzyme, and makes it possible to define the start state by determining the place where the BbvI restriction enzyme cuts the input molecule (cf. Fig. 3 and Fig. 2B). Let us note that the term "base sequence" did not appear in the work of Soreni et al. (2005). Introducing this term is meant to clearly determine the manner of setting the start state of a biomolecular automaton. According to the idea contained in the work of Soreni and co-workers (Soreni et al. 2005), the reading of a symbol in a certain state of the automaton is identified with the cutting of the double-stranded DNA by the BbvI restriction enzyme in the area of a symbol, in a determined (permanent) place of the DNA strand in the 5'-3' direction and in a determined (permanent) place of the DNA strand in the 3'-5' direction. Tab. 1 presents the connection between the states and the two permanent cut places of the symbols.

Figure 3: Input molecule containing the input word x = acb; Abp - Additional base pairs.
Table 1: Connection of the states s_0, s_1 and s_2 of a biomolecular automaton with the permanent cut places of the symbols a, b, c and t of the biomolecular automaton.
Figure 5: Symbols a, b, c and the terminal symbol t.

5'-GCAGCTTAAATCTGGCTTGCGATGAGTGATGTCGC-3'
3'-CGTCGAATTTAGACCGAACGCTACTCACTACAGCG-5'
In accordance with the input molecule presented in Fig. 3, the first cut of the input molecule by the restriction enzyme BbvI takes place in the area of the symbol a, at the position which corresponds to the state s_2. In this way, the symbol a was read in the state s_2 and the fragment of DNA presented in Fig. 6 was obtained. In this sense, the state s_2 is the initial state. Adding one or two pairs of nucleotides to the base sequence sets the start state to s_1 or s_0, respectively.
The set of transition molecules is used to implement the set of transitions of the 3-state 3-symbol deterministic finite automaton. A transition from one state to another (the same or a different state) upon reading a symbol is obtained by ligating, with the use of the ligase enzyme, the DNA fragment shown in Fig. 6 with one of the transition molecules. Each transition molecule contains a specific sequence recognizable by the BbvI restriction enzyme and additional base pairs. Exemplary transition molecules are presented in Fig. 7.
For each state, one detection molecule is constructed, and thus a set of detection molecules is specified (see Fig. 8). It should be noted that the detection molecules have different numbers of additional base pairs, which makes it possible to determine in the laboratory the state in which the automaton finished its action.
The beginning of the work: BbvI, the ligase enzyme, many copies of the transition molecules and many copies of the detection molecules are placed in a laboratory tube; the final addition is many copies of the input molecule. After these elements have been mixed in the test tube, the biomolecular automaton starts its action. The successive steps are: reading the symbol a in the state s_2 (see Fig. 9a); using the transition molecule shown in Fig. 7C to pass from the state s_2 to the state s_0 after reading the symbol a (see Fig. 9b); reading the symbol c in the state s_0 (see Fig. 9c); using the transition molecule presented in Fig. 7A to pass from the state s_0 to the state s_1 after reading the symbol c (see Fig. 9d); reading the symbol b in the state s_1 (see Fig. 9e); using the transition molecule presented in Fig. 7B; and reading the terminal symbol t in the state s_1 (see Fig. 9f-g). In the last step, the fragment of double-stranded DNA presented in Fig. 9g is ligated with one of the detection molecules (see Fig. 8B). As a result of the ligation of these DNA fragments an output molecule is formed (see Fig. 9h), which, from the laboratory point of view, serves to determine the end state of the biomolecular finite automaton.

Figure 7: Selected transition molecules used in the transition function:
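The run described above can also be traced abstractly, leaving the DNA chemistry aside. The sketch below is only an illustration: it encodes the three transitions that can be read off the description of the run (the last one, from s_1 to s_1 on b, is inferred from the fact that the terminal symbol is then read in the state s_1), and the names `DELTA` and `run` are hypothetical.

```python
# Transitions read off the run described in the text; the rest of the
# transition function is not given there and is therefore left out.
DELTA = {
    ("s2", "a"): "s0",
    ("s0", "c"): "s1",
    ("s1", "b"): "s1",  # inferred: the terminal symbol t is read in s1
}


def run(start, word):
    """Return the list of states visited while reading `word` from `start`."""
    state = start
    trace = [state]
    for symbol in word:
        key = (state, symbol)
        if key not in DELTA:
            raise KeyError(f"transition for {key} is not given in the text")
        state = DELTA[key]
        trace.append(state)
    return trace


print(run("s2", "acb"))
```

The last state of the trace is the one that the detection molecule identifies in the laboratory procedure.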
Algorithm for the problem of the maximal number of symbols

The formal apparatus used in the description of the algorithm
Let the set ∆ = {A, C, G, T} be given, together with the function σ, a bijection of the set ∆ onto ∆ defined in the following way: σ(A) = T, σ(T) = A, σ(C) = G and σ(G) = C. The set ∆ is called the set of nucleotides, the elements of the set ∆ are called nucleotides, and the function σ is called the complementarity of nucleotides.
We call any finite sequence of nucleotides of the set ∆ a word. The word x which is the sequence X_1, X_2, ..., X_j of nucleotides of the set ∆ (X_i ∈ ∆, 0 < i ≤ j ∈ N) is written as x = X_1 X_2 ... X_j. The number of elements of the sequence x is called the length of the word x (denoted symbolically: |x|), while the i-th nucleotide of the word x (the i-th element of the word x) is denoted as x(i). The set of all the words of length greater than zero formed from the nucleotides of the set ∆ is denoted as ∆^+.
Let x = X_1 X_2 ... X_j and y = Y_1 Y_2 ... Y_m be words (j, m ∈ N). We call the word X_j ... X_2 X_1 the opposite word to the word x (denoted symbolically: x^{-1}). We call the word xy = X_1 X_2 ... X_j Y_1 Y_2 ... Y_m the concatenation of the two words x and y. We say that the word x is included in the word y, beginning with the k-th (1 ≤ k ∈ N) position (denoted symbolically: x ⊆_k y), if k + |x| ≤ |y| + 1 and ∃u, v ∈ ∆^* (y = uxv ∧ |u| = k − 1), where ∆^* is the set ∆^+ extended by the empty word. The word x is a sub-word of y (denoted symbolically: x ⊆ y) when the word x is included in the word y, beginning with a certain position k, i.e., x ⊆ y ⇔ ∃k (x ⊆_k y). The word x is a prefix of the word y when x ⊆_1 y. The word x is a suffix of the word y when x^{-1} is a prefix of the word y^{-1}.
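These relations map directly onto string operations. The following sketch represents words as Python strings over the letters A, C, G and T; the function names are hypothetical and positions are 1-based, as in the definitions above.

```python
def opposite(x):
    """The opposite word x^{-1}: the same nucleotides in reverse order."""
    return x[::-1]


def included_at(x, y, k):
    """x is included in y beginning with the k-th position (1-based)."""
    return k >= 1 and k + len(x) <= len(y) + 1 and y[k - 1:k - 1 + len(x)] == x


def is_subword(x, y):
    """x is a sub-word of y: it is included in y at some position k."""
    return any(included_at(x, y, k) for k in range(1, len(y) - len(x) + 2))


def is_prefix(x, y):
    """x is a prefix of y: it is included in y beginning with position 1."""
    return included_at(x, y, 1)


def is_suffix(x, y):
    """x is a suffix of y: x^{-1} is a prefix of y^{-1}."""
    return is_prefix(opposite(x), opposite(y))


print(included_at("CAG", "GCAGC", 2), is_suffix("AGC", "GCAGC"))
```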
The introduced notion of complementarity of nucleotides and the introduced notation make it possible to define a function which will be called the complementarity of words: it is the mapping Ξ : ∆^+ → ∆^+ defined, for x ∈ ∆^+, in the following way: Ξ(x) = y, where |y| = |x| and y(i) = σ(x(i)) for each i ∈ {1, ..., |y|}. The words x and y are synthesizable over the length 3 when there exists a word u ∈ ∆^+ of length 3 which is a suffix of the word x and a prefix of the word y. The concatenation of the synthesizable words x and y over the length 3 is the
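As a minimal sketch, the complementarity of words and the test for synthesizable words can be written as follows. The names `xi` and `synthesizable` are hypothetical, and words are again plain strings over A, C, G and T.

```python
SIGMA = {"A": "T", "T": "A", "C": "G", "G": "C"}  # complementarity of nucleotides


def xi(x):
    """Complementarity of words: apply sigma to every position of x."""
    return "".join(SIGMA[n] for n in x)


def synthesizable(x, y, n=3):
    """True when some word of length n is a suffix of x and a prefix of y."""
    return len(x) >= n and len(y) >= n and x[-n:] == y[:n]


print(xi("TTTA"), synthesizable("AAAC", "AACG"))
```

Here `xi("TTTA")` returns `"AAAT"`, which is the pair of complementary sticky ends used earlier as the ligation example.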

Description of the algorithm
The idea of the algorithm generating the maximal number of symbols for a biomolecular automaton using the restriction enzyme BbvI will be characterized through the four stages distinguished in the algorithm: the initial stage, the stage of deployment and verification, the stage of generation and the final stage. At each of the indicated stages we make use only of strands of symbols in the direction 5'-3', since, having the strands of symbols in the direction 5'-3', we can obtain the strands of symbols in the direction 3'-5' by means of the principle of complementarity of nucleotides.
Let the set A_0 of all 4-element sequences of nucleotides be given: A_0 = {x ∈ ∆^+ : |x| = 4}.

5'-GGCTTGCGATGAGTGATGTCGC-3'
3'-    ACGCTACTCACTACAGCG-5'

5'-GCAGCTATTGGCTTGCGATGAGTGATGTCGC-3'
3'-CGTCGATAACCGAACGCTACTCACTACAGCG-5'
... TCGA, TGCA, TATA from the set A_0 of all 4-element sequences of nucleotides. The appearance of the indicated sixteen words (4-element sequences of nucleotides) causes a biomolecular automaton to malfunction due to the possibility of ligation of a transition molecule with itself, since each of the transition molecules exists in many copies.
Let a transition molecule with the sticky end CATG be given (see Fig. 10A). Let us note that this molecule occurs in many copies. Thus, as a result of the action of the biomolecular automaton, one copy of the transition molecule T N S1 (cf. Fig. 10A) can ligate with another copy of the same transition molecule, forming the double-stranded fragment of DNA presented in Fig. 10B. In consequence, the number of copies of the molecule T N S1 which can be made use of in further computations carried out by the biomolecular automaton is limited.
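The sixteen self-ligating words can be enumerated mechanically. The sketch below assumes that they are exactly the words of A_0 whose opposite word equals their complement, i.e. x^{-1} = Ξ(x); this reading agrees with the count of sixteen and with the words named in the text (CATG, TCGA, TGCA, TATA), but the condition itself is not legible in the source. The variable names are hypothetical.

```python
from itertools import product

SIGMA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def xi(x):
    """Complementarity of words."""
    return "".join(SIGMA[n] for n in x)


# A_0: all 4-element sequences of nucleotides (4**4 = 256 words).
A0 = ["".join(p) for p in product("ACGT", repeat=4)]

# Assumed condition for self-ligation: the opposite word equals the complement.
self_ligating = [x for x in A0 if x[::-1] == xi(x)]

print(len(A0), len(self_ligating))
print(self_ligating)
```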
So as to prevent the possibility of ligation of copies of the same transition molecule, it is necessary to remove from the set A_0 the words which satisfy the following condition: (*) x^{-1} = Ξ(x). In this way, we reject the sixteen words given earlier from the set A_0, and in consequence we obtain the set A_1 = {x : x ∈ A_0 ∧ x^{-1} ≠ Ξ(x)}, where the number of the elements of the set A_1 amounts to 240.
Availing ourselves of the elements of the set A_1, we form the maximal set A_2 of pairs of elements in the following manner: A_2 = {(x, y) : x, y ∈ A_1 ∧ x^{-1} = Ξ(y)}, where the number of the elements of the set A_2 amounts to 240. Then, using the set A_2, we form the set A_3 of pairs in the following way: the set A_3 is the set A_2 from which we remove certain pairs according to the principle (P).
The principle (P): if the pairs (x, y) and (y, x) belong to the set A_2, then we remove from the set A_2 the pair whose first element, comparing the first elements of both pairs, ...
Let us consider the pair (x, y) = (AAAA, TTTT) from Tab. 2 (No. 1) and two transition molecules in the biomolecular automaton with the sticky ends AAAA and TTTT (see Fig. 11A and Fig. 11B). The ligation of these transition molecules forms the double-stranded fragment of DNA presented in Fig. 11C. As a consequence, the number of copies of the molecules T N S2 and T N S3 which may be used in further calculations done by the biomolecular automaton is limited. In connection with this, in the algorithm generating the maximal number of symbols for a biomolecular automaton using the restriction enzyme BbvI only one element of each of the given 120 pairs of the set A_3 should be used. Selecting individual elements of the successive pairs in this manner, we obtain the family P(A_1) of maximal sets B ⊂ A_1 such that for each x, y ∈ B the following condition holds: (**) x^{-1} ≠ Ξ(y).

No.  x     y       No.  x     y       No.  x     y
1    AAAA  TTTT    21   ACCC  GGGT    41   AGTC  GACT
2    AAAC  GTTT    22   ACCG  CGGT    42   AGTG  CACT
3    AAAG  CTTT    23   ACCT  AGGT    43   ATAA  TTAT
4    AAAT  ATTT    24   ACGA  TCGT    44   ATAC  GTAT
5    AACA  TGTT    25   ACGC  GCGT    45   ATAG  CTAT
6    AACC  GGTT    26   ACGG  CCGT    46   ATCA  TGAT
7    AACG  CGTT    27   ACTA  TAGT    47   ATCC  GGAT
8    AACT  AGTT    28   ACTC  GAGT    48   ATCG  CGAT
9    AAGA  TCTT    29   ACTG  CAGT    49   ATGA  TCAT
10   AAGC  GCTT    30   AGAA  TTCT    50   ATGC  GCAT
11   AAGG  CCTT    31   AGAC  GTCT    51   ATGG  CCAT
12   AAGT  ACTT    32   AGAG  CTCT    52   ATTA  TAAT
13   AATA  TATT    33   AGAT  ATCT    53   ATTC  GAAT
14   AATC  GATT    34   AGCA  TGCT    54   ATTG  CAAT
15   AATG  CATT    35   AGCC  GGCT    55   CAAA  TTTG
16   ACAA  TTGT    36   AGCG  CGCT    56   CAAC  GTTG
17   ACAC  GTGT    37   AGGA  TCCT    57   CAAG  CTTG
18   ACAG  CTGT    38   AGGC  GCCT    58   CACA  TGTG
19   ACAT  ATGT    39   AGGG  CCCT    59   CACC  GGTG
20   ACCA  TGGT    40   AGTA  TACT    60   CACG  CGTG
The indicated condition (**) prevents the formation of transition molecules which could ligate with one another during the action of the biomolecular automaton.
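The sets used at the initial stage can be reproduced with a short sketch. Two points are assumptions rather than statements of the source: the defining conditions of A_1 and A_2 are taken to be x^{-1} ≠ Ξ(x) and x^{-1} = Ξ(y), and the principle (P), which is incomplete in the text, is taken to keep the pair whose first element is lexicographically smaller. Both readings reproduce the counts 240, 240 and 120 and the legible entries of Tab. 2. All names are hypothetical.

```python
from itertools import product

SIGMA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def xi(x):
    """Complementarity of words."""
    return "".join(SIGMA[n] for n in x)


A0 = ["".join(p) for p in product("ACGT", repeat=4)]

# A_1: words that cannot ligate with a copy of themselves (assumed condition).
A1 = [x for x in A0 if x[::-1] != xi(x)]

# A_2: ordered pairs of words whose sticky ends could ligate with each other.
A2 = [(x, y) for x in A1 for y in A1 if x[::-1] == xi(y)]

# A_3: one pair out of each (x, y), (y, x); assumed reading of principle (P).
A3 = [(x, y) for (x, y) in A2 if x < y]


def satisfies_double_star(B):
    """Condition (**): no two words of B (including a word with itself) can ligate."""
    return all(x[::-1] != xi(y) for x in B for y in B)


B = [x for x, _ in A3]  # one possible member of the family P(A_1)
print(len(A1), len(A2), len(A3), satisfies_double_star(B))
```

Under this reading every member of the family P(A_1) arises from an independent choice of one word out of each of the 120 pairs, which would give 2^120 such sets.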
In the next part of the algorithm, we will select elements of the family P(A 1 ) as sets meant to serve to check the possibilities of generating 40 symbols for the biomolecular automaton using the restriction enzyme BbvI. Thus, let the set C be a chosen element of the family P(A 1 ).
In the first part of the stage of deployment and verification, we select a single assignment of the 120 words which are the elements of the set C to three sets G_1, G_2 and G_3 (40 words to each set) from among the successive possible assignments of the 120 words of the set C to 3 sets of 40 words each.
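To gauge how many such assignments exist for one set C, the following sketch counts the ordered splits of 120 words into three sets of 40 words each. It is plain arithmetic on the numbers given above and makes no claim about how the algorithm enumerates the assignments; the variable name is hypothetical.

```python
from math import comb

# Choose 40 of the 120 words for G_1, then 40 of the remaining 80 for G_2;
# the last 40 words form G_3.
assignments_per_set_C = comb(120, 40) * comb(80, 40)

print(assignments_per_set_C)
print(len(str(assignments_per_set_C)), "decimal digits")
```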
In the second part of the stage of deployment and verification we pre-check whether we are able to form 40 words of length 6 (whether we can create 40 strands in the direction 5'-3'). We examine this by comparing the elements of, first, the sets G_1 and G_2, and then the sets G_2 and G_3, in the following way:
1. for the sets G_1 and G_2 we check whether the number of occurrences of each word x of length 3 being a suffix in the words of the set G_1 is identical with the number of occurrences of the word x as a prefix in the words of the set G_2,
2. for the sets G_2 and G_3 we check whether the number of occurrences of each word x of length 3 being a suffix in the words of the set G_2 is identical with the number of occurrences of the word x as a prefix in the words of the set G_3.
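The two comparisons amount to comparing multisets of 3-letter suffixes and prefixes. A minimal sketch, with a hypothetical function name and toy sets of two words each instead of 40, could look as follows.

```python
from collections import Counter


def precheck(g_left, g_right, n=3):
    """Compare how often each length-n word occurs as a suffix in g_left
    with how often it occurs as a prefix in g_right."""
    suffixes = Counter(w[-n:] for w in g_left)
    prefixes = Counter(w[:n] for w in g_right)
    return suffixes == prefixes


# Toy data for illustration only (not taken from the paper).
G1 = ["AAAC", "CAAC"]
G2 = ["AACG", "AACT"]
G3 = ["ACGG", "ACTA"]

print(precheck(G1, G2), precheck(G2, G3))
```

Passing this pre-check is only a necessary condition: it does not by itself guarantee that the words can be combined into the required number of symbols.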

No.  x     y       No.  x     y       No.  x     y
79   CGCA  TGCG    99   GATA  TATC    119  TCCA  TGGA
80   CGCC  GGCG    100  GCAA  TTGC    120  TGAA  TTCA
At the stage of generation we introduce the auxiliary set D = ∅ and examine the possibility of forming 40 words of length 6 (40 strands of symbols in the direction 5'-3') making use of the elements of the sets G_1, G_2 and G_3, as well as of concatenations of words synthesizable over the length 3. Each word of length 6 is obtained through a double use of the concatenation of two words synthesizable over the length 3:
1. we select one word from each of the three sets G_1, G_2 and G_3 in such a way that the words x ∈ G_1 and y ∈ G_2 are synthesizable over the length 3 and the words y ∈ G_2 and z ∈ G_3 are synthesizable over the length 3,
2. having selected the words x ∈ G_1, y ∈ G_2, z ∈ G_3 which satisfy the above-mentioned condition, we form a word u of length 6 (a strand of a symbol in the direction 5'-3'): u = [[x, y]_3, z]_3 (symbol assembling, see Fig. 12),
3. the word u obtained upon satisfying the above-presented condition is added to the set D.
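Symbol assembling can be sketched as two overlapping concatenations. The definition of [x, y]_3 is cut off in the text, so the sketch assumes that it merges the shared 3-letter word, giving a word of length |x| + |y| − 3; the function names and the example words are hypothetical.

```python
def overlap_concat(x, y, n=3):
    """Assumed meaning of [x, y]_n: join x and y, writing the shared
    length-n suffix/prefix only once."""
    if x[-n:] != y[:n]:
        raise ValueError(f"{x} and {y} are not synthesizable over the length {n}")
    return x + y[n:]


def assemble_symbol(x, y, z):
    """u = [[x, y]_3, z]_3 for 4-letter words x, y, z."""
    return overlap_concat(overlap_concat(x, y), z)


u = assemble_symbol("AAAC", "AACG", "ACGG")
print(u)
# The three 4-letter windows of u give back x, y and z.
print(u[0:4], u[1:5], u[2:6])
```

With 4-letter words, the three consecutive windows of the assembled 6-letter word are again x, y and z, which is why one word is drawn from each of the sets G_1, G_2 and G_3.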
In the case where it is impossible to form 40 words (40 words u) from the elements of the sets G_1, G_2 and G_3 in the way given above, we return to checking another possibility of assigning the elements of the set C to the sets G_1, G_2 and G_3. In the case where all the possible assignments of the elements of the set C to the sets G_1, G_2 and G_3 have been checked, we return to examining the next element of the family P(A_1). In the case where all the elements of the family P(A_1) have been checked and it is impossible to obtain 40 symbols, the algorithm communicates: "Unable to obtain 40 symbols for the biomolecular automaton using the restriction enzyme BbvI".
If the set D has 40 words determined from the elements of the sets G_1, G_2 and G_3, we check whether the words of this set (the strands of the symbols in the direction 5'-3') avoid each of the four undesired situations described below, which are due to the appearance of the sequence recognized by the restriction enzyme BbvI.
The first undesired situation concerns an inclusion of a sequence recognized by the restriction enzyme BbvI inside any symbol. An example illustrating this undesired situation is presented in Fig. 13A. Let us note that the second, analogous, undesired situation can occur if a sequence recognized by the restriction enzyme BbvI is included inside any symbol "reversed by 180°". An example of the latter is shown in Fig. 13B. The third undesired situation concerns an inclusion of a sequence recognized by the restriction enzyme BbvI in the connection of two symbols. An instance illustrating this situation is presented in Fig. 14A. Let us note that the fourth, analogous, undesired situation can occur if a sequence recognized by the restriction enzyme BbvI is included in the connection of two symbols "reversed by 180°". An example illustrating this undesired situation is shown in Fig. 14B.
The appearance of any of the four undesired situations can lead to an undesired action of a biomolecular automaton. In connection with these situations, it is necessary to examine, respectively:
4. whether the concatenation z = xy of any words x ∈ D and y ∈ D satisfies the condition ∼((Ξ(e_x))^{-1} ⊆ z), where ∼((Ξ(e_x))^{-1} ⊆ z) means that a sequence recognized by the restriction enzyme BbvI cannot be included in the ligation of two symbols "reversed by 180°" and relativized only to the one considered strand in the direction 5'-3'.
In the case where one of the four undesired situations is detected, we return to checking another possibility of assigning the elements of the set C to the sets G_1, G_2 and G_3. When all the possible assignments of the elements of the set C to the sets G_1, G_2 and G_3 have been checked, we return to examining another element of the family P(A_1). In the case where all the elements of the family P(A_1) have been checked and it has been found that it is impossible to obtain 40 symbols that do not include undesired situations, the algorithm returns the message: "Unable to obtain 40 symbols for a biomolecular automaton using the restriction enzyme BbvI". If the set D has 40 words formed from the elements of the sets G_1, G_2, G_3 and not a single undesired situation occurs, then we move on to the last stage.
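The four checks can be sketched as sub-word tests on the 5'-3' strands. Only the fourth condition is legible in the text, so the form of the first three is inferred from the description of the four undesired situations; the recognition sequence GCAGC is likewise an assumption (the text refers to Fig. 2A for it). The function name and the example symbols are hypothetical.

```python
from itertools import product

SIGMA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def xi(x):
    """Complementarity of words."""
    return "".join(SIGMA[n] for n in x)


def undesired_situations(symbols, site="GCAGC"):
    """List the undesired situations found among 5'-3' symbol strands.

    1: site inside a symbol          2: "reversed" site inside a symbol
    3: site in a connection xy       4: "reversed" site in a connection xy
    """
    reversed_site = xi(site)[::-1]  # (Xi(e))^{-1}
    found = []
    for u in symbols:
        if site in u:
            found.append((1, u))
        if reversed_site in u:
            found.append((2, u))
    for x, y in product(symbols, repeat=2):  # every ordered pair, x == y included
        z = x + y
        if site in z:
            found.append((3, x, y))
        if reversed_site in z:
            found.append((4, x, y))
    return found


# Toy symbols for illustration only (not taken from the paper).
print(undesired_situations(["AAGCAG", "CTTTTT"]))
```

In this toy case neither symbol contains the site on its own, but the connection of the first symbol with the second does, which is the third undesired situation.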
At the last stage we determine the 40 complementary words, one for each word of the set D, making use of ...

... enzyme in a biomolecular automaton. Thus, to point to the possibilities of generalizing the question, one can start pondering over: (1) the possibility of generating the maximal number of symbols (coded by n pairs of nucleotides) for a biomolecular automaton using one restriction enzyme, (2) the possibility of generating the maximal number of symbols (coded by n pairs of nucleotides) for a biomolecular automaton using more than one restriction enzyme, and also (3) the possibility of an algorithmic approach in each of the two indicated cases. In this way it is possible to raise two general problems which are relativized to the number of restriction enzymes used in a biomolecular automaton and require working out relevant algorithms.
Problem 1: generate the maximal number of symbols (coded by n pairs of nucleotides) for a biomolecular automaton using one restriction enzyme.
Problem 2: generate the maximal number of symbols (coded by n pairs of nucleotides) for a biomolecular automaton using more than one restriction enzyme.
The above-posed problems require considering and defining the conditions which must be imposed on the relations between the restriction enzyme, the symbols and the other elements which are components of a biomolecular automaton. The starting conditions which ought to be considered and taken into account in the above-mentioned relations are the conditions included in the works Krasiński et al. (2013) and . Taking these conditions into account will make it possible to determine all the indispensable conditions needed to elaborate algorithms that solve both general problems mentioned above. The solution to the mentioned general problems will make it possible to develop algorithms for generating symbols, which are important for the laboratory implementation of biomolecular automata.