Monday, December 22, 2014

Verilog Code for a Simple memory model

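module ram6116 (io, cs_b, we_b, addr);
  inout  [7:0] io;         // bidirectional data bus
  input        cs_b;       // chip select, active low
  input        we_b;       // write enable, active low
  input  [7:0] addr;       // address (256 locations in this simple model)

  reg [7:0] ram1 [255:0];  // memory array

  // Drive the bus only while reading (chip selected, write enable inactive);
  // otherwise release it to high impedance.
  assign io = (cs_b == 1'b0 && we_b == 1'b1) ? ram1[addr] : 8'bz;

  // Store the bus data on the rising edge of we_b (end of the write pulse)
  // while the chip is selected.
  always @(posedge we_b)
    if (cs_b == 1'b0)
      ram1[addr] <= io;
endmodule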

Using a One-Hot State Assignment

Each logic cell in a PGA contains two flip-flops, so minimizing the number of flip-flops is usually less important than reducing the total number of logic cells used and the interconnections between the cells. To design faster logic, the number of cells required to realize each equation should also be reduced, and a one-hot state assignment often helps to achieve this.

A one-hot assignment uses one flip-flop for each state, so a state machine with N states requires N flip-flops. Eg.: a system with four states (T0, T1, T2 and T3) can use four flip-flops (Q0, Q1, Q2 and Q3) with the following state assignment:

T0: Q0Q1Q2Q3=1000, T1: 0100, T2: 0010, T3: 0001
=> The other 12 combinations are not used.
=> The next-state and output equations can be written by inspection of the state graph or by tracing link paths on an SM chart.
Fig.1: Partial State Graph

Consider the partial state graph shown in fig.1. The next-state equation for flip-flop Q3 can be written as
Q3+ = X1Q0Q1’Q2’Q3’ + X2Q0’Q1Q2’Q3’ + X3Q0’Q1’Q2Q3’ + X4Q0’Q1’Q2’Q3

Since Q0=1 implies Q1=Q2=Q3=0, the factor Q1’Q2’Q3’ in the first term is redundant and can be dropped. By the same reasoning, all the primed state variables can be eliminated from the other terms, and the next-state equation reduces to

Q3+=X1Q0+X2Q1+X3Q2+X4Q3
=> Each term has exactly one state variable.
=> Similarly, each output equation contains exactly one state variable:
Z1 = X1Q0 + X3Q2     Z2 = X2Q1 + X4Q3
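The reduction above can be checked by brute force. The following is a minimal Python sketch, not part of the original notes: it evaluates both forms of the Q3+ equation for every input combination in each of the four valid one-hot states. The function names are chosen here for illustration only.

```python
from itertools import product

# One-hot codes for T0..T3, written as (Q0, Q1, Q2, Q3)
ONE_HOT_STATES = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]


def inv(v):
    """Complement of a single bit."""
    return 1 - v


def q3_next_full(x, q):
    """Q3+ with every primed state variable written out."""
    x1, x2, x3, x4 = x
    q0, q1, q2, q3 = q
    return (x1 & q0 & inv(q1) & inv(q2) & inv(q3)
            | x2 & inv(q0) & q1 & inv(q2) & inv(q3)
            | x3 & inv(q0) & inv(q1) & q2 & inv(q3)
            | x4 & inv(q0) & inv(q1) & inv(q2) & q3)


def q3_next_reduced(x, q):
    """Q3+ after dropping the redundant primed variables."""
    x1, x2, x3, x4 = x
    q0, q1, q2, q3 = q
    return x1 & q0 | x2 & q1 | x3 & q2 | x4 & q3


def equations_agree():
    """True if both forms match in every valid one-hot state."""
    return all(q3_next_full(x, q) == q3_next_reduced(x, q)
               for q in ONE_HOT_STATES
               for x in product((0, 1), repeat=4))


print(equations_agree())
```

The two forms are only guaranteed to agree in the four one-hot states; the 12 unused combinations are deliberately left out of the comparison.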

When a one-hot assignment is used, the next-state equation for each flip-flop contains one term for each arc (or link path) leading into the corresponding state. Hence, in general, each term in every next-state equation and in every output equation contains exactly one state variable. For an asynchronous network, a “holding term” is additionally required for each next-state equation.
Method-1:
For a one-hot assignment, resetting the system requires that one flip-flop be set to 1 instead of resetting all flip-flops to 0. If the flip-flops used do not have a preset input, Q0 can be replaced with Q0’ throughout. The state assignment then becomes
T0: Q0Q1Q2Q3=0000, T1: 1100, T2: 1010, T3: 1001

And the modified equations are
Q3+=X1Q0’+X2Q1+X3Q2+X4Q3
Z1=X1Q0’+X3Q2
Z2=X2Q1+X4Q3
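
As a quick sanity check of Method-1, the sketch below (Python, with names invented for this illustration) confirms that the modified Q3+ equation evaluated on the modified state codes gives the same result as the one-hot equation evaluated on the one-hot codes, for every state and input combination.

```python
from itertools import product

# State codes as (Q0, Q1, Q2, Q3)
ONE_HOT = {"T0": (1, 0, 0, 0), "T1": (0, 1, 0, 0),
           "T2": (0, 0, 1, 0), "T3": (0, 0, 0, 1)}
MODIFIED = {"T0": (0, 0, 0, 0), "T1": (1, 1, 0, 0),
            "T2": (1, 0, 1, 0), "T3": (1, 0, 0, 1)}


def q3_next_one_hot(x, q):
    """Q3+ = X1Q0 + X2Q1 + X3Q2 + X4Q3."""
    x1, x2, x3, x4 = x
    q0, q1, q2, q3 = q
    return x1 & q0 | x2 & q1 | x3 & q2 | x4 & q3


def q3_next_modified(x, q):
    """Q3+ = X1Q0' + X2Q1 + X3Q2 + X4Q3 (Q0 replaced by Q0')."""
    x1, x2, x3, x4 = x
    q0, q1, q2, q3 = q
    return x1 & (1 - q0) | x2 & q1 | x3 & q2 | x4 & q3


def methods_match():
    """True if both assignments give the same Q3+ in every state."""
    return all(
        q3_next_one_hot(x, ONE_HOT[s]) == q3_next_modified(x, MODIFIED[s])
        for s in ONE_HOT
        for x in product((0, 1), repeat=4))


print(methods_match())
```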

Method-2:
To solve the reset problem without modifying the one-hot assignment, add an extra term to the equation for the flip-flop that should be 1 in the starting state.
Eg.: consider eq.1 and fig.2 for the main dice game control
Fig.2: SM chart for serially linked state machine
The next-state equation for Q0 is Q0+ = Q0 Dn_roll’ + Q2 Reset + Q3 Reset


If the system is reset to state 0000 after power-up, add the term Q0’Q1’Q2’Q3’ to the equation for Q0+. The state then changes to 1000 (T0) after the first clock, which is the correct starting state.
In general, both an assignment with a minimum number of state variables and a one-hot assignment should be tried to see which one leads to a design with the smallest number of logic cells. If speed is the main requirement, choose the faster design. When a one-hot assignment is used, more next-state equations are required, but in general both the next-state and output equations contain fewer variables and hence require fewer logic cells to realize each equation. An equation with five or fewer variables can be realized in a single cell, whereas equations with six or seven variables require several cells in cascade. As more cells are cascaded, the propagation delay increases and operation becomes slower.
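
The effect of the extra start-up term in Method-2 can be illustrated with a small Python sketch. It models only the Q0+ equation quoted above; the function name and the `with_startup_term` flag are hypothetical. Since every other next-state term contains one unprimed state variable, the remaining flip-flops stay at 0 when the machine is in state 0000.

```python
from itertools import product


def q0_next(q, dn_roll, reset, with_startup_term=True):
    """Q0+ = Q0 Dn_roll' + Q2 Reset + Q3 Reset [+ Q0'Q1'Q2'Q3']."""
    q0, q1, q2, q3 = q
    base = q0 & (1 - dn_roll) | q2 & reset | q3 & reset
    startup = (1 - q0) & (1 - q1) & (1 - q2) & (1 - q3)
    return base | (startup if with_startup_term else 0)


ONE_HOT_STATES = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

# From the all-zero power-up state, Q0 is set on the first clock
# regardless of the inputs.
leaves_zero_state = all(q0_next((0, 0, 0, 0), d, r) == 1
                        for d, r in product((0, 1), repeat=2))

# In the valid one-hot states the extra term is 0, so behaviour is unchanged.
unchanged_elsewhere = all(
    q0_next(q, d, r, True) == q0_next(q, d, r, False)
    for q in ONE_HOT_STATES
    for d, r in product((0, 1), repeat=2))

print(leaves_zero_state, unchanged_elsewhere)
```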

DESIGNING WITH PROGRAMMABLE GATE ARRAYS AND COMPLEX PROGRAMMABLE LOGIC DEVICES

UNIT VII

Xilinx 3000 Series FPGAs, Designing with FPGAs, Using a One-Hot State Assignment, Altera Complex Programmable Logic Devices (CPLDs), Altera FLEX 10K Series CPLDs.

Xilinx 3000 series FPGAs
Consider part of the basic structure of the Xilinx XC3020 Logic Cell Array (LCA), which consists of an interior array of 64 Configurable Logic Blocks (CLBs) surrounded by a ring of 64 input-output interface blocks. The interconnections between these blocks can be programmed by storing data in internal configuration memory cells. Each CLB contains some combinational logic and two D flip-flops and can be programmed to perform a variety of logic functions.
The configuration memory cells are programmed after power is applied to the LCA, and the programmed logic functions and interconnections are retained until the power is turned off.
Fig. 1: Layout of part of a programmable logic cell array
During configuration, each memory cell is selected in turn. When a WRITE signal is applied to the pass transistor, DATA is stored in the cell. Each connection point in the LCA has an associated memory cell, and the data stored in that cell determines whether the connection is made or not.
Fig.2: Configuration Memory Cell
Fig.3: Xilinx 3000 Series Logic Cell
A CLB has five logic inputs (A, B, C, D, E), a data input (DI), a clock input (K), a clock enable (EC), a direct reset (RD), and two outputs (X and Y). The trapezoidal blocks on the diagram represent multiplexers, which can be programmed to select one of the inputs. Eg.: the X output can come either from the upper flip-flop (QX) or from the F output of the “Combinational Function” block. Similarly, the Y output can come either from the lower flip-flop (QY) or from G. Each M represents a configuration memory cell, and the data in the cell determines which mux input is selected.
The combinational function block contains RAM memory cells and can be programmed to realize any five variable function or any two functions of four variables. The functions are stored in truth table format, so the number of gates required to realize the functions is not important.
The block can be operated in three different modes, selected by programming the multiplexers at its inputs.
1.   The FG mode generates two functions of four variables each. One variable (A) must be common to both functions. The next two variables can be chosen from B, C, QX, and QY, and the remaining variable can be either D or E.
Eg.: F = A B’ + QX E and G = A’ C + QY D can be generated in this mode. If QX and QY are not used, the two four-variable functions must have A, B, and C in common, and the fourth variable can be D or E.
Fig.4: Combinational Logic Options
2.   The F mode can generate one function of five variables (A, D, E, and two variables chosen from B, C, QX, and QY). The functions that can be realized range in complexity from a simple AND gate, F=G=ABCDE, to a parity function, F=G= A xor B xor C xor D xor E, which has 16 terms when expanded to a sum of products.
3.   The FGM mode uses a multiplexer with E as a control input to select one of two four-variable functions. Each function uses inputs A, D, and two of the inputs B, C, QX, and QY. The FGM mode can realize some functions of six or seven variables. Eg.: this mode could realize the seven-variable function
F=G=E(AB’+QXD) + E’(A’C+QYD)
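Because the function block stores its functions as truth tables, a simple AND and a five-input parity function occupy the same 32 memory cells. The following Python sketch illustrates the idea with a list used as the lookup table; `build_lut` and `lut_read` are illustrative helpers, not a model of the actual XC3000 circuitry.

```python
from itertools import product


def build_lut(func, n_inputs):
    """Store func as a truth table, one entry per input combination."""
    return [func(*bits) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *bits):
    """Look up the output; the first bit is the most significant index bit."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return lut[index]


and5 = build_lut(lambda a, b, c, d, e: a & b & c & d & e, 5)
parity5 = build_lut(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 5)

# Both functions need the same table size ...
print(len(and5), len(parity5))
# ... although their sum-of-products forms have 1 and 16 minterms.
print(sum(and5), sum(parity5))
```

The number of 1 entries in the parity table matches the 16 sum-of-products terms mentioned above, while the table size is independent of the function.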
The D input on each flip-flop can be programmed to come from F, G, or the DI data input. The two flip-flops have a common clock. The MUX connected to the clock input (K) can be programmed to select either K or K’, so the D flip-flops will change state on either the rising or the falling edge of the clock. The clock is either always enabled, or it is controlled by the Enable Clock (EC) input. The MUX connected to the D input of each flip-flop is used to effectively disable the clock. If EC=0, the Q output is fed back to the D input so that Q+=Q, and the flip-flop never changes state even though the clock is changing. If EC=1, the D input is connected to F, G, or DI, and state changes occur in response to the clock. The D flip-flop and MUX combination is equivalent to a D flip-flop with an enable clock (EC) input, as shown in fig.5. Since Q can change only when EC=1, the following characteristic equation describes the flip-flop behaviour:
Q+ = EC D + EC’ Q
Using this type of flip-flop makes it unnecessary to gate the clock with a control signal. Since the clock can go directly to each flip-flop input, achieving proper synchronous operation is much easier. The flip-flops have an active high asynchronous reset (RD). The direct reset input (if it is not inhibited) will clear both flip-flops when it is set to 1. The global reset will clear the flip-flops in all of the cells in the array.
Fig.5: Flip-Flops with Clock Enable
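The characteristic equation Q+ = EC D + EC’ Q can be exercised with a few lines of Python. This is only a behavioural sketch; `dff_ce_next` and `run` are names made up for the example, and each stimulus pair represents one active clock edge.

```python
def dff_ce_next(q, d, ec):
    """Next state of a D flip-flop with clock enable: Q+ = EC*D + EC'*Q."""
    return (ec & d) | ((1 - ec) & q)


def run(q, stimulus):
    """Apply a list of (D, EC) pairs, one per clock edge, and record Q."""
    trace = []
    for d, ec in stimulus:
        q = dff_ce_next(q, d, ec)
        trace.append(q)
    return trace


# D is ignored while EC=0 and loaded while EC=1.
print(run(0, [(1, 0), (1, 1), (0, 0), (0, 1)]))   # [0, 1, 1, 0]
```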
Eg.: implement a parallel adder-subtracter with an accumulator using an XC3020. The overall structure is similar to fig.6, except that control signals are needed for both add and subtract.
Fig.6: Parallel adder with accumulator
If Ad=1, the B input will be added to the accumulator. If Su=1, the B input will be subtracted from the accumulator. Subtraction is accomplished by adding the 2’s complement of B (i.e., the 1’s complement of B plus 1) to the accumulator. If Ad=Su=0, the accumulator should remain unchanged.
Since each logic cell has two flip-flops, it might be possible to implement two bits of the accumulator in one cell. If two bits of the adder-subtracter were implemented in one cell, two outputs from the accumulator flip-flops plus a carry output to the next cell would be required. Since each cell has only two outputs, this scheme would not work. Therefore, we can implement only one bit per cell.
Fig.7: Parallel Adder-Subtractor Logic Cell
Fig.7 shows a typical cell of the parallel adder-subtracter, where the logic inputs are bi, ci (carry from the previous cell), and Su. The accumulator flip-flop output (ai) is fed back internally within the cell. The combinational function block implements the following equations:
F = sum = ai+ = ai xor (bi xor Su) xor ci
G = carry out = c(i+1) = ai ci + (ai + ci)(bi xor Su)
If Su=0, these equations reduce to the standard equations for a full adder. If Su=1, bi is complemented by the XOR. If the carry-in to the LSB is also connected to Su, then when Su=1 the 2’s complement of B is added to A, so that subtraction occurs. Both F and G are 4-variable functions of the same variables, so they can be implemented by the combinational function block using the FG mode in fig.4. In fig.7, ci and bi are connected to the A and B block inputs, so the internal feedback from the ai flip-flop (QX) must be routed to the third block input. The remaining input, Su, can then be connected to block input D or E. Since the accumulator should change only when Ad=1 or Su=1, the clock enable (EC) is connected to the signal Ad+Su. An OR gate in another logic cell generates this signal, which is used by all of the adder-subtracter cells.
Fig.8: Signal Paths within Adder-Subtractor Logic Cell
The dashed lines in fig.8 indicate the relevant signal paths that are present within the logic cell after it has been programmed. The F function is connected to the D input of the accumulator flip-flop (ai), and the G function is connected to the carry out, c(i+1).
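The cell equations can be tried out in software. The sketch below is a behavioural Python model, assuming a 4-bit accumulator for brevity; `cell` and `accumulate` are illustrative names. It chains the F and G equations bit by bit, ties the LSB carry-in to Su, and holds the accumulator when Ad=Su=0, as the clock enable would.

```python
def cell(a_i, b_i, c_i, su):
    """One adder-subtracter cell: returns (F, G) = (next a_i, carry out)."""
    b_eff = b_i ^ su                          # b_i is complemented when Su=1
    f = a_i ^ b_eff ^ c_i                     # sum bit
    g = (a_i & c_i) | ((a_i | c_i) & b_eff)   # carry to the next cell
    return f, g


def accumulate(acc, b, ad, su, width=4):
    """Return the accumulator value after one clock edge."""
    if not (ad or su):        # EC = Ad + Su = 0: the flip-flops hold
        return acc
    carry = su                # carry-in to the LSB is tied to Su
    result = 0
    for i in range(width):
        f, carry = cell((acc >> i) & 1, (b >> i) & 1, carry, su)
        result |= f << i
    return result


print(accumulate(5, 3, ad=1, su=0))   # 8
print(accumulate(5, 3, ad=0, su=1))   # 2
print(accumulate(9, 7, ad=0, su=0))   # 9 (unchanged)
```

Results wrap modulo 2^width, so subtracting a larger number yields the 2’s complement representation of the negative difference.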
Input-Output Blocks:
Fig.9 shows a configurable IOB. The I/O pad connects to one of the pins on the IC package so that external signals can be input to or output from the array of logic cells. To use the cell as an input, the 3-state control must be set to place the tristate buffer, which drives the output pin, in the high-impedance state. To use the cell as an output, the tristate buffer must be enabled. Flip-flops are provided so that input and output values can be stored within the IOB. The flip-flops are bypassed when direct input or output is desired. Two clock lines (CK1 and CK2) can be programmed to connect to either flip-flop. The input flip-flop can be programmed to act as an edge-triggered D flip-flop or as a transparent latch. Even if the I/O pin is not used, the I/O flip-flops can still be used to store data.
Fig.9: Xilinx 3000 series I/O Block
An OUT signal coming from the logic array first goes through an exclusive-OR gate, where it is either complemented or not, depending on how the OUT-INVERT bit is programmed. The OUT signal can be stored in the flip-flop if desired. Depending on how the OUTPUT-SELECT bit is programmed, either the OUT signal or the flip-flop output goes to the output buffer.
If the 3-STATE signal is 1 and the 3-STATE INVERT bit is 0 (or if the 3-STATE signal is 0 and the 3-STATE INVERT bit is 1), the output buffer has a high-impedance output. Otherwise, the buffer drives the output signal to the I/O pad. When the I/O pad is used as an input, the output buffer must be in the high-impedance state. An external signal coming into the I/O pad goes through a buffer and then to the input of a D flip-flop. The buffer output provides a DIRECT IN signal to the logic array. Alternatively, the input signal can be stored in the D flip-flop, which provides the REGISTERED IN signal to the logic array.
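The output path described above boils down to two XOR operations. The following Python sketch models only the direct (unregistered) output path; the function name and the use of the string "Z" for high impedance are choices made for this illustration.

```python
def iob_output(out, out_invert, three_state, three_state_invert):
    """Value driven onto the I/O pad, or "Z" when the buffer is disabled."""
    # The buffer is high impedance when 3-STATE and 3-STATE INVERT differ.
    if three_state ^ three_state_invert:
        return "Z"
    # Otherwise the pad carries OUT, complemented if OUT-INVERT is set.
    return out ^ out_invert


print(iob_output(1, 0, 0, 0))   # 1
print(iob_output(1, 1, 0, 0))   # 0
print(iob_output(1, 0, 1, 0))   # Z
print(iob_output(1, 0, 1, 1))   # 1
```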
Each IOB has a number of I/O options, which can be selected by configuration memory cells. The input threshold can be programmed to respond to either TTL or CMOS signal levels. The SLEW RATE bit controls the rate at which the output signal can change. When the output drives an external device, reduction of the slew rate is desirable to reduce the induced noise that can occur when the output changes rapidly. When the PASSIVE PULL-UP bit is set, a pull-up resistor is connected to the I/O pad. This internal pull-up resistor can be used to avoid floating inputs.
Programmable Interconnects
The programmable interconnections between the configurable logic blocks and I/O blocks can be made in several ways: general-purpose interconnects, direct interconnects, and long lines.
Fig.10: General Purpose Interconnects
Fig.10 shows the general-purpose interconnect system. Signals between CLBs or between CLBs and IOBs can be routed through switch matrices as they travel along the horizontal and vertical interconnect lines.
Direct connection of adjacent CLBs is possible as shown in fig.11.
Fig.11: Direct Interconnection Between Adjacent CLBs
Long lines are provided to connect CLBs that are far apart. All the interconnections are programmed by storing bits in internal configuration memory cells within the LCA. Long lines provide for high fan-out, low-skew distribution of signals that must travel a relatively long distance.
Fig.12: Vertical and Horizontal Long Lines
From fig.12, there are four vertical long lines between each pair of adjacent columns of CLBs, and two of these can be used for clocks. There are two horizontal long lines between each pair of adjacent rows of CLBs. The long lines span the entire length or width of the interconnection area.
Each logic cell has two adjacent tristate buffers that connect to horizontal long lines. Designers can use these long lines and buffers to implement tristate busses.
(a). Multiplexer Implementation
(b). Wired-AND Implementation
Fig.13: Uses of Tristate Buffers
Fig.13(a) shows how tristate buffers can be used to multiplex signals onto a horizontal long line. These buffers have an active-low output enable, so when A=0, DA is driven onto the line. A weak keeper circuit at the end of the line remembers the last value driven onto the line, so it is never left floating. Care must be taken to avoid bus contention, which would occur if both a 0 and 1 were driven onto the bus at the same time.
The tristate buffers can also be used to implement a wired-AND function, as shown in fig.13(b). When one or more of the D inputs is 0, the line is driven to 0. When all the D inputs are 1, all the buffer outputs are high-Z, and the pull-up resistor pulls the line up to a 1.
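Both uses of the tristate buffers can be mimicked in a short Python sketch. It is a behavioural illustration only: `None` stands for high impedance, and the wired-AND wiring (each D input tied to both the data input and the active-low enable of its buffer) is an assumption about fig.13(b) rather than something stated in the text.

```python
def tristate(data, enable_n):
    """Active-low enable: drive data when enable_n is 0, else float (None)."""
    return data if enable_n == 0 else None


def resolve(drivers, idle_value):
    """Resolve a long line; idle_value models the keeper or the pull-up."""
    active = [d for d in drivers if d is not None]
    if not active:
        return idle_value
    if len(set(active)) > 1:
        raise ValueError("bus contention: 0 and 1 driven at the same time")
    return active[0]


def mux_line(enables_n, data, last_value=0):
    """Fig.13(a) style: the buffer whose enable is 0 drives the line."""
    drivers = [tristate(d, e) for d, e in zip(data, enables_n)]
    return resolve(drivers, last_value)


def wired_and(inputs):
    """Fig.13(b) style: any input at 0 pulls the line low."""
    drivers = [tristate(d, d) for d in inputs]   # assumed wiring, see above
    return resolve(drivers, idle_value=1)


print(mux_line([1, 0, 1], [0, 1, 0]))   # 1: the second buffer drives the line
print(wired_and([1, 1, 1]))             # 1: pulled up
print(wired_and([1, 0, 1]))             # 0
```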
Fig.14: Crystal Oscillator
A crystal oscillator may be implemented using an internal high-speed inverting buffer together with an external crystal (Y1), resistors, and capacitors, as shown in fig.14. The external components connect to two of the IOB pins, and the oscillator output connects to the alternate clock buffer. The alternate clock buffer drives a horizontal long line, which in turn can be used to drive vertical long lines and the K (clock) inputs to the logic blocks. If an external clock is used, it can be connected to the global clock buffer. This buffer drives a global network, which provides a high fan-out, synchronized clock to all the IOBs and logic blocks. If a symmetric clock is required, the oscillator output can be routed through a flip-flop that divides the frequency by 2.
The XC3020 FPGA has 64 CLBs (8 x 8), 64 user I/Os, 256 flip-flops (128 in the CLBs and 128 in the IOBs), 16 horizontal long lines, and 14,779 configuration data bits. Other members of the XC3000 family have up to 484 CLBs (22 x 22), 176 user I/Os, 1320 flip-flops, 44 horizontal long lines, and 94,984 configuration data bits.
Designing with FPGAs
Sophisticated CAD tools are available to assist with the design of systems using PGAs.
The following steps are used to design a digital system with an FPGA:
1.   Draw a block diagram of the digital system. Define condition and control signals and construct SM charts or state graphs that describe the required sequence of operations.
2.   Write a Verilog module to describe the system. Simulate and debug the Verilog code and make any necessary corrections to the design developed in step 1.
3.   Work out the detailed logic design of the system using gates, flip-flops, registers, counters, adders, etc.
4.   Enter a logic diagram of the system into the computer using a schematic capture program. Simulate and debug the logic diagram, and make any necessary corrections to the design of step 3.
5.   Run a partitioning program. This program will break the logic diagram into pieces that fit into the configurable logic blocks.
6.   Run an automatic place and route program, which places the logic blocks in appropriate places in the FPGA and then routes the interconnections between the logic blocks.
7.   Run a program that will generate the bit pattern necessary to program the FPGA.
8.   Download the bit pattern into the internal configuration memory cells in the FPGA and test the operation of the FPGA.
Automatic synthesis tools take a Verilog HDL description of a system as input and generate an interconnection of gates and flip-flops to realize the system. When the final system is built, the bit pattern for programming the FPGA is normally stored in an EPROM and automatically loaded into the FPGA when the power is turned on. The EPROM is connected to the FPGA as shown in fig.1. The FPGA resets itself after the power has been applied. It then reads the configuration data from the EPROM by supplying a sequence of addresses to the EPROM inputs and storing the EPROM output data in the FPGA internal configuration memory cells.
Fig.1: EPROM Connections for LCA Initialization
Fig.2: Block Diagram of Dice Game
The dice game can be implemented using an XC3020 LCA.
Fig.3: Dice Game Block Diagram as Entered Using ViewDraw CAD Software
The heavy lines indicate busses, which connect some of the modules. The IPADs and OPADs represent the input and output pins on the XC3020. The output buffers (OBUFs) drive external LEDs to indicate the state of each counter and WIN or LOSE. IPADs P12 and P14 are connected to an external RC network, which together with the GOSC module forms an RC clock. This oscillator drives all the CLK inputs on the LCA through the ACLK buffer. External push buttons connected to P11 and P16 are used for GAME_RESET and the roll button, respectively. The roll button is connected to two D flip-flops, which serve to debounce the signal from the push button and synchronize it with the clock. The dice_controller module can be designed from the SM charts of fig.4.
Fig.4: SM Chart for the dice_controller
The main control has four states and requires two flip-flops. Using the state assignment T0: AB=00, T1: AB=01, T2: AB=10, T3: AB=11, the simplified logic equations derived from the SM Chart are
A+=A’B’ Dn_roll D711+A’B’ Dn_roll D2312 + A’B Dn_roll Eq + A’B Dn_roll D7 + A Reset’
B+=A’B’ Dn_roll D711’ + A’B Dn_roll’ + A’B Eq’ + AB Reset’
Win=AB’
Lose=AB
En_roll=A’
Sp=A’B’ Dn_roll D711’ D2312’
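To see that these equations give the intended transitions, the Python sketch below evaluates A+, B+, and the outputs for a given present state and set of inputs. It is an illustrative transcription of the equations above; the function name and the keyword-argument interface are invented for the example.

```python
def main_control(a, b, dn_roll=0, d711=0, d2312=0, eq=0, d7=0, reset=0):
    """Return ((A+, B+), outputs) for the main dice game control."""
    na, nb = 1 - a, 1 - b
    a_next = (na & nb & dn_roll & d711
              | na & nb & dn_roll & d2312
              | na & b & dn_roll & eq
              | na & b & dn_roll & d7
              | a & (1 - reset))
    b_next = (na & nb & dn_roll & (1 - d711)
              | na & b & (1 - dn_roll)
              | na & b & (1 - eq)
              | a & b & (1 - reset))
    outputs = {
        "Win": a & nb,
        "Lose": a & b,
        "En_roll": na,
        "Sp": na & nb & dn_roll & (1 - d711) & (1 - d2312),
    }
    return (a_next, b_next), outputs


# First roll is 7 or 11: T0 (00) -> T2 (10), the Win state
print(main_control(0, 0, dn_roll=1, d711=1)[0])    # (1, 0)
# First roll is 2, 3, or 12: T0 (00) -> T3 (11), the Lose state
print(main_control(0, 0, dn_roll=1, d2312=1)[0])   # (1, 1)
# Any other first roll: store the point and go to T1 (01)
print(main_control(0, 0, dn_roll=1))
```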
The roll control requires one flip-flop, and by inspection of its SM Chart,
Q+=Q’En_roll Rb + Q Rb
Roll=Q Rb
Dn_roll=Q Rb’
Since En_roll is always 1 whenever Q=1, we can rewrite the equation for Q+ as
Q+=Q’ En_roll Rb + Q En_roll Rb = En_roll Rb
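The simplification relies on En_roll being 1 whenever Q=1. The short Python check below, with function names chosen for this example, compares the original and simplified forms on every input combination that satisfies that condition, and shows that they differ on the one combination that violates it.

```python
from itertools import product


def q_next_original(q, en_roll, rb):
    """Q+ = Q' En_roll Rb + Q Rb."""
    return (1 - q) & en_roll & rb | q & rb


def q_next_simplified(q, en_roll, rb):
    """Q+ = En_roll Rb."""
    return en_roll & rb


def simplification_valid():
    """Compare both forms wherever Q=1 implies En_roll=1."""
    return all(
        q_next_original(q, en, rb) == q_next_simplified(q, en, rb)
        for q, en, rb in product((0, 1), repeat=3)
        if not (q == 1 and en == 0))


print(simplification_valid())
# The excluded case Q=1, En_roll=0, Rb=1 is where the two forms differ.
print(q_next_original(1, 0, 1), q_next_simplified(1, 0, 1))
```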
Fig.5 shows a ViewDraw schematic that implements the above equations.
Fig.5(a): Main Controller
Fig.5(b): Dice roll controller
Fig.5: Dice Game Controller Module
The logic equations for the modulo-6 counter are
Q2+ = Q1Q0 + Q2Q1’
Q1+ = Q2’Q0’ + Q1’Q0
Q0+ = Q0’
The 3-bit counter module is implemented as shown in fig.6. The CHIP_ENABLE input connects to CE on the flip-flops, so the counter increments only when CHIP_ENABLE=1.
Fig.6: Modulo-6 Counter
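A quick simulation helps confirm the counting sequence. The Python sketch below assumes the counter cycles through the die values 1 to 6 and uses the next-state equations with the complements written explicitly: Q2+ = Q1Q0 + Q2Q1’, Q1+ = Q2’Q0’ + Q1’Q0, Q0+ = Q0’. Both the starting state and the function names are assumptions made for this illustration.

```python
def counter_next(q2, q1, q0):
    """Next state (Q2+, Q1+, Q0+) of the modulo-6 counter."""
    n2, n1, n0 = 1 - q2, 1 - q1, 1 - q0
    return (q1 & q0 | q2 & n1,
            n2 & n0 | n1 & q0,
            n0)


def count_sequence(start=(0, 0, 1), steps=7):
    """Clock the counter `steps` times and list the values it visits."""
    state, seq = start, []
    for _ in range(steps):
        seq.append(state[0] * 4 + state[1] * 2 + state[2])
        state = counter_next(*state)
    return seq


print(count_sequence())   # [1, 2, 3, 4, 5, 6, 1]
```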
After the final design has been partitioned into logic cells, the logic cells placed and the connections routed, 29 out of the 64 logic cells on the 3020 are used to implement the dice game. Fig.7 shows the final routing of the interconnections for the dice game.
Fig.7: Layout and Routing for Dice Game for XC3020
Realizing Functions with six or More Variables:
Although some 6-variable logic functions can be realized with one or two logic cells, a general 6-variable function may require three cells.
A general method for realizing any 6-variable function is as follows.
1.   Expand the function as
Z(a,b,c,d,e,f) = a’Z(0,b,c,d,e,f) + aZ(1,b,c,d,e,f) = a’Z0 + aZ1
This is an example of Shannon’s expansion theorem. It can be verified by setting a=0 and then a=1 on both sides: in each case the two sides reduce to the same function, so the expansion is correct.
This equation directly leads to the network of fig.8(a), which uses two cells to realize Z0 and Z1.
Fig.8(a): General 6-variable function
Fig.8: Realizing 6- and 7- variable functions
Half of a third cell is used to realize the 3-variable function, Z = a’Z0 + aZ1.
Eg.: Consider the function,
Z = abcd’ef’ + a’b’c’def’ + b’cde’f
Setting a=0 gives Z0 = 0·bcd’ef’ + 1·b’c’def’ + b’cde’f = b’c’def’ + b’cde’f
and setting a=1 gives Z1 = 1·bcd’ef’ + 0·b’c’def’ + b’cde’f = bcd’ef’ + b’cde’f
Since Z0 and Z1 are 5-variable functions, each of them can be realized by a single cell.
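The expansion of this example can be verified exhaustively over all 64 input combinations. The following Python sketch, with helper names chosen for the example, compares Z with a’Z0 + aZ1.

```python
from itertools import product


def inv(v):
    """Complement of a single bit."""
    return 1 - v


def z(a, b, c, d, e, f):
    """Z = abcd'ef' + a'b'c'def' + b'cde'f."""
    return (a & b & c & inv(d) & e & inv(f)
            | inv(a) & inv(b) & inv(c) & d & e & inv(f)
            | inv(b) & c & d & inv(e) & f)


def z0(b, c, d, e, f):
    """Z0 = b'c'def' + b'cde'f (first cell)."""
    return inv(b) & inv(c) & d & e & inv(f) | inv(b) & c & d & inv(e) & f


def z1(b, c, d, e, f):
    """Z1 = bcd'ef' + b'cde'f (second cell)."""
    return b & c & inv(d) & e & inv(f) | inv(b) & c & d & inv(e) & f


def expansion_holds():
    """True if Z = a'Z0 + aZ1 for all 64 input combinations."""
    return all(
        z(a, *rest) == (inv(a) & z0(*rest) | a & z1(*rest))
        for a in (0, 1)
        for rest in product((0, 1), repeat=5))


print(expansion_holds())
```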
Any 7-variable function can be realized with 6 or fewer logic cells. The expansion for a general 7-variable function is
Z(a,b,c,d,e,f,g) = a’b’ Z(0,0,c,d,e,f,g)  + a’b Z(0,1,c,d,e,f,g) + ab’ Z(1,0,c,d,e,f,g) + ab Z(1,1,c,d,e,f,g)
                        = a’b’Y0 + a’b Y1 + ab’ Y2 + ab Y3
The above equation can be obtained by applying the expansion theorem twice, first expanding about a and then expanding about b.
Eg.: consider the 7-variable function
Z=c’de’fg + bcd’e’fg’ + a’c’def’g + a’b’d’ef’g’ + ab’defg’
By substituting a=b=0 gives Y0 = c’de’fg + c’def’g + d’ef’g’
By substituting a=0 and b=1 gives Y1= c’de’fg + cd’e’fg’ + c’def’g
By substituting a=1 and b=0 gives Y2= c’de’fg + defg’
By substituting a=b=1 gives Y3 = c’de’fg + cd’e’fg’
Fig.8(b): General 7-variable function
The above equations are implemented as the network shown in fig.8(b). Four cells implement the 5-variable functions Y0, Y1, Y2, and Y3. A fifth cell implements the 4-variable function Z0 = a’b’ Y0 + a’b Y1, and the remaining cell implements the 5-variable function Z = Z0 + ab’ Y2 + ab Y3.
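The six-cell realization can be checked against the original 7-variable function in the same brute-force way. This Python sketch mirrors the cell structure described above; the function names are illustrative.

```python
from itertools import product


def inv(v):
    """Complement of a single bit."""
    return 1 - v


def z7(a, b, c, d, e, f, g):
    """Z = c'de'fg + bcd'e'fg' + a'c'def'g + a'b'd'ef'g' + ab'defg'."""
    return (inv(c) & d & inv(e) & f & g
            | b & c & inv(d) & inv(e) & f & inv(g)
            | inv(a) & inv(c) & d & e & inv(f) & g
            | inv(a) & inv(b) & inv(d) & e & inv(f) & inv(g)
            | a & inv(b) & d & e & f & inv(g))


def y0(c, d, e, f, g):
    return (inv(c) & d & inv(e) & f & g | inv(c) & d & e & inv(f) & g
            | inv(d) & e & inv(f) & inv(g))


def y1(c, d, e, f, g):
    return (inv(c) & d & inv(e) & f & g | c & inv(d) & inv(e) & f & inv(g)
            | inv(c) & d & e & inv(f) & g)


def y2(c, d, e, f, g):
    return inv(c) & d & inv(e) & f & g | d & e & f & inv(g)


def y3(c, d, e, f, g):
    return inv(c) & d & inv(e) & f & g | c & inv(d) & inv(e) & f & inv(g)


def z_from_cells(a, b, c, d, e, f, g):
    """Combine the four Y cells with the fifth and sixth cells."""
    r = (c, d, e, f, g)
    z0 = inv(a) & inv(b) & y0(*r) | inv(a) & b & y1(*r)   # fifth cell
    return z0 | a & inv(b) & y2(*r) | a & b & y3(*r)      # sixth cell


def network_matches():
    """True if the six-cell network equals Z for all 128 inputs."""
    return all(z7(*v) == z_from_cells(*v)
               for v in product((0, 1), repeat=7))


print(network_matches())
```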
As the number of variables (n) increases, the maximum number of logic cells required to realize an n-variable function increases rapidly. Hence, PLDs may be a better solution than LCAs when n is large.

Altera Complex Programmable Logic Devices (CPLDs)

CPLDs are an extension of the PAL concept. A CPLD is an IC that consists of a number of PAL-like logic blocks together with a programmable interconnect matrix. Each PAL-like logic block has a programmable AND array that feeds macrocells, and the outputs of these macrocells can be routed to the inputs of other logic blocks within the same IC. CPLDs that are electrically erasable and reprogrammable are sometimes referred to as EPLDs (Erasable PLDs).
Features of the Altera MAX 7000 series
- A family of high-performance CMOS CPLDs.
- Uses EEPROM-based configuration memory cells, so once the configuration is programmed, it is retained until it is erased.
Fig.1: Altera 7000 Series Architecture for EPM7032, 7064, and 7096 Devices
The basic 7000 series architecture consists of a number of Logic Array Blocks (LABs), I/O Control Blocks, and a Programmable Interconnect Array (PIA). Each LAB contains 16 macrocells, each of which contains combinational logic and a flip-flop. Each LAB has 36 inputs from the PIA and 16 outputs to the PIA. From 8 to 16 outputs from each LAB can be routed to the I/O pins through the I/O control block, and from 8 to 16 inputs from the I/O pins can be routed through the I/O control block to the PIA. The global clock input (GCLK) and the global clear input (GCLRn) connect to all macrocells. Two output enable signals (OE1n and OE2n) connect to all I/O control blocks.
Fig.2: Macrocell for EPM7032, 7064, and 7096 Devices
Each macrocell includes a logic array, a product-term select matrix that feeds an OR gate, and a programmable flip-flop. The vertical lines in the logic array, which are common to all of the macrocells in a LAB, are driven by the programmable interconnect signals from the PIA and by the shared logic expanders. Product terms are formed in the logic array just as they are in a PAL. Five product terms are provided in each macrocell, and these product terms are allocated by the product-term select matrix. A product term may be used as an OR gate input, an XOR gate input, a logic expander, or as a flip-flop preset, clear, clock, or enable input.
The flip-flop in each macrocell is a D flip-flop with clock enable and asynchronous preset and clear. The clock input can be driven by the global clock or by a product term. The clock enable can be driven by a product term or by Vcc (always enabled). The clear can be driven by the global clear or by a product term. The preset can be driven by a product term. The D input always comes from the XOR gate output. Either the flip-flop Q output or the XOR gate output can be selected by the Register Bypass Multiplexer. The selected output goes to the PIA and to the I/O control block. The D flip-flop in a cell can be converted to a T flip-flop by using the XOR gate to form D = Q XOR T, because the characteristic equation for a T flip-flop is Q+ = Q XOR T. Using T flip-flops to implement counters and adders often requires fewer gates than using D flip-flops.
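The D-to-T conversion can be demonstrated with a small Python sketch. It builds a 3-bit synchronous binary counter from T flip-flops, where each T input is just the AND of the lower-order bits; the counter itself and the function names are examples added here, not taken from the text.

```python
def t_ff_next(q, t):
    """T flip-flop built from a D flip-flop: the XOR gate forms D = Q xor T."""
    d = q ^ t
    return d          # a D flip-flop simply loads D on the clock edge


def binary_counter(steps, width=3):
    """Synchronous counter: bit i toggles when all lower bits are 1."""
    bits = [0] * width          # bits[0] is the least significant bit
    values = []
    for _ in range(steps):
        values.append(sum(b << i for i, b in enumerate(bits)))
        toggles = [int(all(bits[:i])) for i in range(width)]
        bits = [t_ff_next(q, t) for q, t in zip(bits, toggles)]
    return values


print(binary_counter(9))   # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```

Each T input in this counter is a single product term, which gives a feel for why counters tend to map compactly onto T flip-flops.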
Although only five product terms are available in each macrocell, more complex functions can be implemented by utilizing unused product terms from other macrocells. Two types of expanders are provided: shareable and parallel expanders.
Fig.3: Sharable Expanders
From fig.3, the selected product term is fed back to the logic array through an inverter, so the inverted product term can be used as an input to any macrocell AND gate. When shareable expanders are used, the realization is equivalent to a three-level NAND-AND-OR network. An AND-OR logic expression with more than five terms can often be factored to utilize shareable expanders from other macrocells.
Eg.: P = AB+B’C+C’D+E’F+E’G+E’H+F’I+F’J = AB+B’C+C’D+E’(F’G’H’)’+F’(I’J’)’
Shareable expanders can be used to generate (F’G’H’)’ and (I’J’)’. The XOR gate in a cell can be used to complement a function, since F = F’ XOR 1. Sometimes the complement of a function (F’) requires fewer terms than the original function (F), in which case it is more economical to implement F’ and complement it using the XOR gate.
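The factoring in this example follows from De Morgan's law and can be confirmed over all 1024 input combinations. The Python sketch below models each shareable expander as an inverted product term; the function names are illustrative.

```python
from itertools import product


def inv(v):
    """Complement of a single bit."""
    return 1 - v


def p_sum_of_products(A, B, C, D, E, F, G, H, I, J):
    """P as the original eight-term sum of products."""
    return (A & B | inv(B) & C | inv(C) & D
            | inv(E) & F | inv(E) & G | inv(E) & H
            | inv(F) & I | inv(F) & J)


def p_with_expanders(A, B, C, D, E, F, G, H, I, J):
    """P as five product terms using two shareable expanders."""
    x1 = inv(inv(F) & inv(G) & inv(H))   # expander output (F'G'H')'
    x2 = inv(inv(I) & inv(J))            # expander output (I'J')'
    return A & B | inv(B) & C | inv(C) & D | inv(E) & x1 | inv(F) & x2


def factoring_is_equivalent():
    """True if both forms of P agree for all 1024 inputs."""
    return all(p_sum_of_products(*v) == p_with_expanders(*v)
               for v in product((0, 1), repeat=10))


print(factoring_is_equivalent())
```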

Fig.4: Parallel Expanders
The parallel expanders shown in fig.4 allow unused product terms from a macrocell to be used in a neighboring macrocell. The parallel expander product terms can be chained from one macrocell to the next within two groups: macrocells 8 down to 1 and macrocells 16 down to 9. When parallel expanders are used without shareable expanders, the maximum number of product terms in any logic function is 20: five from the macrocell itself, and three additional groups of five chained from neighboring macrocells.
Fig.5: I/O Control Block for EPM 7032, 7064, and 7096
Fig.5 shows an I/O control block for an I/O pin, which allows each I/O pin to be configured as an input, output, or bidirectional pin. A tristate buffer drives the I/O pin. The OE control mux is programmed to select either Vcc, Gnd, or one of the global output enable signals. If Vcc is selected, the macrocell output is enabled to the I/O pin. If Gnd is selected, the buffer is disabled and the I/O pin can be used as an input. Otherwise, the buffer is controlled by OE1n or OE2n.
Software provided by Altera can be used to optimize and partition a design to fit it into logic cells and route the connections between the cells.
Fig.6: Parallel adder with accumulator
Eg.: Suppose we use the Altera software to implement two bits of the parallel adder with accumulator of fig.6 using the following equations
The software first determines that T flip-flops will require fewer gates than D flip-flops, and then it factors the equations to utilize shareable expanders. The resultant equations are
C3 = A1 X01 X02 + B1 C1 X02 + A2 B2
where X01 = B1 + C1 and X02 = A2 + B2 are shareable expander outputs
T2 = Ad A1 B2’ C1 + Ad B1 B2’ X03 + Ad B1’ B2 X04 + Ad A1’ B2 C1’
where X03 = A1 + C1 and X04 = A1’ + C1’ are shareable expander outputs
T1= Ad B1 C1’ + Ad B1’ C1
Each logic equation has at most five product terms, so it fits in a single logic cell. Implementing these equations requires three logic cells and four shareable expanders.
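The T flip-flop input equations can be checked against ordinary binary addition. The sketch below is in Python and rests on assumptions that the text does not spell out: Ad is taken to be an add-enable control, A1 and A2 are the accumulator bits, B1 and B2 the addend bits, and C1 the carry into bit 1. The function names are illustrative only.

```python
from itertools import product


def carry(a, b, c):
    """Carry out of a one-bit full adder."""
    return (a & b) | (a & c) | (b & c)


def t_inputs(ad, a1, b1, b2, c1):
    """T1 and T2 as given in the text, with expander outputs X03 and X04."""
    n = lambda v: 1 - v            # complement of a 0/1 value
    x03 = a1 | c1                  # X03 = A1 + C1
    x04 = n(a1) | n(c1)            # X04 = A1' + C1'
    t1 = (ad & b1 & n(c1)) | (ad & n(b1) & c1)
    t2 = ((ad & a1 & n(b2) & c1) | (ad & b1 & n(b2) & x03)
          | (ad & n(b1) & b2 & x04) | (ad & n(a1) & b2 & n(c1)))
    return t1, t2


def toggles_match_addition():
    """A T flip-flop toggles when T=1, so next state = state XOR T."""
    for ad, a1, a2, b1, b2, c1 in product((0, 1), repeat=6):
        t1, t2 = t_inputs(ad, a1, b1, b2, c1)
        next_state = (a1 ^ t1, a2 ^ t2)
        if ad:
            c2 = carry(a1, b1, c1)
            expected = (a1 ^ b1 ^ c1, a2 ^ b2 ^ c2)  # sum bits
        else:
            expected = (a1, a2)                      # accumulator holds
        if next_state != expected:
            return False
    return True
```

toggles_match_addition() returns True, i.e. each accumulator flip-flop toggles exactly when its sum bit differs from its current value, which is why the T inputs reduce to Ad(B XOR carry-in) for each bit.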
For the dice game, the schematics for the design are entered and compiled; the resulting equations require 23 logic cells and 8 shareable expanders, which fit into an EPM7032 CPLD.
Altera manufactures several other series of CPLDs. The MAX 7000S series is similar to the MAX 7000 series, except that it is in-circuit programmable rather than requiring a programmer. The MAX 9000 series, which is an enhanced version of the MAX 7000S series, has higher density and additional routing resources. The FLEX 8000 and FLEX 10K series use RAM-based configuration memory cells instead of EEPROM-based cells.
ALTERA FLEX 10K SERIES CPLDs
The Altera FLEX 10K embedded programmable logic family provides high-density logic along with RAM in each device. The logic and interconnections are programmed using configuration RAM cells similar to those of Xilinx FPGAs.
Fig.34 shows the block diagram of a FLEX 10K device, where each row of the logic array contains several logic array blocks (LABs) and an embedded array block (EAB). Each LAB contains eight logic elements and a local interconnect channel. The EAB contains 2048 bits of RAM. The LABs and EABs can be interconnected through fast row and column interconnect channels, called the FastTrack Interconnect. Each input-output element (IOE) can be used as an input, output, or bidirectional pin. Each IOE contains a bidirectional buffer and a flip-flop that can be used to store either input or output data. A single FLEX 10K device provides from 72 to 624 LABs, 3 to 12 EABs, and up to 406 IOEs. It can provide from 10,000 to 100,000 equivalent gates in a typical application.
Fig.34: FLEX 10K Device Block Diagram
The block diagram of a FLEX 10K LAB, shown in fig.35, contains 8 logic elements (LEs). The local interconnect channel has 22 or more inputs from the row interconnect and 8 inputs fed back from the LE outputs. Each LE has four data inputs from the local interconnect channel as well as additional control inputs. The LE outputs can be routed to the row or column interconnects, and connections can also be made between the row and column interconnects.
Fig.35: FLEX 10K Logic Array Block
Fig.36: FLEX 10K Logic Element
Each logic element contains a function generator that can implement any function of four variables using a lookup table (LUT). A cascade chain provides connections to adjacent LEs so functions of more than four variables can be implemented. The cascade chain can be used in an AND or in an OR configuration as shown in fig.37.
Fig.37: Cascade Chain Operation
When used in the arithmetic mode, an LE can implement the sum and carry for one bit of a full adder. The carry chain provides for propagation of carries between adjacent cells. Each LE contains one D flip-flop with a clock enable and asynchronous clear and preset inputs. The LE output can come from the flip-flop or directly from the combinational logic.
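The arithmetic mode can be pictured as one full-adder bit per LE, with the carry chain linking neighbors. The Python sketch below is a behavioral illustration only, not a model of the actual LE circuit; le_arithmetic and ripple_add are hypothetical names.

```python
def le_arithmetic(a, b, carry_in):
    """One LE in arithmetic mode: sum and carry for one full-adder bit."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_add(x, y, width):
    """Chain `width` LEs; the carry chain passes carry_out to the next LE."""
    carry = 0
    total = 0
    for bit in range(width):
        s, carry = le_arithmetic((x >> bit) & 1, (y >> bit) & 1, carry)
        total |= s << bit
    return total, carry
```

For example, ripple_add(5, 6, 4) gives (11, 0), and ripple_add(9, 8, 4) gives (1, 1), where the second value is the carry out of the last LE in the chain.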
Functions of more than four variables require multiple LEs for implementation. Eg.: consider a six-variable function Z(a,b,c,d,e,f), which can be implemented using six LEs.
By applying the expansion theorem,
Z(a,b,c,d,e,f)=a’b’Z0(c,d,e,f)+a’bZ1(c,d,e,f)+ab’ Z2(c,d,e,f)+abZ3(c,d,e,f)
Z0,Z1,Z2 and Z3 can each be implemented with an LE. The outputs of these LEs can be connected to inputs of other LEs via the local interconnect. The 4-variable functions Y0=a’b’Z0+a’bZ1 and Y1=ab’Z2+abZ3 each require another LE. Y0 can be ORed with Y1 using the cascade chain, so no additional LE is required.
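The six-LE decomposition can be tried out with a small Python sketch in which each LE is modeled as a 16-entry lookup table. The class Lut4, the helper build_six_le_network, and the sample function example_z are all hypothetical names chosen for illustration; the final OR stands in for the cascade chain.

```python
from itertools import product


class Lut4:
    """A 4-input lookup table: 16 stored bits indexed by the inputs."""

    def __init__(self, func):
        self.table = [func(*bits) for bits in product((0, 1), repeat=4)]

    def __call__(self, w, x, y, z):
        return self.table[(w << 3) | (x << 2) | (y << 1) | z]


def build_six_le_network(z):
    """Split a 6-variable function z(a,b,c,d,e,f) across LUT-based LEs."""
    # Z0..Z3: cofactors of z for (a,b) = 00, 01, 10, 11, one LE each.
    cofactors = [Lut4(lambda c, d, e, f, a=a, b=b: z(a, b, c, d, e, f))
                 for a, b in product((0, 1), repeat=2)]
    # Y0 = a'b'Z0 + a'bZ1 and Y1 = ab'Z2 + abZ3, one LE each.
    y0 = Lut4(lambda a, b, z0, z1: ((1 - a) & (1 - b) & z0) | ((1 - a) & b & z1))
    y1 = Lut4(lambda a, b, z2, z3: (a & (1 - b) & z2) | (a & b & z3))

    def network(a, b, c, d, e, f):
        z0, z1, z2, z3 = (lut(c, d, e, f) for lut in cofactors)
        # The OR of Y0 and Y1 models the cascade chain, so it costs no LE.
        return y0(a, b, z0, z1) | y1(a, b, z2, z3)

    return network, len(cofactors) + 2


def example_z(a, b, c, d, e, f):
    """An arbitrary six-variable function used only for the check."""
    return (a & c) ^ (b | d) ^ (e & f)
```

build_six_le_network(example_z) returns a network that matches example_z on all 64 input combinations, together with an LE count of 6.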
Fig.38: FLEX 10K Embedded Array Block
Fig.38 shows an embedded array block. The inputs from the row interconnect go through the EAB local interconnect and can be utilized as data inputs or address inputs to the EAB. The internal memory array can be used as a RAM or ROM of size 256×8, 512×4, 1024×2, or 2048×1. Several EABs can be used together to form a larger memory. The memory data outputs can be routed to either the row or column interconnects. All memory inputs and outputs are connected to registers so the memory can be operated in a synchronous mode. Alternatively, the registers can be bypassed and the memory operated asynchronously.
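The four memory shapes all use the same 2048 bits; only the split between address lines and data lines changes. The following Python sketch illustrates this with a simple behavioral memory. The class Eab and the helper address_bits are hypothetical names, and the registers and asynchronous bypass are not modeled.

```python
EAB_BITS = 2048
EAB_SHAPES = [(256, 8), (512, 4), (1024, 2), (2048, 1)]  # (words, width)


def address_bits(words):
    """Number of address lines needed to select one of `words` locations."""
    return (words - 1).bit_length()


class Eab:
    """Behavioral model of one EAB configured as words x width RAM."""

    def __init__(self, words, width):
        if (words, width) not in EAB_SHAPES:
            raise ValueError("unsupported EAB configuration")
        self.words, self.width = words, width
        self.cells = [0] * words

    def write(self, addr, data):
        # Data wider than the configured width is truncated to `width` bits.
        self.cells[addr % self.words] = data & ((1 << self.width) - 1)

    def read(self, addr):
        return self.cells[addr % self.words]
```

Every entry in EAB_SHAPES multiplies out to 2048 bits, and address_bits gives 8, 9, 10 and 11 address lines for the four shapes, so each extra address line halves the data width.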

Use of CPLDs such as the FLEX 10K series allows us to implement a complex digital system using a single IC.