Abstract:
This work proceeds in two stages. The first stage uses the subtractive method, a clustering algorithm, to identify the relationships between data points so that points with similar features are gathered into clusters. The second stage uses these clusters to construct the fuzzy IF…THEN rules that control how the system works.
The number of rules and their parts depend on these clusters. The Takagi-Sugeno-Kang (TSK) fuzzy inference model is used. The approach is applied to heart disease diagnosis.
Keywords: Fuzzy rules, First-order TSK fuzzy inference, Subtractive clustering, Heart disease diagnosis.
1. Introduction:
Fuzzy systems are an attractive way to handle data with any type of features, even undesirable ones such as imprecise, noisy or missing input information [1]. They also suit complex systems, because they rely on high-dimensional input-output data rather than on human experience, which may be uncertain or unavailable, or on complex mathematical equations to describe the behavior of the system and build its rules [2].
In hard clustering, each data point belongs to exactly one cluster according to some measure. In fuzzy (soft) clustering, a data point may belong to more than one cluster, or to all clusters, each to a degree given by its membership value, and the point is assigned to the cluster with the highest membership.
There are many clustering algorithms, such as K-means, C-means, Mountain and Subtractive clustering. These algorithms perform hard or soft clustering, but each has limitations, such as initial cluster center selection, the need to know the number of clusters in advance, the dead unit problem, multiple cluster membership and premature convergence to local optima [3].
The subtractive clustering algorithm is a suitable way to overcome these limitations because it does not need to know the number of clusters or to estimate the cluster centers beforehand. It depends on a few parameters that influence the result: radius, squash ratio, accept ratio and reject ratio [1]. After all the clusters in the data are found, the work moves on to rule generation. The number of rules equals the number of clusters. The antecedent of each rule is taken from the cluster data (input data) and the consequent represents the behavior of the system [4]. This follows Takagi-Sugeno-Kang (TSK) fuzzy inference, in which the consequent may be a linear function, giving a first-order Sugeno fuzzy model, or a constant, giving a zero-order Sugeno fuzzy model [5].
2. The Previous Work
Diagnosing heart disease is sensitive and important. Early diagnosis is itself part of treatment, and a correct diagnosis can save the patient's life. Diagnosis is usually made in hospital by doctors, based on information taken from the patient and on tests. Recent computational techniques are also used, and fuzzy logic can help in this area. One such technique is the Fuzzy Petri Net, which uses a Petri net to evaluate the patient's case and determine whether the patient is ill and the degree of the illness. Genetic algorithms are also used to generate fuzzy rules. The Particle Swarm Optimization (PSO) algorithm is another method used for fuzzy rule extraction. Fuzzy clustering, which is used in this paper, can build a rule base that identifies affected patients from their symptoms.
3. The Proposed Subtractive Clustering Algorithm
The steps of the subtractive clustering algorithm are:
Step 1 Normalization: the multidimensional input data consist of n data points with m attributes each, Xi = {xi1, xi2, …, xim}, where Xi is a vector of inputs. The data are normalized to the interval [0,1] by equation 1 [6].
Where i = 1, 2, …, n; j = 1, 2, …, m.
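The normalization step can be sketched in Python. This is a minimal illustration that assumes equation 1 is the usual min-max scaling, x'ij = (xij - minj) / (maxj - minj), applied to each attribute j separately. The function name `normalize` and the toy records are hypothetical and are not taken from the heart disease data.

```python
def normalize(data):
    """Scale every attribute (column) of `data` to [0, 1] by min-max scaling."""
    columns = list(zip(*data))            # one tuple per attribute
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in data:
        normalized.append([
            # A constant attribute has no spread; map it to 0.0 (assumption).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical toy records with two attributes each.
sample = [[120.0, 1.0], [140.0, 3.0], [160.0, 2.0]]
print(normalize(sample))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Scaling each attribute separately keeps an attribute with a large numeric range from dominating the Euclidean distances used in the later steps.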
Step 2 Assignment: the constant parameters are chosen carefully according to the nature of the work and the type of data. The parameters are the radius, squash factor (η), accept ratio and reject ratio (ε). The radius ra (the range of influence of a cluster center) strongly affects the number of rules, since the number of rules depends on the number of clusters: a large radius yields few clusters, and a small radius yields many [2,6].
Step 3: Compute centers:
A- Compute first potential: the subtractive clustering algorithm differs from other clustering algorithms in that it does not need the number of clusters in advance. Instead, it treats every data point as a potential cluster center [2].
A density measure is therefore computed for each point, and its potential is given by equation 2 [6,7].
Where
||.|| denotes the Euclidean distance.
ra is a positive constant that defines the range of influence of a cluster center.
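The potential computation can be sketched as follows. The sketch assumes equation 2 has the form commonly used for subtractive clustering, Pi = Σj exp(-α ||xi - xj||²) with α = 4/ra², which mirrors the definition of the second constant as 4/rb² later in this section. The function name `potentials` and the sample points are hypothetical.

```python
import math


def potentials(points, ra):
    """Potential of each point: sum over all points of exp(-alpha * squared distance)."""
    alpha = 4.0 / ra ** 2  # assumed form of the constant in equation 2
    result = []
    for xi in points:
        total = 0.0
        for xj in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += math.exp(-alpha * sq_dist)
        result.append(total)
    return result


# Hypothetical normalized points: three close together and one far away.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
p = potentials(points, ra=0.5)
first_center = points[p.index(max(p))]
print(first_center)  # [0.0, 0.0], the point with the most close neighbors
```

A point surrounded by many near neighbors collects many terms close to 1, so the point with the highest potential sits in the densest region of the data.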
B- Compute first center: the data point whose potential is the largest (P1*) among all the potentials (Pi) is declared the first cluster center x1* [8,9].
C- Compute the other centers: to find the next center, the potential of every data point (Pi) is reduced by an amount that depends on the potential of the first center (P1*) and on the distance of the point from that center. A point whose potential remains high is far from the influence of center 1 and is a candidate to be the next center. This is achieved by equation 3 [6,8].
Where
β = 4/rb²
rb = η ra
η = squash factor
D- Compute the k-th center: after k iterations the equation above takes the form of the equation below [1,6].
Where
xk* = center of the k-th cluster.
Pk* = its potential value.
Step 4 Testing: each data point is a candidate cluster center and must be tested with formulas that determine whether it is a center or not [1,8]. These formulas are:
Formula 1-
Where
dmin = smallest distance from center i to all previously found centers [6].
If the data point with the high potential satisfies this condition, it is declared a new center.
Formula 2-
Where
ε = reject ratio
The algorithm terminates if this condition is not satisfied [8].
Several parameters are used in this algorithm:
Radius: the range of influence of a cluster center; the data points within it belong to the cluster of that center.
Accept ratio: the candidate is accepted as a center if condition 1 holds for that data point. Otherwise, the potential of the data point is set to zero and another candidate is examined.
Reject ratio: the potential of each candidate center (Pk*) is compared with the potential of center i scaled by the reject ratio (Pk* < εPi*). The result determines whether the generation of centers continues or stops and the result is reported [2].
The algorithm can be summarized as:
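A minimal Python sketch of the whole procedure is given here under stated assumptions. It assumes the commonly used form of the method attributed to Chiu: a candidate whose potential is above accept × P1* becomes a center, a candidate below reject × P1* ends the search, and a candidate in between is accepted only if dmin/ra + Pk*/P1* ≥ 1. The function names and the toy data are hypothetical, and the parameter values in the call are chosen for the toy data only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def subtractive_clustering(points, ra, squash, accept, reject):
    """Return cluster centers found on normalized `points` (assumed criteria)."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (squash * ra) ** 2           # rb = squash * ra
    pot = [sum(math.exp(-alpha * sq_dist(xi, xj)) for xj in points)
           for xi in points]
    first_pot = max(pot)                      # potential of the first center
    centers = []
    while True:
        k = max(range(len(points)), key=pot.__getitem__)
        pk = pot[k]
        if centers:                           # the first center is always accepted
            if pk < reject * first_pot:
                break                         # remaining potentials are too low
            if pk <= accept * first_pot:      # grey zone: check the distance too
                d_min = min(math.sqrt(sq_dist(points[k], c)) for c in centers)
                if d_min / ra + pk / first_pot < 1.0:
                    pot[k] = 0.0              # discard candidate, try the next one
                    continue
        centers.append(points[k])
        # Subtract the influence of the new center from every potential.
        pot = [p - pk * math.exp(-beta * sq_dist(x, points[k]))
               for p, x in zip(pot, points)]
    return centers


# Hypothetical normalized data: a group of five points and a group of three.
data = [[0.1, 0.1], [0.0, 0.1], [0.2, 0.1], [0.1, 0.0], [0.1, 0.2],
        [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]]
centers = subtractive_clustering(data, ra=0.5, squash=1.25, accept=0.5, reject=0.15)
print(centers)  # [[0.1, 0.1], [0.9, 0.9]]
```

Every pass either accepts a center, whose own potential then drops to zero, or zeroes a rejected candidate, so the loop ends after at most one pass per data point plus the final stopping check.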
4. Generating Fuzzy Rules:
Once the clusters are built, the fuzzy rules are created from the results of the clustering algorithm. The number of rules equals the number of clusters, so the rule for class Ci can take the form (Abonyi and Szeifert 2003), (Farias et al. 2011):
To generate the fuzzy rules, the formula above can be written in a more conventional form [9]:
Where:
xj = input data element.
j = 1…n, n is the number of inputs.
i = 1…m, m is the number of clusters.
μij = membership function associated with each xj in rule i (linguistic variable).
The membership is calculated by the equation below, using a Gaussian function [10].
Where
x*ij = the center.
xj = the j-th element of input xi.
σij = sigma value generated by the subtractive clustering method.
If a rule combines more than one feature, the connectives AND, OR and NOT are used, and each affects the result differently [1].
The degree of fulfillment of each rule, also called the firing strength, is computed using (Samsudin, Ruhidah):
Wi is the weight of the rule; in most cases it equals 1.
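The membership and firing strength computations can be sketched in Python. The sketch assumes the Gaussian membership has the common form exp(-0.5 ((xj - x*ij) / σij)²) and that the AND connective is evaluated as the minimum of the memberships, multiplied by the rule weight Wi. Both forms are assumptions, and the function names and the sample rule are hypothetical.

```python
import math


def gaussian_membership(x, center, sigma):
    """Degree to which input x belongs to the fuzzy set around `center`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def firing_strength(inputs, centers, sigmas, weight=1.0):
    """Firing strength of one rule: weight times the minimum membership (AND as min)."""
    degrees = [gaussian_membership(x, c, s)
               for x, c, s in zip(inputs, centers, sigmas)]
    return weight * min(degrees)


# Hypothetical rule over two normalized inputs.
strength = firing_strength([0.5, 0.3], centers=[0.5, 0.5], sigmas=[0.2, 0.2])
print(round(strength, 3))  # 0.607, which is exp(-0.5)
```

The first input sits exactly on its center and has membership 1, so the second input, one sigma away from its center, decides the strength of the rule.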
The consequent of the rule describes the system behavior, as mentioned before. According to TSK fuzzy inference, it is either a first-order polynomial, in first-order TSK fuzzy inference, as in
Rule i- if X is μ1 and Y is μ2 then A = SX + TY + V
Where X and Y are input features.
Or it is a constant, in zero-order TSK fuzzy inference, so that the equation above becomes:
Rule i- if X is μ1 and Y is μ2 then A = V.
Where the parameters S and T are equal to 0.
5. Testing Rules:
The rule base has now been formed from the training input data set. The rules are then applied to the test data set, and one or more rules fire for each case. After this set of rules is applied, the results are computed according to the connective used to tie the antecedent parts together, and the output is obtained by the formula below [11]:
Where
C = centroid.
N = number of triggered rules.
Ai = firing strength of rule i (the minimum membership value in its antecedent).
Ci = result of each rule.
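The output computation can be sketched in Python. The sketch assumes the formula is the usual Sugeno weighted average, C = Σ Ai Ci / Σ Ai taken over the N triggered rules, with each Ci given by a first-order consequent SX + TY + V. The function names and the numeric values are hypothetical.

```python
def first_order_consequent(x, y, s, t, v):
    """Consequent A = S*X + T*Y + V; with S = T = 0 it reduces to the zero-order case."""
    return s * x + t * y + v


def tsk_output(strengths, rule_outputs):
    """Weighted average of the rule outputs, weighted by the firing strengths."""
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(a * c for a, c in zip(strengths, rule_outputs)) / total


# Hypothetical case with two triggered rules.
x, y = 1.0, 2.0
outputs = [
    first_order_consequent(x, y, s=1.0, t=0.0, v=0.0),  # first-order rule, gives 1.0
    first_order_consequent(x, y, s=0.0, t=0.0, v=3.0),  # zero-order rule, gives 3.0
]
print(tsk_output([0.75, 0.25], outputs))  # 1.5
```

The rule with the larger firing strength pulls the result toward its own output, so the result 1.5 lies closer to 1.0 than to 3.0.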
The figure below shows all the previous steps of the proposed system:
Figure 1: Sugeno fuzzy inference processing model
6. Experimental Results
The system is applied to a heart disease database, taking 10 symptoms of the disease for 100 patients.
First, the subtractive clustering algorithm is applied and clusters are built around the centers it creates. The radius, squash factor (η), accept ratio and reject ratio (ε) are set to 0.5, 0.25, 0.5 and 0.15, respectively. The result is 56 centers, as shown in figure (2).
Figure (2): the center values of the subtractive clustering algorithm.
The sigma values, whose number equals the number of columns (symptoms), are shown in figure 3:
Figure (3): the sigma values for subtractive clustering algorithm.
The sigma values are used in addition to the set of centers to build the rules. The fuzzy rules formed depend on this set of centers and are shown in the figure below.
Figure (4): set of fuzzy rules for heart disease.
The final output is computed using the Gaussian membership function, which depends on the centers and the sigma values, together with the output formula.
The system can approximately diagnose the case, with an accuracy rate of 76.43.
7. Conclusion
This paper has shown the operation of TSK fuzzy inference combined with the subtractive clustering technique. The approach has been tested on a heart disease database to diagnose whether a patient is affected or not.
The results of this work show that the subtractive clustering algorithm, tuned through parameters such as the radius and the accept and reject ratios, is a suitable way to build an efficient fuzzy rule base and can give good results for many problems.
References
1. Keshavarzi A, Sarmadian F, Rahmani A, Ahmadi A, Labbafi R, Iqbal MA. Fuzzy clustering analysis for modeling of soil cation exchange capacity. 2012;3(1).
2. Berneti SM. Design of Fuzzy Subtractive Clustering Model using Particle Swarm Optimization for the Permeability Prediction of the Reservoir. 2011;29(11).
3. Ghorpade-Aher J, Metre VA. Clustering Multidimensional Data with PSO based Algorithm. 2014.
4. Engineering M. Type-2 Takagi-Sugeno-Kang Fuzzy Logic Modeling using Subtractive Clustering. 2006.
5. Einstein A. Crisp sets. 1955.
6. Priyono A, Ridwan M, Alias A. Generation of fuzzy rules with subtractive clustering. J Teknol. 2005;43(D):143-153. http://www.jurnalteknologi.utm.my/index.php/jurnalteknologi/article/view/782.
7. Farahbod F, Eftekhari M. A New Clustering-Based Approach for Modeling Fuzzy Rule-Based Classification Systems. 2013.
8. Farias MS, Nedjah N, Mourelle LDM. Radionuclide Identification Using Subtractive. 2011.
9. Chiu SL. An Efficient Method for Extracting Fuzzy Classification Rules from High Dimensional Data. 1997;1(1).
10. Abonyi J, Szeifert F. Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognit Lett. 2003;24(14):2195-2207. doi:10.1016/S0167-8655(03)00047-3.
11. Samsudin R. Rice Yields Time Series Forecasting Using ANFIS. 2009.