аЯрЁБс>ўџ ;=ўџџџ:џџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџьЅСЉ@ №ПЈ(jbjbмЁмЁ C<ЖУЖУŒ"џџџџџџlRRR$Pn z$ЉfВВ"ддиCCCfhhhhhh, /И”RC?CCC”‡диgВ‡‡‡C"8дRиf‡v6ЌrCf‡p‡ї :о,:RfžЦ;Цe"  fЉЉRч‡чf‡УхForeseeing Small Company Delinquencies with the Naяve Bayesian Method Brian Solloway Computer Science and Engineering University of California, Irvine Abstract Credit companies are continually searching for a better method of predicting whether a company is a safe investment or if they’re likely to become delinquent in their payments. The Naяve Bayesian method attempts to find a pattern that does not presume relationships amongst past-recorded data. Using Microsoft Excel, this method becomes easy to implement and use. (** Further details to be added once more testing has been done regarding success / whether Naяve Bayesian method is recommended) Introduction Credit companies are continually searching for a better method of predicting whether a company is a safe investment or if they’re likely to become delinquent in their payments. While new methods are constantly being tried, our group attempted to create artificial intelligence that would find a pattern amongst companies that could be used to predict whether a company would become delinquent or not. Amongst the artificial intelligence methods out there, our group decided to focus on the Naяve Bayesian model, the K-Means Theorem, and Support Vector Machines. We then split the areas, with two members focusing on the K-Means Theorem, while Edward and I focused on the Naяve Bayesian Method. The Naяve Bayesian Method was chosen due to its simplicity; it does not assume that features of the data are related to each other. In this manner, the Naяve Bayesian Method can find patterns that might be missed by other methods that group and categorize examples based on features. This paper will cover the data provided and what constitutes a “delinquent” company, the method that we chose to implement the Naяve Bayesian Method, and the tests we used to measure the quality of our method and the results. Data Experian, one of the major credit companies, provided us with data and their definition as to what constituted as delinquent. The data consisted of several millions of companies, each with eight quarters of data sets, each quarter with twenty variables. Variables Each quarter had twenty variables. The following is a list of the variables: Total liability balance for bankruptcy filings Total number of bankruptcy filings Total number of bankruptcies filed within 24 months of the profile date Total number of open or Closed Unpaid collection trades Total number of open collection trades Total account balance for Combined trades Total number of combined trades Total number of combined trades with DBT greater than 60 Total number of combined trades with DBT greater than 90 DBT of combined trades Pct of Total Combined Balance to Combine Trades’ Total Highest Balance in past 12 months Total number of inquiries in the last three full months Total liability amount for original judgments filed Total number of original judgments filed Total number of legal filings (includes bankruptcies, tax liens, and judgments) Total number of newly reported trades Average monthly charge in DBT for regular trades over the six months prior to the profile date Total high credit balance Total liability amount for original tax liens filed Total number of original tax liens filed Delinquency Companies were classified as delinquent if they had a non-zero value for the #3, #5, and #9 variables. Implementation To implement a Naяve Bayesian model on the data, we chose to use a function given in VBA code for Microsoft Excel due to the commonality of Excel being used in the workplace and the ease it should bring for non-programmers to use. Our initial tests found that, while the function worked, it was not optimized to scale for large data sets. To solve this issue, we modified how we were reading in the data as well as optimizing the code. Naяve Bayesian Method The Naяve Bayesian Method calculates the probability of each outcome based on each variable. This method stands out amongst other methods due to it not making assumptions on some variables due to others; each variable is independent of the other variables. Due to this, the method only needs to keep track of the frequency of unique tags, as opposed to patterns amongst the variables. Original Code Method The original code was created by Eric Fortin. In the original, the Naяve Bayesian method took in a training set of examples, their results, and a single line to test and predict the result. This method, while accurate, proved to take to long with a small portion of the data given by Experian. The problem was due partially to the code, and partially to the data that we were giving it. The problem with the code was that it was recalculating the whole pattern found with each test case example. This meant reading in tens of thousands of data examples, calculating components to be used by a confidence formula, and then finding the result for a single test case example, which would be repeated over a million times. The manner of the data was also of a bad format with the code; the code would create components to every unique data type, in a set of data that is counting occurrences. This manner would only take cases that fell amongst the training data, missing sections in between points given, as well as having an excessive number of variable components. Categorizing the Data Edward came up with the method to categorize the data. As we were looking for patterns of good companies that became delinquent, he removed companies that were delinquent in the first year and had the second year shortened into a single variable that classified if the company became delinquent in the second year. For variables that had examples going beyond single digit, he categorized them by letters that represent ranges. These optimizations allowed the original code to run in a decent amount of time, actually finishing one test example under twenty minutes. Improving the Code To eliminate the repetition of recalculating every value in the training set, I broke the code into sections and created two parts, a sub-routine and a function. The sub-routine was used on the training data and training results to create a table consisting of how many unique tags each variable had, and how often each tag became delinquent or not. This sub-routine would need to be called only once. The function was of a similar manner to the original code in that it needed to be performed for each example in the test case set. However, this function was far simpler in that it looked up the frequency of the tags present in that test case example, formed confidences for both the company becoming delinquent or staying good based on those numbers, and then giving a response for which seemed more likely. In this manner, the calculation of frequency for each unique tag, which took the majority of the time for the original, was only done once, shaving off an excessive amount of time that would have otherwise been spent recalculating the numbers for each example in a test case that had over a million examples. Tests and Results Upon completion of categorizing the data and modifying the code, we tested this on a sample data of roughly fifty thousand training examples. We found that this method gave the answers very quickly, however lead to the need for more tests. Initial Our initial data was a random sampling from the large data set. This sampling contained only 92 examples of companies that became delinquent. With the table created, we tested it on the very data it was made from. While the result was 99.8% accurate, the result had listed every company as staying good; the inaccuracy was every company that had actually become delinquent. Planned (to come) (8 more cases desired: original was a training set biased towards good with a test on mostly good companies. I plan on testing with training data of 50:50 for good vs. delinquent companies as well as a set that is biased towards delinquent companies; each set of training tested on each set, resulting in a total of nine different tests total.) Conclusion With categorizing of data and some optimization of code, predicting whether companies will become delinquent or not using the Naяve Bayesian Method on Microsoft Excel can be done quickly, efficiently, and easily performed without a proficiency in programming. (More details on accuracy, preferred methods to train, and issues to come.) (Some basic info on how well K-Means worked) Originally, our group had planned on using Support Vector Machines to solve the problem in addition to our two implemented means. Due to time constraints, we had to put aside that method for now. Our group believes that Support Vector Machines would be a worthy investment for creating a method to predict whether companies will become delinquent. PAGE  PAGE 5 ›Є—Ѓ` e fgqХЦ;J›œБруљ56IВ"Г"Л"4$5$<$Ђ%Ќ%­%Œ((“(”(•(—(˜(ž(Ÿ( (Ё(Ђ(Ј(§љ§іђі§і№ь№і№і№і№і№љ§хтхтхтхнхт0JmH0J j0JU6CJ66CJCJ5CJ 5/FGVw˜™š›Є–—Ѕ^ | _ ` e fgqПюY‘Ињѕѕѕѕѓѓѓющщщщщщщющѓчщппппп & Fdhdhdh$a$dhFGVw˜™š›Є–—Ѕ^ | _ ` e fgqПюY‘Ит;t‹фPyЩ§ћјѕђяьщцујрнздЮЦОЖЎІž–ކ~vnf^юќџџ  §џџ  K§џџ  ƒ§џџ  м§џџ  ѓ§џџ  ,ўџџ  eўџџ  …ўџџ  Џўџџ  жўџџ  џџџ  Vџџџ  yџџџ   Јџџџ іџџџ љўџџњўџџћџџџ<љџџњџџ=ћџџі§џџўџџўџџїџџџ $Ит;t‹фPyЩяNhœХЦв:;J›œБїїїїїїїїїїїїїїїѕѓюющюѕюрѕю dh ЦрР!dhdh & FdhЩяNhœХЦв:;J›œБрстуљ56IЎ!Џ!С!В"Г"Л"4$5$G$Ё%Ђ%Ў%џ&-'Œ(—(Ђ(Ѓ(Ї(Ј(їячпздЮЫШХТПМЗДБЎЋЈЅЂŸœ™–“Š‡„~{yyywyyz§џџ{§џџєўџџќўџџ§ўџџюџџџ ŒёџџёџџђѕџџіџџіџџBјџџXјџџYјџџZјџџ[јџџŠќџџŸќџџ ќџџ#ўџџ9ўџџ:ўџџёџџџŒџџџєџџџ ЁћџџЂћџџЫћџџ  џћџџ  ќџџ  xќџџ  žќџџ  ,Брстуљ56IЎ!Џ!С!В"Г"Л"4$5$G$Ё%Ђ%Ў%џ&-'Œ(•(–(—(њјјјњњјњњњіњјњњјњњњњњњњэчј„h]„h„јџ„&`#$dh—(Ђ(Ѓ(Є(Ѕ(І(Ї(Ј(і№юююющdh„h]„h„јџ„&`#$Аа/ Ар=!А"А# $ %А# 0Аа/ Ар=!А"А# $ %А P i4@ёџ4NormalCJOJPJQJmH 6@6 Heading 1$dр@&CJ :@: Heading 2$dh@&5CJ 6@6 Heading 3$dh@&CJ6@6 Heading 4$dh@&6<A@ђџЁ<Default Paragraph Font,>@ђ,Title$a$5CJ(, @,Footer  ЦрР!&)@Ђ& Page NumberЈ"џџџџџџџџџџ›Ј" <џџџџ <џџџџ#џџ z™џџ z™џџ z™џџ z™џџ z™›…т Ц ›уC(џ Ј"}t  Ј(ИБ—(Ј(ЩЈ( !•!”џ•€№Ф№?№„b№$kУt№ј!иHЄNЏэ€џ;џџџџCX№$џCXB№$Е,‹8а@ЏЄ8p`IКшџ'-џџџџCX@ёџџџ€€€ї№’№№0№( № №№B №S №ПЫџ ?№Ј"€"Š"Œ"Ѕ"Љ"БнŒ"Ѕ"Љ":џџBIfill-in-name:Users:Brian:Desktop:Classes:Spring 09:CS 175:finalReport.docBLfill-in-name:Users:Brian:Documents:Microsoft User Data:Word Work File A_1958BVfill-in-name:Users:Brian:Documents:Microsoft User Data:AutoRecovery save of finalReporBVfill-in-name:Users:Brian:Documents:Microsoft User Data:AutoRecovery save of finalReporBVfill-in-name:Users:Brian:Documents:Microsoft User Data:AutoRecovery save of finalReporBVfill-in-name:Users:Brian:Documents:Microsoft User Data:AutoRecovery save of finalReporBLfill-in-name:Users:Brian:Documents:Microsoft User Data:Word Work File A_3222BIfill-in-name:Users:Brian:Desktop:Classes:Spring 09:CS 175:finalReport.docBLfill-in-name:Users:Brian:Documents:Microsoft User Data:Word Work File A_2326BIfill-in-name:Users:Brian:Desktop:Classes:Spring 09:CS 175:finalReport.docАbxruДЦџџџџџџџџџœe%+ruДЦџџџџџџџџџh„а„˜ўЦа^„а`„˜ў.h „ „˜ўЦ ^„ `„˜ўOJQJo(oh „p„˜ўЦp^„p`„˜ўOJQJo(Ї№h „@ „˜ўЦ@ ^„@ `„˜ўOJQJo(З№h „„˜ўЦ^„`„˜ўOJQJo(oh „р„˜ўЦр^„р`„˜ўOJQJo(Ї№h „А„˜ўЦА^„А`„˜ўOJQJo(З№h „€„˜ўЦ€^„€`„˜ўOJQJo(oh „P„˜ўЦP^„P`„˜ўOJQJo(Ї№h „а„˜ўЦа^„а`„˜ўOJQJo(З№h „ „˜ўЦ ^„ `„˜ўOJQJo(oh „p„˜ўЦp^„p`„˜ўOJQJo(Ї№h „@ „˜ўЦ@ ^„@ `„˜ўOJQJo(З№h „„˜ўЦ^„`„˜ўOJQJo(oh „р„˜ўЦр^„р`„˜ўOJQJo(Ї№h „А„˜ўЦА^„А`„˜ўOJQJo(З№h „€„˜ўЦ€^„€`„˜ўOJQJo(oh „P„˜ўЦP^„P`„˜ўOJQJo(Ї№œe%+Аbxџџџџџџџџџџџџ                  џ@€‹"‹"М^ џџ‹"д!Š#Ј"а @GTimes New Roman5€Symbol3 Arial3Times? Courier New;€Wingdings ёˆаhЄЂеf­е†-б 2БЅРДД€>0d?Gђ@џџEForeseeing Small Company Delinquencies with the Naяve Bayesian MethodBBўџ р…ŸђљOhЋ‘+'Гй0Œˆрьј , H T `lt|„'FForeseeing Small Company Delinquencies with the Na•ve Bayesian MethodworeBrereNormaliBrm15mMicrosoft Word 10.1@дѓЗ%@@“pqйЩ@~іˆ‰кЩ-бўџ еЭеœ.“—+,љЎ0 `hpx€ ˆ˜  Ј њ'2 ?ж FForeseeing Small Company Delinquencies with the Na•ve Bayesian Method Title ўџџџ !"#$%&'()ўџџџ+,-./01ўџџџ3456789ўџџџ§џџџ<ўџџџўџџџўџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџRoot Entryџџџџџџџџ РFЃл№NкЩ>€1TableџџџџџџџџчWordDocumentџџџџџџџџC<SummaryInformation(џџџџ*DocumentSummaryInformation8џџџџџџџџџџџџ2CompObjџџџџџџџџџџџџXџџџџџџџџџџџџџџџџџџџџџџџџўџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџџўџџџџџ РFMicrosoft Word DocumentўџџџNB6WWord.Document.8