diff options
Diffstat (limited to 'src')
33 files changed, 0 insertions, 2410 deletions
diff --git a/src/nbc/GPL-3 b/src/nbc/GPL-3 deleted file mode 100644 index 4432540..0000000 --- a/src/nbc/GPL-3 +++ /dev/null @@ -1,676 +0,0 @@ - - GNU GENERAL PUBLIC LICENSE - Version 3, 29 June 2007 - - Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble - - The GNU General Public License is a free, copyleft license for -software and other kinds of works. - - The licenses for most software and other practical works are designed -to take away your freedom to share and change the works. By contrast, -the GNU General Public License is intended to guarantee your freedom to -share and change all versions of a program--to make sure it remains free -software for all its users. We, the Free Software Foundation, use the -GNU General Public License for most of our software; it applies also to -any other work released this way by its authors. You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -them if you wish), that you receive source code or can get it if you -want it, that you can change the software or use pieces of it in new -free programs, and that you know you can do these things. - - To protect your rights, we need to prevent others from denying you -these rights or asking you to surrender the rights. Therefore, you have -certain responsibilities if you distribute copies of the software, or if -you modify it: responsibilities to respect the freedom of others. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must pass on to the recipients the same -freedoms that you received. You must make sure that they, too, receive -or can get the source code. And you must show them these terms so they -know their rights. - - Developers that use the GNU GPL protect your rights with two steps: -(1) assert copyright on the software, and (2) offer you this License -giving you legal permission to copy, distribute and/or modify it. - - For the developers' and authors' protection, the GPL clearly explains -that there is no warranty for this free software. For both users' and -authors' sake, the GPL requires that modified versions be marked as -changed, so that their problems will not be attributed erroneously to -authors of previous versions. - - Some devices are designed to deny users access to install or run -modified versions of the software inside them, although the manufacturer -can do so. This is fundamentally incompatible with the aim of -protecting users' freedom to change the software. The systematic -pattern of such abuse occurs in the area of products for individuals to -use, which is precisely where it is most unacceptable. Therefore, we -have designed this version of the GPL to prohibit the practice for those -products. If such problems arise substantially in other domains, we -stand ready to extend this provision to those domains in future versions -of the GPL, as needed to protect the freedom of users. - - Finally, every program is threatened constantly by software patents. -States should not allow patents to restrict development and use of -software on general-purpose computers, but in those that do, we wish to -avoid the special danger that patents applied to a free program could -make it effectively proprietary. To prevent this, the GPL assures that -patents cannot be used to render the program non-free. - - The precise terms and conditions for copying, distribution and -modification follow. - - TERMS AND CONDITIONS - - 0. Definitions. - - "This License" refers to version 3 of the GNU General Public License. - - "Copyright" also means copyright-like laws that apply to other kinds of -works, such as semiconductor masks. - - "The Program" refers to any copyrightable work licensed under this -License. Each licensee is addressed as "you". "Licensees" and -"recipients" may be individuals or organizations. - - To "modify" a work means to copy from or adapt all or part of the work -in a fashion requiring copyright permission, other than the making of an -exact copy. The resulting work is called a "modified version" of the -earlier work or a work "based on" the earlier work. - - A "covered work" means either the unmodified Program or a work based -on the Program. - - To "propagate" a work means to do anything with it that, without -permission, would make you directly or secondarily liable for -infringement under applicable copyright law, except executing it on a -computer or modifying a private copy. Propagation includes copying, -distribution (with or without modification), making available to the -public, and in some countries other activities as well. - - To "convey" a work means any kind of propagation that enables other -parties to make or receive copies. Mere interaction with a user through -a computer network, with no transfer of a copy, is not conveying. - - An interactive user interface displays "Appropriate Legal Notices" -to the extent that it includes a convenient and prominently visible -feature that (1) displays an appropriate copyright notice, and (2) -tells the user that there is no warranty for the work (except to the -extent that warranties are provided), that licensees may convey the -work under this License, and how to view a copy of this License. If -the interface presents a list of user commands or options, such as a -menu, a prominent item in the list meets this criterion. - - 1. Source Code. - - The "source code" for a work means the preferred form of the work -for making modifications to it. "Object code" means any non-source -form of a work. - - A "Standard Interface" means an interface that either is an official -standard defined by a recognized standards body, or, in the case of -interfaces specified for a particular programming language, one that -is widely used among developers working in that language. - - The "System Libraries" of an executable work include anything, other -than the work as a whole, that (a) is included in the normal form of -packaging a Major Component, but which is not part of that Major -Component, and (b) serves only to enable use of the work with that -Major Component, or to implement a Standard Interface for which an -implementation is available to the public in source code form. A -"Major Component", in this context, means a major essential component -(kernel, window system, and so on) of the specific operating system -(if any) on which the executable work runs, or a compiler used to -produce the work, or an object code interpreter used to run it. - - The "Corresponding Source" for a work in object code form means all -the source code needed to generate, install, and (for an executable -work) run the object code and to modify the work, including scripts to -control those activities. However, it does not include the work's -System Libraries, or general-purpose tools or generally available free -programs which are used unmodified in performing those activities but -which are not part of the work. For example, Corresponding Source -includes interface definition files associated with source files for -the work, and the source code for shared libraries and dynamically -linked subprograms that the work is specifically designed to require, -such as by intimate data communication or control flow between those -subprograms and other parts of the work. - - The Corresponding Source need not include anything that users -can regenerate automatically from other parts of the Corresponding -Source. - - The Corresponding Source for a work in source code form is that -same work. - - 2. Basic Permissions. - - All rights granted under this License are granted for the term of -copyright on the Program, and are irrevocable provided the stated -conditions are met. This License explicitly affirms your unlimited -permission to run the unmodified Program. The output from running a -covered work is covered by this License only if the output, given its -content, constitutes a covered work. This License acknowledges your -rights of fair use or other equivalent, as provided by copyright law. - - You may make, run and propagate covered works that you do not -convey, without conditions so long as your license otherwise remains -in force. You may convey covered works to others for the sole purpose -of having them make modifications exclusively for you, or provide you -with facilities for running those works, provided that you comply with -the terms of this License in conveying all material for which you do -not control copyright. Those thus making or running the covered works -for you must do so exclusively on your behalf, under your direction -and control, on terms that prohibit them from making any copies of -your copyrighted material outside their relationship with you. - - Conveying under any other circumstances is permitted solely under -the conditions stated below. Sublicensing is not allowed; section 10 -makes it unnecessary. - - 3. Protecting Users' Legal Rights From Anti-Circumvention Law. - - No covered work shall be deemed part of an effective technological -measure under any applicable law fulfilling obligations under article -11 of the WIPO copyright treaty adopted on 20 December 1996, or -similar laws prohibiting or restricting circumvention of such -measures. - - When you convey a covered work, you waive any legal power to forbid -circumvention of technological measures to the extent such circumvention -is effected by exercising rights under this License with respect to -the covered work, and you disclaim any intention to limit operation or -modification of the work as a means of enforcing, against the work's -users, your or third parties' legal rights to forbid circumvention of -technological measures. - - 4. Conveying Verbatim Copies. - - You may convey verbatim copies of the Program's source code as you -receive it, in any medium, provided that you conspicuously and -appropriately publish on each copy an appropriate copyright notice; -keep intact all notices stating that this License and any -non-permissive terms added in accord with section 7 apply to the code; -keep intact all notices of the absence of any warranty; and give all -recipients a copy of this License along with the Program. - - You may charge any price or no price for each copy that you convey, -and you may offer support or warranty protection for a fee. - - 5. Conveying Modified Source Versions. - - You may convey a work based on the Program, or the modifications to -produce it from the Program, in the form of source code under the -terms of section 4, provided that you also meet all of these conditions: - - a) The work must carry prominent notices stating that you modified - it, and giving a relevant date. - - b) The work must carry prominent notices stating that it is - released under this License and any conditions added under section - 7. This requirement modifies the requirement in section 4 to - "keep intact all notices". - - c) You must license the entire work, as a whole, under this - License to anyone who comes into possession of a copy. This - License will therefore apply, along with any applicable section 7 - additional terms, to the whole of the work, and all its parts, - regardless of how they are packaged. This License gives no - permission to license the work in any other way, but it does not - invalidate such permission if you have separately received it. - - d) If the work has interactive user interfaces, each must display - Appropriate Legal Notices; however, if the Program has interactive - interfaces that do not display Appropriate Legal Notices, your - work need not make them do so. - - A compilation of a covered work with other separate and independent -works, which are not by their nature extensions of the covered work, -and which are not combined with it such as to form a larger program, -in or on a volume of a storage or distribution medium, is called an -"aggregate" if the compilation and its resulting copyright are not -used to limit the access or legal rights of the compilation's users -beyond what the individual works permit. Inclusion of a covered work -in an aggregate does not cause this License to apply to the other -parts of the aggregate. - - 6. Conveying Non-Source Forms. - - You may convey a covered work in object code form under the terms -of sections 4 and 5, provided that you also convey the -machine-readable Corresponding Source under the terms of this License, -in one of these ways: - - a) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by the - Corresponding Source fixed on a durable physical medium - customarily used for software interchange. - - b) Convey the object code in, or embodied in, a physical product - (including a physical distribution medium), accompanied by a - written offer, valid for at least three years and valid for as - long as you offer spare parts or customer support for that product - model, to give anyone who possesses the object code either (1) a - copy of the Corresponding Source for all the software in the - product that is covered by this License, on a durable physical - medium customarily used for software interchange, for a price no - more than your reasonable cost of physically performing this - conveying of source, or (2) access to copy the - Corresponding Source from a network server at no charge. - - c) Convey individual copies of the object code with a copy of the - written offer to provide the Corresponding Source. This - alternative is allowed only occasionally and noncommercially, and - only if you received the object code with such an offer, in accord - with subsection 6b. - - d) Convey the object code by offering access from a designated - place (gratis or for a charge), and offer equivalent access to the - Corresponding Source in the same way through the same place at no - further charge. You need not require recipients to copy the - Corresponding Source along with the object code. If the place to - copy the object code is a network server, the Corresponding Source - may be on a different server (operated by you or a third party) - that supports equivalent copying facilities, provided you maintain - clear directions next to the object code saying where to find the - Corresponding Source. Regardless of what server hosts the - Corresponding Source, you remain obligated to ensure that it is - available for as long as needed to satisfy these requirements. - - e) Convey the object code using peer-to-peer transmission, provided - you inform other peers where the object code and Corresponding - Source of the work are being offered to the general public at no - charge under subsection 6d. - - A separable portion of the object code, whose source code is excluded -from the Corresponding Source as a System Library, need not be -included in conveying the object code work. - - A "User Product" is either (1) a "consumer product", which means any -tangible personal property which is normally used for personal, family, -or household purposes, or (2) anything designed or sold for incorporation -into a dwelling. In determining whether a product is a consumer product, -doubtful cases shall be resolved in favor of coverage. For a particular -product received by a particular user, "normally used" refers to a -typical or common use of that class of product, regardless of the status -of the particular user or of the way in which the particular user -actually uses, or expects or is expected to use, the product. A product -is a consumer product regardless of whether the product has substantial -commercial, industrial or non-consumer uses, unless such uses represent -the only significant mode of use of the product. - - "Installation Information" for a User Product means any methods, -procedures, authorization keys, or other information required to install -and execute modified versions of a covered work in that User Product from -a modified version of its Corresponding Source. The information must -suffice to ensure that the continued functioning of the modified object -code is in no case prevented or interfered with solely because -modification has been made. - - If you convey an object code work under this section in, or with, or -specifically for use in, a User Product, and the conveying occurs as -part of a transaction in which the right of possession and use of the -User Product is transferred to the recipient in perpetuity or for a -fixed term (regardless of how the transaction is characterized), the -Corresponding Source conveyed under this section must be accompanied -by the Installation Information. But this requirement does not apply -if neither you nor any third party retains the ability to install -modified object code on the User Product (for example, the work has -been installed in ROM). - - The requirement to provide Installation Information does not include a -requirement to continue to provide support service, warranty, or updates -for a work that has been modified or installed by the recipient, or for -the User Product in which it has been modified or installed. Access to a -network may be denied when the modification itself materially and -adversely affects the operation of the network or violates the rules and -protocols for communication across the network. - - Corresponding Source conveyed, and Installation Information provided, -in accord with this section must be in a format that is publicly -documented (and with an implementation available to the public in -source code form), and must require no special password or key for -unpacking, reading or copying. - - 7. Additional Terms. - - "Additional permissions" are terms that supplement the terms of this -License by making exceptions from one or more of its conditions. -Additional permissions that are applicable to the entire Program shall -be treated as though they were included in this License, to the extent -that they are valid under applicable law. If additional permissions -apply only to part of the Program, that part may be used separately -under those permissions, but the entire Program remains governed by -this License without regard to the additional permissions. - - When you convey a copy of a covered work, you may at your option -remove any additional permissions from that copy, or from any part of -it. (Additional permissions may be written to require their own -removal in certain cases when you modify the work.) You may place -additional permissions on material, added by you to a covered work, -for which you have or can give appropriate copyright permission. - - Notwithstanding any other provision of this License, for material you -add to a covered work, you may (if authorized by the copyright holders of -that material) supplement the terms of this License with terms: - - a) Disclaiming warranty or limiting liability differently from the - terms of sections 15 and 16 of this License; or - - b) Requiring preservation of specified reasonable legal notices or - author attributions in that material or in the Appropriate Legal - Notices displayed by works containing it; or - - c) Prohibiting misrepresentation of the origin of that material, or - requiring that modified versions of such material be marked in - reasonable ways as different from the original version; or - - d) Limiting the use for publicity purposes of names of licensors or - authors of the material; or - - e) Declining to grant rights under trademark law for use of some - trade names, trademarks, or service marks; or - - f) Requiring indemnification of licensors and authors of that - material by anyone who conveys the material (or modified versions of - it) with contractual assumptions of liability to the recipient, for - any liability that these contractual assumptions directly impose on - those licensors and authors. - - All other non-permissive additional terms are considered "further -restrictions" within the meaning of section 10. If the Program as you -received it, or any part of it, contains a notice stating that it is -governed by this License along with a term that is a further -restriction, you may remove that term. If a license document contains -a further restriction but permits relicensing or conveying under this -License, you may add to a covered work material governed by the terms -of that license document, provided that the further restriction does -not survive such relicensing or conveying. - - If you add terms to a covered work in accord with this section, you -must place, in the relevant source files, a statement of the -additional terms that apply to those files, or a notice indicating -where to find the applicable terms. - - Additional terms, permissive or non-permissive, may be stated in the -form of a separately written license, or stated as exceptions; -the above requirements apply either way. - - 8. Termination. - - You may not propagate or modify a covered work except as expressly -provided under this License. Any attempt otherwise to propagate or -modify it is void, and will automatically terminate your rights under -this License (including any patent licenses granted under the third -paragraph of section 11). - - However, if you cease all violation of this License, then your -license from a particular copyright holder is reinstated (a) -provisionally, unless and until the copyright holder explicitly and -finally terminates your license, and (b) permanently, if the copyright -holder fails to notify you of the violation by some reasonable means -prior to 60 days after the cessation. - - Moreover, your license from a particular copyright holder is -reinstated permanently if the copyright holder notifies you of the -violation by some reasonable means, this is the first time you have -received notice of violation of this License (for any work) from that -copyright holder, and you cure the violation prior to 30 days after -your receipt of the notice. - - Termination of your rights under this section does not terminate the -licenses of parties who have received copies or rights from you under -this License. If your rights have been terminated and not permanently -reinstated, you do not qualify to receive new licenses for the same -material under section 10. - - 9. Acceptance Not Required for Having Copies. - - You are not required to accept this License in order to receive or -run a copy of the Program. Ancillary propagation of a covered work -occurring solely as a consequence of using peer-to-peer transmission -to receive a copy likewise does not require acceptance. However, -nothing other than this License grants you permission to propagate or -modify any covered work. These actions infringe copyright if you do -not accept this License. Therefore, by modifying or propagating a -covered work, you indicate your acceptance of this License to do so. - - 10. Automatic Licensing of Downstream Recipients. - - Each time you convey a covered work, the recipient automatically -receives a license from the original licensors, to run, modify and -propagate that work, subject to this License. You are not responsible -for enforcing compliance by third parties with this License. - - An "entity transaction" is a transaction transferring control of an -organization, or substantially all assets of one, or subdividing an -organization, or merging organizations. If propagation of a covered -work results from an entity transaction, each party to that -transaction who receives a copy of the work also receives whatever -licenses to the work the party's predecessor in interest had or could -give under the previous paragraph, plus a right to possession of the -Corresponding Source of the work from the predecessor in interest, if -the predecessor has it or can get it with reasonable efforts. - - You may not impose any further restrictions on the exercise of the -rights granted or affirmed under this License. For example, you may -not impose a license fee, royalty, or other charge for exercise of -rights granted under this License, and you may not initiate litigation -(including a cross-claim or counterclaim in a lawsuit) alleging that -any patent claim is infringed by making, using, selling, offering for -sale, or importing the Program or any portion of it. - - 11. Patents. - - A "contributor" is a copyright holder who authorizes use under this -License of the Program or a work on which the Program is based. The -work thus licensed is called the contributor's "contributor version". - - A contributor's "essential patent claims" are all patent claims -owned or controlled by the contributor, whether already acquired or -hereafter acquired, that would be infringed by some manner, permitted -by this License, of making, using, or selling its contributor version, -but do not include claims that would be infringed only as a -consequence of further modification of the contributor version. For -purposes of this definition, "control" includes the right to grant -patent sublicenses in a manner consistent with the requirements of -this License. - - Each contributor grants you a non-exclusive, worldwide, royalty-free -patent license under the contributor's essential patent claims, to -make, use, sell, offer for sale, import and otherwise run, modify and -propagate the contents of its contributor version. - - In the following three paragraphs, a "patent license" is any express -agreement or commitment, however denominated, not to enforce a patent -(such as an express permission to practice a patent or covenant not to -sue for patent infringement). To "grant" such a patent license to a -party means to make such an agreement or commitment not to enforce a -patent against the party. - - If you convey a covered work, knowingly relying on a patent license, -and the Corresponding Source of the work is not available for anyone -to copy, free of charge and under the terms of this License, through a -publicly available network server or other readily accessible means, -then you must either (1) cause the Corresponding Source to be so -available, or (2) arrange to deprive yourself of the benefit of the -patent license for this particular work, or (3) arrange, in a manner -consistent with the requirements of this License, to extend the patent -license to downstream recipients. "Knowingly relying" means you have -actual knowledge that, but for the patent license, your conveying the -covered work in a country, or your recipient's use of the covered work -in a country, would infringe one or more identifiable patents in that -country that you have reason to believe are valid. - - If, pursuant to or in connection with a single transaction or -arrangement, you convey, or propagate by procuring conveyance of, a -covered work, and grant a patent license to some of the parties -receiving the covered work authorizing them to use, propagate, modify -or convey a specific copy of the covered work, then the patent license -you grant is automatically extended to all recipients of the covered -work and works based on it. - - A patent license is "discriminatory" if it does not include within -the scope of its coverage, prohibits the exercise of, or is -conditioned on the non-exercise of one or more of the rights that are -specifically granted under this License. You may not convey a covered -work if you are a party to an arrangement with a third party that is -in the business of distributing software, under which you make payment -to the third party based on the extent of your activity of conveying -the work, and under which the third party grants, to any of the -parties who would receive the covered work from you, a discriminatory -patent license (a) in connection with copies of the covered work -conveyed by you (or copies made from those copies), or (b) primarily -for and in connection with specific products or compilations that -contain the covered work, unless you entered into that arrangement, -or that patent license was granted, prior to 28 March 2007. - - Nothing in this License shall be construed as excluding or limiting -any implied license or other defenses to infringement that may -otherwise be available to you under applicable patent law. - - 12. No Surrender of Others' Freedom. - - If conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot convey a -covered work so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you may -not convey it at all. For example, if you agree to terms that obligate you -to collect a royalty for further conveying from those to whom you convey -the Program, the only way you could satisfy both those terms and this -License would be to refrain entirely from conveying the Program. - - 13. Use with the GNU Affero General Public License. - - Notwithstanding any other provision of this License, you have -permission to link or combine any covered work with a work licensed -under version 3 of the GNU Affero General Public License into a single -combined work, and to convey the resulting work. The terms of this -License will continue to apply to the part which is the covered work, -but the special requirements of the GNU Affero General Public License, -section 13, concerning interaction through a network will apply to the -combination as such. - - 14. Revised Versions of this License. - - The Free Software Foundation may publish revised and/or new versions of -the GNU General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - - Each version is given a distinguishing version number. If the -Program specifies that a certain numbered version of the GNU General -Public License "or any later version" applies to it, you have the -option of following the terms and conditions either of that numbered -version or of any later version published by the Free Software -Foundation. If the Program does not specify a version number of the -GNU General Public License, you may choose any version ever published -by the Free Software Foundation. - - If the Program specifies that a proxy can decide which future -versions of the GNU General Public License can be used, that proxy's -public statement of acceptance of a version permanently authorizes you -to choose that version for the Program. - - Later license versions may give you additional or different -permissions. However, no additional obligations are imposed on any -author or copyright holder as a result of your choosing to follow a -later version. - - 15. Disclaimer of Warranty. - - THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY -APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT -HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY -OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, -THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR -PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM -IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF -ALL NECESSARY SERVICING, REPAIR OR CORRECTION. - - 16. Limitation of Liability. - - IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS -THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY -GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE -USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF -DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD -PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), -EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF -SUCH DAMAGES. - - 17. Interpretation of Sections 15 and 16. - - If the disclaimer of warranty and limitation of liability provided -above cannot be given local legal effect according to their terms, -reviewing courts shall apply local law that most closely approximates -an absolute waiver of all civil liability in connection with the -Program, unless a warranty or assumption of liability accompanies a -copy of the Program in return for a fee. - - END OF TERMS AND CONDITIONS - - How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -state the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. - - <one line to give the program's name and a brief idea of what it does.> - Copyright (C) <year> <name of author> - - This program is free software: you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation, either version 3 of the License, or - (at your option) any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program. If not, see <http://www.gnu.org/licenses/>. - -Also add information on how to contact you by electronic and paper mail. - - If the program does terminal interaction, make it output a short -notice like this when it starts in an interactive mode: - - <program> Copyright (C) <year> <name of author> - This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, your program's commands -might be different; for a GUI interface, you would use an "about box". - - You should also get your employer (if you work as a programmer) or school, -if any, to sign a "copyright disclaimer" for the program, if necessary. -For more information on this, and how to apply and follow the GNU GPL, see -<http://www.gnu.org/licenses/>. - - The GNU General Public License does not permit incorporating your program -into proprietary programs. If your program is a subroutine library, you -may consider it more useful to permit linking proprietary applications with -the library. If this is what you want to do, use the GNU Lesser General -Public License instead of this License. But first, please read -<http://www.gnu.org/philosophy/why-not-lgpl.html>. - diff --git a/src/nbc/NEWS b/src/nbc/NEWS deleted file mode 100644 index 501c1bb..0000000 --- a/src/nbc/NEWS +++ /dev/null @@ -1,3 +0,0 @@ -score and tabulate have been rewritten in Standard ML, to take advantage -of certain optimizations that the MLton compiler performs, and improve -performance. Their behavior should be unchanged, except for some speedup. diff --git a/src/nbc/binary.sml b/src/nbc/binary.sml deleted file mode 100644 index af5acd5..0000000 --- a/src/nbc/binary.sml +++ /dev/null @@ -1,26 +0,0 @@ -signature BINARY = sig - val fromInt32: int -> Word8Vector.vector - val fromInt16: int -> Word8Vector.vector - val fromReal: real -> Word8Vector.vector -end - -structure Binary :> BINARY = struct - val word8VectorFromArray = Word8ArraySlice.vector o Word8ArraySlice.full - fun fromInt32 i = - let - val array = Word8Array.array (PackWord32Little.bytesPerElem, 0w0) - val word = LargeWord.fromInt i - in - PackWord32Little.update (array, 0, word) - ; word8VectorFromArray array - end - fun fromInt16 i = - let - val array = Word8Array.array (PackWord16Little.bytesPerElem, 0w0) - val word = LargeWord.fromInt i - in - PackWord16Little.update (array, 0, word) - ; word8VectorFromArray array - end - val fromReal = PackRealLittle.toBytes -end diff --git a/src/nbc/build.sh b/src/nbc/build.sh deleted file mode 100644 index 3328271..0000000 --- a/src/nbc/build.sh +++ /dev/null @@ -1,4 +0,0 @@ -#! /bin/sh -v -mllex substitution.lex -mlyacc substitution.grm -mlton -link-opt -lJudy -link-opt -lz score.mlb gzip.c diff --git a/src/nbc/clean.sh b/src/nbc/clean.sh deleted file mode 100644 index 58d95bf..0000000 --- a/src/nbc/clean.sh +++ /dev/null @@ -1,2 +0,0 @@ -#! /bin/sh -v -rm substitution.lex.sml substitution.grm.sig substitution.grm.sml score diff --git a/src/nbc/count.ml b/src/nbc/count.ml deleted file mode 100644 index bd97b7b..0000000 --- a/src/nbc/count.ml +++ /dev/null @@ -1,60 +0,0 @@ -module ExtArray = ExtArray.Array -let (|>) a b = b a - -module Options = struct - let parser = OptParse.OptParser.make ~version:"1.0" () - let per_word = - let option = OptParse.StdOpt.str_option ~metavar:"file" () in - OptParse.OptParser.add parser ~short_name:'w' ~long_name:"per-word" - ~help:"file to store per-word counts in" option; - (fun () -> match OptParse.Opt.opt option with - Some x -> x - | None -> - OptParse.OptParser.usage parser (); - exit 1 - ) - let total = - let option = OptParse.StdOpt.str_option ~metavar:"file" () in - OptParse.OptParser.add parser ~short_name:'t' ~long_name:"total" - ~help:"file to store total count in" option; - (fun () -> match OptParse.Opt.opt option with - Some x -> x - | None -> - OptParse.OptParser.usage parser (); - exit 1 - ) - let order = - let option = OptParse.StdOpt.int_option ~default:15 () in - OptParse.OptParser.add parser ~short_name:'r' ~long_name:"order" option; - (fun () -> OptParse.Opt.get option) - let files = OptParse.OptParser.parse_argv parser - let total = total () - let per_word = per_word () - let order = order () -end - -let load_file order judy total name = - Misc.io_of_gzip name |> Fasta.enum_words order - |> Enum.fold (fun word total -> - Judy.bump judy word; - Judy.bump judy (Gene.reverse word); - total + 2 - ) total -let load_files order names = - let judy = Judy.create () in - List.fold_left (load_file order judy) 0 names, judy -let gzip_output_string c s = Gzip.output c s 0 (String.length s) -let () = - let total_words, judy = load_files Options.order Options.files in - ( - let c = open_out Options.total in - output_string c (string_of_int total_words ^ "\n"); - close_out c - ); ( - let c = Gzip.open_out Options.per_word in - Judy.iter (fun word count -> - gzip_output_string c - (String.concat "" [string_of_int count; " "; word; "\n"]) - ) judy; - Gzip.close_out c - ) diff --git a/src/nbc/fail.sml b/src/nbc/fail.sml deleted file mode 100644 index f3d324c..0000000 --- a/src/nbc/fail.sml +++ /dev/null @@ -1,10 +0,0 @@ -signature FAIL = sig - val fail: string -> 'a -end - -structure Fail :> FAIL = struct - fun fail why = ( - TextIO.output (TextIO.stdErr, why ^ "\n") - ; OS.Process.exit OS.Process.failure - ) -end diff --git a/src/nbc/fasta-test.mlb b/src/nbc/fasta-test.mlb deleted file mode 100644 index 564fd11..0000000 --- a/src/nbc/fasta-test.mlb +++ /dev/null @@ -1,9 +0,0 @@ -local - local - fasta-all.mlb - in - functor Test - end -in - test-generate.sml -end diff --git a/src/nbc/gene.mlb b/src/nbc/gene.mlb deleted file mode 100644 index 47261ff..0000000 --- a/src/nbc/gene.mlb +++ /dev/null @@ -1,5 +0,0 @@ -local - $(SML_LIB)/basis/basis.mlb -in - gene.sml -end diff --git a/src/nbc/gene.sml b/src/nbc/gene.sml deleted file mode 100644 index 738e3fe..0000000 --- a/src/nbc/gene.sml +++ /dev/null @@ -1,49 +0,0 @@ -signature GENE = sig - val reverse: string -> string - val first: int -> string - val next: string -> string option -end - -structure Gene :> GENE = struct - fun reverse s = - let - val n = size s - val m = n - 1 - fun opposite c = case c of - #"A" => #"T" - | #"T" => #"A" - | #"C" => #"G" - | #"G" => #"C" - | _ => c - in - CharVector.tabulate (n, fn i => opposite (String.sub (s, m - i))) - end - fun first order = CharVector.tabulate (order, fn _ => #"A") - fun next nmer = - let - val order = size nmer - fun finish (rightmostNonT, replacement) = CharVector.tabulate ( - order - , fn index => - case - Int.compare ( - index - , rightmostNonT - ) - of - LESS => String.sub (nmer, index) - | EQUAL => replacement - | GREATER => #"A" - ) - fun continue index = - if index < 0 then NONE - else case String.sub (nmer, index) of - #"A" => SOME (finish (index, #"C")) - | #"C" => SOME (finish (index, #"G")) - | #"G" => SOME (finish (index, #"T")) - | #"T" => continue (index - 1) - | _ => raise Fail "Invalid base" - in - continue (size nmer - 1) - end -end diff --git a/src/nbc/genome.sml b/src/nbc/genome.sml deleted file mode 100644 index 81f3183..0000000 --- a/src/nbc/genome.sml +++ /dev/null @@ -1,33 +0,0 @@ -signature GENOME = sig - exception Bad - type t - val load: string * int -> t - val get: t * string -> int option -end - -structure Genome :> GENOME = struct - exception Bad - fun |> (x, f) = f x - infix |> - - type t = (string, int) HashTable.hash_table - fun load (gname, order) = - let - val h = HashTable.mkTable - (HashString.hashString, op =) - (1024 * 1024, Fail "") - in - Options.genomeText (order, gname) |> Gzip.openIn |> Misc.sequenceLines - |> Sequence.map (fn s => ( - case Misc.split2 s of - SOME (count, nmer) => ( - nmer - , case Int.fromString count of - NONE => raise Bad - | SOME x => x - ) | NONE => raise Bad - )) |> Sequence.app (HashTable.insert h) - ; h - end - fun get (h, nmer) = HashTable.find h nmer -end diff --git a/src/nbc/gzip.c b/src/nbc/gzip.c deleted file mode 100644 index d2bf98f..0000000 --- a/src/nbc/gzip.c +++ /dev/null @@ -1,13 +0,0 @@ -#include <zlib.h> - -int -gzreadoffset(gzFile file, voidp buf, unsigned offset, unsigned len) -{ - return gzread(file, buf + offset, len); -} - -int -gzwriteoffset(gzFile file, voidpc buf, unsigned offset, unsigned len) -{ - return gzwrite(file, buf + offset, len); -} diff --git a/src/nbc/gzip.sml b/src/nbc/gzip.sml deleted file mode 100644 index fc6e5ff..0000000 --- a/src/nbc/gzip.sml +++ /dev/null @@ -1,164 +0,0 @@ -signature GZIP = sig - exception Failure - val openIn: string -> TextIO.instream - val openOut: string -> TextIO.outstream -end - -structure Gzip :> GZIP = struct - structure Primitive :> sig - eqtype gzFile - val null: gzFile - val gzopen: string * string -> gzFile - val gzread: gzFile * CharArray.array * word * word -> int - val gzwritea: gzFile * CharArray.array * word * word -> int - val gzwritev: gzFile * CharVector.vector * word * word -> int - val gzclose: gzFile -> int - end = struct - type gzFile = MLton.Pointer.t - val null = MLton.Pointer.null - val gzopen = _import "gzopen": string * string -> gzFile; - val gzread = _import "gzreadoffset": - gzFile * CharArray.array * word * word -> int; - val gzwritea = _import "gzwriteoffset": - gzFile * CharArray.array * word * word -> int; - val gzwritev = _import "gzwriteoffset": - gzFile * CharVector.vector * word * word -> int; - val gzclose = _import "gzclose": gzFile -> int; - end - - exception Failure - fun readArraySlice (g, slice) = - let - val (array, offset, length) = CharArraySlice.base slice - val wordoffset = Word.fromInt offset - val wordlength = Word.fromInt length - in - Primitive.gzread (g, array, wordoffset, wordlength) - end - fun readerFromPrimitive (name, g) = - let - val closed = ref false - fun error x = raise IO.Io { - name = name - , function = "readArr" - , cause = x - } - fun readArr slice = - if !closed then error IO.ClosedStream - else let - val r = readArraySlice (g, slice) - in - if r < 0 then error Failure - else r - end - fun close () = - if !closed then () - else ( - closed := true - ; ignore (Primitive.gzclose g) - ) - in - TextPrimIO.augmentReader (TextPrimIO.RD { - name = name - , chunkSize = 32 * 1024 - , readVec = NONE - , readArr = SOME readArr - , readVecNB = NONE - , readArrNB = NONE - , block = NONE - , canInput = NONE - , avail = fn () => NONE - , getPos = NONE - , setPos = NONE - , endPos = NONE - , verifyPos = NONE - , close = close - , ioDesc = NONE - }) - end - fun readerFromName name = - let - val path = name ^ "\000" - val g = Primitive.gzopen (path, "r\000") - in - if g = Primitive.null then NONE - else SOME (readerFromPrimitive (name, g)) - end - fun openIn name = - case readerFromName name of - SOME x => TextIO.mkInstream (TextIO.StreamIO.mkInstream (x, "")) - | NONE => raise IO.Io { - name = name - , function = "openIn" - , cause = Failure - } - fun writeSlice (base, write) (g, slice) = - let - val (vector, offset, length) = base slice - val wordoffset = Word.fromInt offset - val wordlength = Word.fromInt length - in - write (g, vector, wordoffset, wordlength) - end - val writeVectorSlice = writeSlice (CharVectorSlice.base, Primitive.gzwritev) - val writeArraySlice = writeSlice (CharArraySlice.base, Primitive.gzwritea) - fun writerFromPrimitive (name, g) = - let - val closed = ref false - fun error (function, x) = raise IO.Io { - name = name - , function = function - , cause = x - } - fun close () = - if !closed then () - else ( - closed := true - ; ignore (Primitive.gzclose g) - ) - fun write (name, realWrite) slice = - if !closed then error (name, IO.ClosedStream) - else let - val r = realWrite (g, slice) - in - if r <= 0 then error (name, Failure) - else r - end - val writeVec = write ("writeVec", writeVectorSlice) - val writeArr = write ("writeArr", writeArraySlice) - in - TextPrimIO.augmentWriter (TextPrimIO.WR { - name = name - , chunkSize = 32 * 1024 - , writeVec = SOME writeVec - , writeArr = SOME writeArr - , writeVecNB = NONE - , writeArrNB = NONE - , block = NONE - , canOutput = NONE - , getPos = NONE - , setPos = NONE - , endPos = NONE - , verifyPos = NONE - , close = close - , ioDesc = NONE - }) - end - fun writerFromName name = - let - val path = name ^ "\000" - val g = Primitive.gzopen (path, "w9") - in - if g = Primitive.null then NONE - else SOME (writerFromPrimitive (name, g)) - end - fun openOut name = - case writerFromName name of - SOME x => TextIO.mkOutstream (TextIO.StreamIO.mkOutstream (x, IO.NO_BUF)) - | NONE => raise IO.Io { - name = name - , function = "openOut" - , cause = Failure - } -end - diff --git a/src/nbc/history.mlb b/src/nbc/history.mlb deleted file mode 100644 index e7103cc..0000000 --- a/src/nbc/history.mlb +++ /dev/null @@ -1,5 +0,0 @@ -local - $(SML_LIB)/basis/basis.mlb -in - history.sml -end diff --git a/src/nbc/history.sml b/src/nbc/history.sml deleted file mode 100644 index 52c1bb2..0000000 --- a/src/nbc/history.sml +++ /dev/null @@ -1,74 +0,0 @@ -structure History :> sig - type history - val create: int -> history - val clear: history -> history - val push: history * char -> string option * history -end = struct - type history = {maximumSize: int, content: string} - fun create maximumSize = {maximumSize = maximumSize, content = ""} - fun clear {maximumSize, content = _} = - {maximumSize = maximumSize, content = ""} - fun addToString (string, newCharacter) = - let - val size = size string - in - CharVector.tabulate ( - size + 1 - , fn index => - if index = size then newCharacter - else String.sub (string, index) - ) - end - fun shiftIntoString (string, newCharacter) = - let - val size = size string - in - CharVector.tabulate ( - size - , fn index => - if index = size - 1 then newCharacter - else String.sub (string, index + 1) - ) - end - fun push ({maximumSize, content}, newCharacter) = - let - val currentSize = size content - in - if currentSize = maximumSize then - let - val newContent = shiftIntoString ( - content - , newCharacter - ) - in ( - SOME newContent - , { - maximumSize = maximumSize - , content = newContent - } - ) end - else if currentSize = maximumSize - 1 then - let - val newContent = addToString ( - content - , newCharacter - ) - in ( - SOME newContent - , { - maximumSize = maximumSize - , content = newContent - } - ) end - else ( - NONE - , { - maximumSize = maximumSize - , content = addToString ( - content - , newCharacter - ) - } - ) - end -end diff --git a/src/nbc/judy.sml b/src/nbc/judy.sml deleted file mode 100644 index 73e257b..0000000 --- a/src/nbc/judy.sml +++ /dev/null @@ -1,175 +0,0 @@ -signature JUDY = sig - exception OutOfMemory - type t - val create: unit -> t - val insert: t * string * int -> unit - val get: t * string -> int option - val bump: t * string -> unit - val first: t -> (string * int) option - val next: t * string -> (string * int) option - val sequence: t -> (string * int) Sequence.t - val app: (string * int -> unit) -> t -> unit -end -structure Judy :> JUDY = struct - structure Primitive :> sig - type judy - type errorDetail - type return - val judyNull: judy - val errorDetailNull: errorDetail - val get: judy * string * errorDetail -> return - val insert: judy ref * string * errorDetail -> return - val delete: judy ref * string * errorDetail -> int - val free: judy ref * errorDetail -> word - val first: judy * CharArray.array * errorDetail -> return - val next: judy * CharArray.array * errorDetail -> return - val returnIsError: return -> bool - val returnIsNull: return -> bool - val returnGet: return -> int - val returnSet: return * int -> unit - end = struct - type judy = MLton.Pointer.t - type errorDetail = MLton.Pointer.t - type return = MLton.Pointer.t - val judyNull = MLton.Pointer.null - val errorDetailNull = MLton.Pointer.null - val get = _import "JudySLGet": judy * string * errorDetail -> return; - val insert = _import "JudySLIns": judy ref * string * errorDetail -> return; - val delete = _import "JudySLDel": judy ref * string * errorDetail -> int; - val free = _import "JudySLFreeArray": judy ref * errorDetail -> word; - val first = _import "JudySLFirst": judy * CharArray.array * errorDetail -> return; - val next = _import "JudySLNext": judy * CharArray.array * errorDetail -> return; - local - val pjerr = MLton.Pointer.sub (MLton.Pointer.null, 0w1) - in - fun returnIsError return = return = pjerr - end - fun returnIsNull return = return = MLton.Pointer.null - fun returnGet return = Int32.toInt (MLton.Pointer.getInt32 (return, 0)) - fun returnSet (return, i) = MLton.Pointer.setInt32 (return, 0, Int32.fromInt i) - end - exception OutOfMemory - type t = {judy: Primitive.judy ref, max: int ref} - fun create () = {judy = ref Primitive.judyNull, max = ref 0} - fun insert ({judy, max}, key, value) = - let - val return = Primitive.insert ( - judy - , key ^ "\000" - , Primitive.errorDetailNull - ) - in - if Primitive.returnIsError return then raise OutOfMemory - else let - val n = size key - in - if !max < n then max := n else () - ; Primitive.returnSet (return, value) - end - end - fun get ({judy, max = _}, key) = - let - val return = Primitive.get ( - !judy - , key ^ "\000" - , Primitive.errorDetailNull - ) - in - if Primitive.returnIsNull return then NONE - else SOME (Primitive.returnGet return) - end - fun bump ({judy, max}, key) = - let - val return = Primitive.insert ( - judy - , key ^ "\000" - , Primitive.errorDetailNull - ) - in - if Primitive.returnIsError return then raise OutOfMemory - else let - val n = size key - in - if !max < n then max := n else () - ; Primitive.returnSet ( - return - , Primitive.returnGet return + 1 - ) - end - end - fun strlen array = - case CharArray.findi (fn (_, c) => c = #"\000") array of - NONE => raise Option - | SOME (i, _) => i - fun stringFromNullTerminatedArray array = - CharArraySlice.vector ( - CharArraySlice.slice (array, 0, SOME (strlen array)) - ) - fun first {judy, max} = - let - val array = CharArray.array (!max + 1, #"\000") - val return = Primitive.first ( - !judy - , array - , Primitive.errorDetailNull - ) - in - if Primitive.returnIsNull return then NONE - else SOME ( - stringFromNullTerminatedArray array - , Primitive.returnGet return - ) - end - fun next ({judy, max}, key) = - let - val size = size key - val array = CharArray.tabulate ( - !max + 1 - , fn i => - if i < size then String.sub (key, i) - else #"\000" - ) - val return = Primitive.next ( - !judy - , array - , Primitive.errorDetailNull - ) - in - if Primitive.returnIsNull return then NONE - else SOME ( - stringFromNullTerminatedArray array - , Primitive.returnGet return - ) - end - fun sequence t = - let - val last = ref NONE - fun get () = - case ( - case !last of - NONE => first t - | SOME key => next (t, key) - ) of - NONE => NONE - | SOME (return as (key, _)) => ( - last := SOME key - ; SOME return - ) - in - Sequence.from get - end - fun app f t = - let - fun apply (key, value) = ( - f (key, value) - ; fetch key - ) and fetch key = - case next (t, key) of - NONE => () - | SOME x => apply x - in - case first t of - NONE => () - | SOME x => apply x - end -end diff --git a/src/nbc/kahan.sml b/src/nbc/kahan.sml deleted file mode 100644 index 70c6b47..0000000 --- a/src/nbc/kahan.sml +++ /dev/null @@ -1,31 +0,0 @@ -(* Kahan summation *) - -signature KAHAN = sig - type t - val zero: t - val add: t * real -> t - val sum: t -> real - val sequence: real Sequence.t -> real - val list: real list -> real - val array: real array -> real -end - -structure Kahan :> KAHAN = struct - type t = real * real - val zero = (0.0, 0.0) - fun add ((s, c), x) = - let - val y = x - c - val t = s + y - in - (t, t - s - y) - end - fun sum (s, c) = s - local - fun swappedAdd (a, b) = add (b, a) - in - fun sequence e = sum (Sequence.fold swappedAdd zero e) - fun list l = sum (foldl swappedAdd zero l) - fun array a = sum (Array.foldl swappedAdd zero a) - end -end diff --git a/src/nbc/main.sml b/src/nbc/main.sml deleted file mode 100644 index 361d9ca..0000000 --- a/src/nbc/main.sml +++ /dev/null @@ -1,98 +0,0 @@ -structure Main = struct - -fun |> (x, f) = f x -infix |> - -val order = Options.order -val genomes = case Options.genomes of - SOME x => x |> TextIO.openIn |> Misc.sequenceLines - | NONE => Options.genomesDir |> Misc.sortedDirectoryNoPrefix |> Sequence.fromList -fun input () = - case Fasta.sequence (Gzip.openIn Options.input) of - NONE => Fail.fail "input file is not FASTA format" - | SOME x => - Sequence.map (fn (header, data) => - (header, String.map Char.toUpper data) - ) x -fun output genome = - let - fun inner format = case format of - Options.Matlab {variable, file} => - let - val matlab = Matlab.openOut file - val doubleArray = Matlab.beginDoubleArray (matlab, variable) - in { - write = fn (_, score) => - Matlab.writeDouble (doubleArray, score) - , close = fn () => ( - Matlab.concludeDoubleArray doubleArray - ; Matlab.closeOut matlab - ) - } end - | Options.Text filename => - let - val gzip = Gzip.openOut filename - in { - write = fn (header, score) => - TextIO.output ( - gzip - , Real.fmt (StringCvt.FIX (SOME 8)) score - ) - , close = fn () => TextIO.closeOut gzip - } end - | Options.Dual {text, matlab} => - let - val text = inner (Options.Text text) - val matlab = inner (Options.Matlab matlab) - in { - write = fn x => ( - #write text x - ; #write matlab x - ), close = fn () => ( - #close text () - ; #close matlab () - ) - } end - in - inner (Options.output genome) - end -fun totalWords (genome, order) = - let - val name = Options.totalWords (genome, order) - val input = TextIO.openIn name - in - (case Option.mapPartial Real.fromString (Misc.inputLine input) of - NONE => Fail.fail ("could not read number from " ^ name) - | SOME r => r - ) before TextIO.closeIn input - end - -val () = - Sequence.app (fn gname => - let - val {write, close} = output gname - val totalWords = totalWords (gname, order) - val genome = ( - Stopwatch.start ("Loading genome " ^ gname) - ; Genome.load (gname, order) - before Stopwatch.finish () - ) - in - Stopwatch.start ("Scoring fragments") - ; input () |> Sequence.app (fn (header, fragment) => - write ( - header - , Score.score ( - order - , Options.missConstant - , fn nmer => Genome.get (genome, nmer) - , totalWords - , fragment - ) - ) - ); Stopwatch.finish () - ; close () - end - ) genomes - -end diff --git a/src/nbc/matlab.sml b/src/nbc/matlab.sml deleted file mode 100644 index 65f532d..0000000 --- a/src/nbc/matlab.sml +++ /dev/null @@ -1,135 +0,0 @@ -signature MATLAB = sig - type t - val openOut: string -> t - type doubleArray - val beginDoubleArray: t * string -> doubleArray - val writeDouble: doubleArray * real -> unit - val concludeDoubleArray: doubleArray -> unit - val writeDoubleArray: t * string * real vector -> unit - val closeOut: t -> unit -end - -structure Matlab :> MATLAB = struct - fun outputReal (c, f) = BinIO.output (c, Binary.fromReal f) - fun outputInt32 (c, i) = BinIO.output (c, Binary.fromInt32 i) - fun outputInt16 (c, i) = BinIO.output (c, Binary.fromInt16 i) - type t = BinIO.outstream - type doubleArray = { - outstream: BinIO.outstream - , wholeSizePos: BinIO.StreamIO.out_pos - , paddedSize: int - , arraySizePos: BinIO.StreamIO.out_pos - , dataSizePos: BinIO.StreamIO.out_pos - , elements: int ref - } - fun pad n = - if n > 0 andalso n <= 4 then 4 - else (n + 7) div 8 * 8 - fun doubleArraySize (nameLength, arrayLength) = - 32 + nameLength + 4 + 8 + arrayLength * 8 - val s8Tag = 1 - val s32Tag = 5 - val u32Tag = 6 - val doubleTag = 9 - val doubleFlag = 6 - val matrixTag = 14 - fun lsl (x, y) = Word.toIntX (Word.<< (Word.fromInt x, Word.fromInt y)) - fun beginDoubleArray (c, name) = - let - val nameSize = Int.min (size name, 63) - val name = Word8VectorSlice.vector ( - Word8VectorSlice.slice (Byte.stringToBytes name, 0, SOME nameSize) - ) - val paddedSize = pad nameSize - val padding = Word8Vector.tabulate (paddedSize - nameSize, fn _ => 0w0) - val () = outputInt32 (c, matrixTag) - val wholeSizePos = BinIO.getPosOut c - val () = ( - outputInt32 (c, 0) - ; outputInt32 (c, u32Tag) - ; outputInt32 (c, 8) - ; outputInt32 (c, doubleFlag) - ; outputInt32 (c, 0) - ; outputInt32 (c, s32Tag) - ; outputInt32 (c, 8) - ; outputInt32 (c, 1) - ) - val arraySizePos = BinIO.getPosOut c - val () = ( - outputInt32 (c, 0) - ; if nameSize > 0 andalso nameSize <= 4 then - outputInt32 (c, lsl (nameSize, 16) + s8Tag) - else ( - outputInt32 (c, s8Tag) - ; outputInt32 (c, nameSize) - ); BinIO.output (c, name) - ; BinIO.output (c, padding) - ; outputInt32 (c, doubleTag) - ) - val dataSizePos = BinIO.getPosOut c - val () = outputInt32 (c, 0) - in - { - outstream = c - , wholeSizePos = wholeSizePos - , paddedSize = paddedSize - , arraySizePos = arraySizePos - , dataSizePos = dataSizePos - , elements = ref 0 - } - end - fun writeDouble (da, f) = ( - outputReal (#outstream da, f) - ; #elements da := !(#elements da) + 1 - ) - fun concludeDoubleArray da = - let - val saved = BinIO.getPosOut (#outstream da) - in - BinIO.setPosOut (#outstream da, #wholeSizePos da) - ; outputInt32 ( - #outstream da - , doubleArraySize (#paddedSize da, !(#elements da)) - ); BinIO.setPosOut (#outstream da, #arraySizePos da) - ; outputInt32 (#outstream da, !(#elements da)) - ; BinIO.setPosOut (#outstream da, #dataSizePos da) - ; outputInt32 (#outstream da, (!(#elements da) * 8)) - ; BinIO.setPosOut (#outstream da, saved) - end - fun writeDoubleArray (c, name, array) = - let - val da = beginDoubleArray (c, name) - in - Vector.app (fn x => writeDouble (da, x)) array - ; concludeDoubleArray da - end - fun writeHeader (c, software, version) = - let - val date = Date.fromTimeUniv (Time.now ()) - val text = concat [ - "MATLAB 5.0 MAT-file, written by " - , software, " ", version, ", " - , Date.fmt "%Y-%m-$d %H:%M:%S UTC" date - ] - val size = size text - val header = - CharVector.tabulate ( - 124 - , fn i => - if i < size then String.sub (text, i) - else #" " - ) - in - BinIO.output (c, Byte.stringToBytes header) - ; outputInt16 (c, 0x0100) - ; outputInt16 (c, 0x4d49) - end - fun openOut name = - let - val c = BinIO.openOut name - in - writeHeader (c, Program.name, Program.version) - ; c - end - val closeOut = BinIO.closeOut -end diff --git a/src/nbc/misc.sml b/src/nbc/misc.sml deleted file mode 100644 index 4790bbc..0000000 --- a/src/nbc/misc.sml +++ /dev/null @@ -1,125 +0,0 @@ -signature MISC = sig - val inputLine: TextIO.instream -> string option - val sequenceFromRead: (TextIO.instream -> 'a option) -> TextIO.instream -> 'a Sequence.t - val sequenceLines: TextIO.instream -> string Sequence.t - val sortedDirectoryNoPrefix: string -> string list - val sortedDirectory: string -> string list - val substitute: (string * string) list -> string -> string option - val basenameWithoutExtension: string -> string - val split2: string -> (string * string) option - val longestCommonSubstring: string list -> string -end - -structure Misc :> MISC = struct - fun |> (x, f) = f x - infix |> - fun \ f x y = f (x, y) - fun sequenceFromRead f ioc = Sequence.from (fn () => f ioc) - fun inputLine instream = case TextIO.inputLine instream of - NONE => NONE - | SOME x => SOME (String.substring (x, 0, size x - 1)) - val sequenceLines = sequenceFromRead inputLine - fun sortedDirectoryNoPrefix d = - let - val h = OS.FileSys.openDir d - fun loop l = - case OS.FileSys.readDir h of - NONE => ( - OS.FileSys.closeDir h - ; ListMergeSort.sort (op >) l - ) | SOME n => loop ( - if String.sub (n, 0) = #"." then l - else n :: l - ) - in - loop nil - end - fun sortedDirectory d = - map (\OS.Path.concat d) (sortedDirectoryNoPrefix d) - fun assoc list key = - case - List.find (fn (possibility, _) => possibility = key) list - of - NONE => NONE - | SOME (_, answer) => SOME answer - local - exception NotFound - in - fun substitute v s = - Substitution.substitute - (fn s => (case assoc v s of - NONE => raise NotFound - | SOME x => x - )) s - handle NotFound => NONE - end - fun basenameWithoutExtension s = OS.Path.base (OS.Path.file s) - local - fun index f (string, offset) = - let - val last = size string - fun loop i = - if i = last then ~1 - else if f (String.sub (string, i)) then i - else loop (i + 1) - in - loop offset - end - in - val indexWhitespace = index Char.isSpace - val indexNonWhitespace = index (not o Char.isSpace) - end - fun split2 s = - let - val f1b = indexNonWhitespace (s, 0) - in - if f1b = ~1 then NONE - else let - val f1e = indexWhitespace (s, f1b + 1) - in - if f1e = ~1 then NONE - else let - val f2b = indexNonWhitespace (s, f1e + 1) - in - if f2b = ~1 then NONE - else let - val f2e = indexWhitespace (s, f2b + 1) - in - if f2e = ~1 then SOME ( - substring (s, f1b, f1e - f1b) - , substring (s, f2b, size s - f2b) - ) else if indexNonWhitespace (s, f2e + 1) = ~1 then - SOME ( - substring (s, f1b, f1e - f1b) - , substring (s, f2b, f2e - f2b) - ) - else NONE - end - end - end - end - fun longestCommonSubstring strings = - case - foldl (fn (a, b) => (case b of - NONE => SOME a - | SOME b => if size a < size b then SOME a else SOME b - )) NONE strings - of - NONE => "" - | SOME shortest => let - val minimumSize = size shortest - fun loop (size, offset) = - let - fun next () = - if size + offset = minimumSize then - loop (size - 1, 0) - else loop (size, offset + 1) - val sub = substring (shortest, offset, size) - in - if List.all (String.isSubstring sub) strings then sub - else next () - end - in - loop (minimumSize, 0) - end -end diff --git a/src/nbc/nmer-test.mlb b/src/nbc/nmer-test.mlb deleted file mode 100644 index c363f4a..0000000 --- a/src/nbc/nmer-test.mlb +++ /dev/null @@ -1,9 +0,0 @@ -local - local - nmer-all.mlb - in - functor Test - end -in - test-generate.sml -end diff --git a/src/nbc/options.sml b/src/nbc/options.sml deleted file mode 100644 index 907c95a..0000000 --- a/src/nbc/options.sml +++ /dev/null @@ -1,202 +0,0 @@ -signature OPTIONS = sig - val order: int - val input: string - val genomesDir: string - val genomes: string option - val genomeText: int * string -> string - val totalWords: string * int -> string - val missConstant: real - datatype output = - Matlab of {variable: string, file: string} - | Text of string - | Dual of {text: string, matlab: {variable: string, file: string}} - val output: string -> output - val arguments: string list -end - -structure Options :> OPTIONS = struct - fun |> (x, f) = f x - infix |> - datatype outformat = Otext | Omatlab | Odual - local - val order = ref 15 - val fasta = ref NONE - val genomesDir = ref "/var/lib/genomes" - val genomes = ref NONE - val genomeText = ref "$genomes_dir/$genome/${order}perword.gz" - val totalWords = ref "$genomes_dir/$genome/${order}total" - val missConstant = ref (1.0 / 26067530.0 / 1000000.0) - val outFormat = ref Otext - val variable = ref "ps_g" - val outFile = ref "$input-$order-$genome.$extension" - fun withDefault (description, default) = concat ( - description - :: " (" - :: (case default of - NONE => ["no default)"] - | SOME x => ["default: ", x, ")"]) - ) - val options = [ - { - short = "r", long = ["order"] - , help = withDefault - ("Set the length of words used in matches", NONE) - , desc = GetOpt.ReqArg ( - fn x => case Int.fromString x of - NONE => Fail.fail - "given order is not a valid integer" - | SOME y => order := y - , "integer" - ) - }, { - short = "a", long = ["fasta-input"] - , help = "FASTA input file" - , desc = GetOpt.ReqArg (fn x => fasta := SOME x, "filename") - }, { - short = "j", long = ["genomes-dir"] - , help = withDefault ( - "Set the directory where genomes are stored" - , SOME (!genomesDir) - ), desc = GetOpt.ReqArg (fn x => genomesDir := x, "directory") - }, { - short = "g", long = ["genome-list"] - , help = withDefault ( - "Read genomes from this file, one per line" - , !genomes - ), desc = GetOpt.ReqArg (fn x => genomes := SOME x, "filename") - }, { - short = "w", long = ["genome-text"] - , help = withDefault ( - "Filename to read per-word counts from" - , SOME (!genomeText) - ), desc = GetOpt.ReqArg (fn x => genomeText := x, "filename") - }, { - short = "t", long = ["total-words"] - , help = withDefault ( - "Filename to read total word counts from" - , SOME (!totalWords) - ), desc = GetOpt.ReqArg (fn x => totalWords := x, "filename") - }, { - short = "k", long = ["miss-constant"] - , help = withDefault ( - "Set the constant used for scoring missing words" - , SOME (Real.toString (!missConstant)) - ), desc = GetOpt.ReqArg ( - fn x => (case Real.fromString x of - SOME y => missConstant := y - | NONE => Fail.fail - "given miss constant is not a valid number" - ), "number" - ) - }, { - short = "x", long = ["text"] - , help = "Select gzipped text output (default)" - , desc = GetOpt.NoArg (fn () => outFormat := Otext) - }, { - short = "m", long = ["matlab"] - , help = "Select Matlab binary output" - , desc = GetOpt.NoArg (fn () => outFormat := Omatlab) - }, { - short = "u", long = ["dual"] - , help = "Select both gzipped text and Matlab binary output" - , desc = GetOpt.NoArg (fn () => outFormat := Odual) - }, { - short = "v", long = ["variable"] - , help = withDefault ( - "Set Matlab output variable" - , SOME (!variable) - ), desc = GetOpt.ReqArg (fn x => variable := x, "name") - }, { - short = "o", long = ["output-file"] - , help = withDefault ( - "Set output filename(s)" - , SOME (!outFile) - ), desc = GetOpt.ReqArg (fn x => outFile := x, "filename") - } - ] - val optionsWithHelp = { - short = "h", long = ["help"] - , help = "Display help" - , desc = GetOpt.NoArg (fn () => ( - print ( - GetOpt.usageInfo { - header = "score <options>" - , options = options - } ^ "\n" - ); OS.Process.exit OS.Process.success - )) - } :: options - in - val (_, arguments) = GetOpt.getOpt { - argOrder = GetOpt.Permute - , options = optionsWithHelp - , errFn = print - } (CommandLine.arguments ()) - val order = !order - val input = case !fasta of - NONE => Fail.fail "No input file given" - | SOME x => x - val genomesDir = !genomesDir - val genomes = !genomes - val genomeText = fn (order, genome) => ( - case - Misc.substitute [ - ("genomes_dir", genomesDir) - , ("genome", genome) - , ("order", Int.toString order) - ] (!genomeText) - of - NONE => Fail.fail "bad per-word count filename syntax" - | SOME x => x - ) - val totalWords = fn (genome, order) => ( - case - Misc.substitute [ - ("genomes_dir", genomesDir) - , ("genome", genome) - , ("order", Int.toString order) - ] (!totalWords) - of - NONE => Fail.fail "bad total word count filename syntax" - | SOME x => x - ) - val missConstant = !missConstant - datatype output = - Matlab of {variable: string, file: string} - | Text of string - | Dual of {text: string, matlab: {variable: string, file: string}} - fun output genome = - let - val common = [ - ("input", Misc.basenameWithoutExtension input) - , ("order", Int.toString order) - , ("genome", genome) - ] - fun textName () = - case Misc.substitute - (("extension", "txt.gz") :: common) (!outFile) - of - NONE => Fail.fail "bad text output filename syntax" - | SOME x => x - fun matlabName () = - case Misc.substitute - (("extension", "mat") :: common) (!outFile) - of - NONE => Fail.fail "bad Matlab output filename syntax" - | SOME x => x - in - case !outFormat of - Otext => Text (textName ()) - | Omatlab => Matlab { - variable = !variable - , file = matlabName () - } | Odual => Dual { - text = textName () - , matlab = { - variable = !variable - , file = matlabName () - } - } - end - end -end diff --git a/src/nbc/program.sml b/src/nbc/program.sml deleted file mode 100644 index df787ad..0000000 --- a/src/nbc/program.sml +++ /dev/null @@ -1,9 +0,0 @@ -signature PROGRAM = sig - val version: string - val name: string -end - -structure Program :> PROGRAM = struct - val version = "2.0" - val name = "Naive Bayes Classifier - Score" -end diff --git a/src/nbc/score.mlb b/src/nbc/score.mlb deleted file mode 100644 index 662bb83..0000000 --- a/src/nbc/score.mlb +++ /dev/null @@ -1,27 +0,0 @@ -$(SML_LIB)/basis/basis.mlb -$(SML_LIB)/basis/mlton.mlb -$(SML_LIB)/smlnj-lib/Util/smlnj-lib.mlb -$(SML_LIB)/mlyacc-lib/mlyacc-lib.mlb -fail.sml -sequence.sml -ann "allowFFI true" in - judy.sml - gzip.sml -end -substitution.grm.sig -substitution.grm.sml -substitution.lex.sml -substitution.sml -misc.sml -options.sml -storejudy.sml -genome.sml -nmer.sml -kahan.sml -score.sml -binary.sml -program.sml -matlab.sml -stopwatch.sml -fasta.sml -main.sml diff --git a/src/nbc/score.sml b/src/nbc/score.sml deleted file mode 100644 index 90d569a..0000000 --- a/src/nbc/score.sml +++ /dev/null @@ -1,30 +0,0 @@ -signature SCORE = sig - val score: int * real * (string -> int option) * real * string -> real -end - -structure Score :> SCORE = struct - fun |> (x, f) = f x - infix |> - - fun addCount (hitsum, fcount, gcount, totalWords) = - Kahan.add (hitsum, Real.fromInt fcount * Math.ln (Real.fromInt gcount / totalWords)) - fun addNmer (totalWords, getGenomeCount) (nmer, ref fcount, (misses, anyhits, hitsum)) = - case getGenomeCount nmer of - NONE => (misses + 1, anyhits, hitsum) - | SOME gcount => ( - misses, true - , addCount (hitsum, fcount, gcount, totalWords) - ) - fun score (order, missConstant, getGenomeCount, totalWords, fragment) = - let - val add = addNmer (totalWords, getGenomeCount) - val seed = (0, false, Kahan.zero) - val (misses, anyhits, hitsum) = - Nmer.count (order, fragment) |> HashTable.foldi add seed - in - if anyhits then - Kahan.add (hitsum, Math.ln missConstant * Real.fromInt misses) - |> Kahan.sum - else Real.negInf - end -end diff --git a/src/nbc/sequence.sml b/src/nbc/sequence.sml deleted file mode 100644 index 73a3fbf..0000000 --- a/src/nbc/sequence.sml +++ /dev/null @@ -1,74 +0,0 @@ -signature SEQUENCE = sig - type 'a t - val fold: ('a * 'b -> 'b) -> 'b -> 'a t -> 'b - val from: (unit -> 'a option) -> 'a t - val map: ('a -> 'b) -> 'a t -> 'b t - val app: ('a -> unit) -> 'a t -> unit - val fromArray: 'a array -> 'a t - val fromList: 'a list -> 'a t - val toList: 'a t -> 'a list - val toArray: 'a t -> 'a array -end - -structure Sequence :> SEQUENCE = struct - type 'a t = unit -> 'a option - fun fold f seed t = - let - fun loop x = - case t () of - NONE => x - | SOME y => loop (f (y, x)) - in - loop seed - end - fun from t = t - fun map f t = - fn () => ( - case t () of - NONE => NONE - | SOME x => SOME (f x) - ) - fun app f t = - let - fun loop () = - case t () of - NONE => () - | SOME x => ( - f x - ; loop () - ) - in - loop () - end - fun fromArray a = - let - val i = ref 0 - fun f () = - if !i >= Array.length a then NONE - else - SOME (Array.sub (a, !i)) - before i := !i + 1 - in - from f - end - fun fromList l = - let - val c = ref l - fun f () = - case !c of - x :: y => (c := y; SOME x) - | nil => NONE - in - from f - end - fun toList t = - let - fun loop l = - case t () of - NONE => rev l - | SOME x => loop (x :: l) - in - loop nil - end - fun toArray t = Array.fromList (toList t) -end diff --git a/src/nbc/stopwatch.sml b/src/nbc/stopwatch.sml deleted file mode 100644 index 43547ed..0000000 --- a/src/nbc/stopwatch.sml +++ /dev/null @@ -1,24 +0,0 @@ -signature STOPWATCH = sig - exception FinishWithoutStart - val start: string -> unit - val finish: unit -> unit -end -structure Stopwatch :> STOPWATCH = struct - exception FinishWithoutStart - local - val time = ref NONE - in - fun start doing = ( - print (concat [doing, "..."]) - ; time := SOME (Time.now ()) - ) - fun finish () = case !time of - SOME t => ( - print (concat [ - " done in " - , Time.toString (Time.- (Time.now (), t)) - , " seconds.\n" - ]); time := NONE - ) | NONE => raise FinishWithoutStart - end -end diff --git a/src/nbc/storejudy.sml b/src/nbc/storejudy.sml deleted file mode 100644 index c6385be..0000000 --- a/src/nbc/storejudy.sml +++ /dev/null @@ -1,17 +0,0 @@ -signature STORE_JUDY = sig - type t - val load: (int * string) Sequence.t -> t - val get: t * string -> int option -end - -structure StoreJudy :> STORE_JUDY = struct - type t = Judy.t - fun load e = - let - val j = Judy.create () - in - Sequence.app (fn (count, nmer) => Judy.insert (j, nmer, count)) e - ; j - end - val get = Judy.get -end diff --git a/src/nbc/substitution.grm b/src/nbc/substitution.grm deleted file mode 100644 index 58505b5..0000000 --- a/src/nbc/substitution.grm +++ /dev/null @@ -1,50 +0,0 @@ -%% -%name SubstitutionGrm -%pos int -%arg (lookup): string -> string -%term - DOLLAR - | TEXT of string - | LEFT_PARENTHESIS - | LEFT_BRACE - | RIGHT_PARENTHESIS - | RIGHT_BRACE - | EOF -%nonterm - STRING of string - | LIST of string list - | PORTION of string - | VARIABLE of string - | PARENTHESIZED of string list - | INSIDE_PARENTHESES of string list - | BRACED of string list - | INSIDE_BRACES of string list -%eop EOF -%noshift EOF -%start STRING -%% -STRING: LIST (concat LIST) -LIST: - (nil) - | PORTION LIST (PORTION :: LIST) -PORTION: - TEXT (TEXT) - | VARIABLE (lookup VARIABLE) -VARIABLE: - DOLLAR TEXT (TEXT) - | PARENTHESIZED (concat PARENTHESIZED) - | BRACED (concat BRACED) -PARENTHESIZED: - LEFT_PARENTHESIS INSIDE_PARENTHESES RIGHT_PARENTHESIS (INSIDE_PARENTHESES) -INSIDE_PARENTHESES: - (nil) - | TEXT INSIDE_PARENTHESES (TEXT :: INSIDE_PARENTHESES) - | LEFT_PARENTHESIS INSIDE_PARENTHESES RIGHT_PARENTHESIS INSIDE_PARENTHESES - ("(" :: INSIDE_PARENTHESES1 @ ")" :: INSIDE_PARENTHESES2) -BRACED: - LEFT_BRACE INSIDE_BRACES RIGHT_BRACE (INSIDE_BRACES) -INSIDE_BRACES: - (nil) - | TEXT INSIDE_BRACES (TEXT :: INSIDE_BRACES) - | LEFT_BRACE INSIDE_BRACES RIGHT_BRACE INSIDE_BRACES - ("{" :: INSIDE_BRACES1 @ "}" :: INSIDE_BRACES2) diff --git a/src/nbc/substitution.lex b/src/nbc/substitution.lex deleted file mode 100644 index e460121..0000000 --- a/src/nbc/substitution.lex +++ /dev/null @@ -1,54 +0,0 @@ -type svalue = Tokens.svalue -type ('a, 'b) token = ('a, 'b) Tokens.token -type pos = int -type lexresult = (svalue, pos) token -type arg = int ref -fun eof _ = Tokens.EOF (~1, ~1) -%% -%full -%header (functor SubstitutionLexFun (structure Tokens: SubstitutionGrm_TOKENS)); -%arg (nesting); -%s VARIABLE PARENTHESIZED BRACED; -text = ([^$] | "\\$")+; -dollar = "$"; -leftparenthesis = "("; -rightparenthesis = ")"; -notparenthesis = [^()]+; -leftbrace = "{"; -rightbrace = "}"; -notbrace = [^{}]+; -other = [A-Za-z0-9_]+; -%% -<INITIAL>{text} => (Tokens.TEXT (yytext, yypos, yypos + size yytext)); -<INITIAL>{dollar} => (YYBEGIN VARIABLE; Tokens.DOLLAR (yypos, yypos + 1)); -<INITIAL>{dollar}{leftparenthesis} => ( - YYBEGIN PARENTHESIZED - ; nesting := !nesting + 1 - ; Tokens.LEFT_PARENTHESIS (yypos, yypos + 1) -); -<INITIAL>{dollar}{leftbrace} => ( - YYBEGIN BRACED - ; nesting := !nesting + 1 - ; Tokens.LEFT_BRACE (yypos, yypos + 1) -); -<VARIABLE>{other} => (YYBEGIN INITIAL; Tokens.TEXT (yytext, yypos, yypos + size yytext)); -<PARENTHESIZED>{notparenthesis} => (Tokens.TEXT (yytext, yypos, yypos + size yytext)); -<PARENTHESIZED>{leftparenthesis} => ( - nesting := !nesting + 1 - ; Tokens.LEFT_PARENTHESIS (yypos, yypos + 1) -); -<PARENTHESIZED>{rightparenthesis} => ( - nesting := !nesting - 1 - ; if !nesting = 0 then YYBEGIN INITIAL else () - ; Tokens.RIGHT_PARENTHESIS (yypos, yypos + 1) -); -<BRACED>{notbrace} => (Tokens.TEXT (yytext, yypos, yypos + size yytext)); -<BRACED>{leftbrace} => ( - nesting := !nesting + 1 - ; Tokens.LEFT_BRACE (yypos, yypos + 1) -); -<BRACED>{rightbrace} => ( - nesting := !nesting - 1 - ; if !nesting = 0 then YYBEGIN INITIAL else () - ; Tokens.RIGHT_BRACE (yypos, yypos + 1) -); diff --git a/src/nbc/substitution.sml b/src/nbc/substitution.sml deleted file mode 100644 index 57d55cf..0000000 --- a/src/nbc/substitution.sml +++ /dev/null @@ -1,26 +0,0 @@ -signature SUBSTITUTION = sig - val substitute: (string -> string) -> string -> string option -end - -structure Substitution :> SUBSTITUTION = struct - structure LrVals = SubstitutionGrmLrValsFun (structure Token = LrParser.Token) - structure Lex = SubstitutionLexFun (structure Tokens = LrVals.Tokens) - structure Parser = JoinWithArg ( - structure ParserData = LrVals.ParserData - structure Lex = Lex - structure LrParser = LrParser - ) - fun substitute lookup source = - let - val position = ref 0 - val instream = TextIO.openString source - fun read n = TextIO.inputN (instream, n) - val lexer = Parser.makeLexer read (ref 0) - fun error (_, _, _) = () - val (result, _) = Parser.parse (0, lexer, error, lookup) - val () = TextIO.closeIn instream - in - SOME result - end - handle Parser.ParseError => NONE -end diff --git a/src/nbc/tabulate.ml b/src/nbc/tabulate.ml deleted file mode 100644 index dbfbe43..0000000 --- a/src/nbc/tabulate.ml +++ /dev/null @@ -1,190 +0,0 @@ -let (|>) a b = b a -let car, cdr = fst, snd -let odd n = n land 1 = 1 -let even n = not (odd n) -module ExtArray = ExtArray.Array -module ExtList = ExtList.List -module ExtString = ExtString.String -module Program = struct - let name = "Naive Bayes Classifier - Tabulate" - let version = "1.0" -end -let chop_extra s = ExtString.strip ~chars:"_- " s -let any_alphanumeric s = - let e = String.length s in - let rec loop i = - if i = e then false - else match s.[i] with - 'a'..'z' | 'A'..'Z' | '0'..'9' -> true - | _ -> loop (i + 1) - in loop 0 -let guess_prefix filenames = - let s = - List.map Misc.basename_without_extension filenames - |> Misc.longest_common_substring |> chop_extra - in - if any_alphanumeric s then s - else "" -module Options = struct - let parser = OptParse.OptParser.make ~version:Program.version () - let genomes = - let option = OptParse.StdOpt.str_option ~metavar:"file" () in - OptParse.OptParser.add parser ~short_name:'g' ~long_name:"genome-list" option; - (fun () -> match OptParse.Opt.opt option with - Some x -> x - | None -> - OptParse.OptParser.usage parser (); - exit 1 - ) - let columns = - let option = OptParse.StdOpt.int_option ~default:200 () in - OptParse.OptParser.add parser ~short_name:'c' ~long_name:"columns" option; - (fun () -> OptParse.Opt.get option) - let template = - let option = OptParse.StdOpt.str_option ~metavar:"template" - ~default:"$prefix-$n.csv.gz" () in - OptParse.OptParser.add parser ~short_name:'t' ~long_name:"output-template" option; - (fun () -> OptParse.Opt.get option) - let given_prefix = - let x = ref None in - OptParse.StdOpt.str_callback ~metavar:"prefix" (fun s -> x := Some s) - |> OptParse.OptParser.add parser ~short_name:'p' - ~long_name:"output-prefix"; - (fun () -> !x) - let files = OptParse.OptParser.parse_argv parser - let prefix = match given_prefix () with - None -> ( - match guess_prefix files with - "" -> "table" - | s -> s - ) | Some x -> x - let string_of_int n = Printf.sprintf "%08u" n - let output n = template () |> Misc.substitute ["prefix", prefix; "n", string_of_int n] -end - -exception Collide of string * string * string -let match_files_to_genomes genomes files = - let g = Array.of_list genomes in - let gl = Array.length g in - let gi = Array.init gl (fun i -> i) in - Array.sort (fun ai bi -> compare (String.length g.(bi)) (String.length g.(ai))) gi; - let gf = Array.make gl None in - List.iter (fun file -> - try ( - let i = ExtArray.find (fun i -> ExtString.exists file g.(i)) gi in - match gf.(i) with - None -> gf.(i) <- Some file - | Some other_file -> raise (Collide (g.(i), other_file, file)) - ) with Not_found -> () (* file does not match any genomes *) - ) files; - let r = ref [] in - for i = gl - 1 downto 0 do - match gf.(i) with - None -> () (* no file matches a given genome *) - | Some file -> r := (g.(i), file) :: !r - done; - !r - -let columns = Options.columns () -let genomes, files = - let genomes = Misc.io_of_file (Options.genomes ()) |> Misc.enum_of_lines - |> ExtList.of_enum in - let g, f = match_files_to_genomes genomes Options.files |> List.split in - Array.of_list g, - f |> List.map (fun x -> x, x |> open_in |> IO.input_channel) |> Array.of_list - -let newfile, newline, cell, finish = - let i = ref 0 in - let open_file () = - i := !i + 1; - let filename = Options.output !i in - Gzip.open_out filename - in - let c = ref None in - let force_out () = - match !c with - None -> - let d = open_file () in - c := Some d; - d - | Some d -> d - in - let start_of_line = ref true in - (* newfile *) (fun () -> - (match !c with - Some c -> Gzip.close_out c - | None -> ()); - i := !i + 1; - c := None - ), - (* newline *) (fun () -> - let c = force_out () in - Gzip.output_char c '\n'; - start_of_line := true - ), - (* cell *) (fun s -> - let c = force_out () in - if not !start_of_line then Gzip.output_char c ','; - Gzip.output c s 0 (String.length s); - start_of_line := false - ), - (* finish *) (fun () -> - match !c with - Some c -> Gzip.close_out c - | None -> () - ) - -let read_two_fields (filename, c) = - let line = IO.read_line c in - match Misc.split2 line with - Some x -> x - | None -> - Printf.eprintf "\ - There is something wrong with the file %s. \ - The offending line is:\n%s\n" filename line; - exit 1 - -let output_file () = - newfile (); - let yes_we_have_input = - let x = ref false in - (fun () -> - if !x then () - else ( - cell "names"; - x := true - ) - ) - in - let a = Array.make columns "" in - let rec loop i = - if i < columns then ( - try ( - let name, datum = read_two_fields files.(0) in - yes_we_have_input (); - cell name; - a.(i) <- datum; - loop (i + 1) - ) with IO.No_more_input -> - if i > 0 then Array.sub a 0 i - else raise End_of_file - ) else a - in - let a = loop 0 in - let these_columns = Array.length a in - newline (); - cell genomes.(0); - for i = 0 to these_columns - 1 do cell a.(i) done; - newline (); - for i = 1 to Array.length genomes - 1 do - cell genomes.(i); - for j = 0 to these_columns - 1 do - let _, datum = read_two_fields files.(i) in - cell datum - done; - newline () - done - -let () = - try while true do output_file () done - with End_of_file -> finish () diff --git a/src/nbc/test-generate.sml b/src/nbc/test-generate.sml deleted file mode 100644 index 77cafd3..0000000 --- a/src/nbc/test-generate.sml +++ /dev/null @@ -1 +0,0 @@ -structure Test = Test () |