AuraSearch Features

Current State of Features
-------------------------

* Full Text Search of document metadata, with Porter stemming.
* + and - operators, for a required term or an excluded term,
respectively.
* Title extraction using first apparent heading, regardless of its
level.
* Gemsub feed detection.
* Line counts.
* Indexed publication dates based on dates in filenames.
* File size information.
* Indexed MP3, Ogg, and FLAC file metadata (ID3, MP4, and Ogg/Flac).
* Aggregator based on search engine index.
* Wildcards: * and ?
* Crawler: robots.txt is followed, including "Allow", "Disallow", and
"Crawl-Delay" directives. The Slow Down gemini status code is also
followed.
* Crawler: 2-second delay between crawling of pages on the same
domain.
* Parses gemtext, spartan text, nex listings, scrolltext.
* Partial markdown parsing.

Outdated Features:
* AND, OR, NOT, parentheses grouping, and quotes
* Filters: "TITLE", "URL", "ALBUM", "ARTIST", "ALBUMARTIST",
"COPYRIGHT", "CONTENTTYPE", "LANGUAGE", and "PUBLISHDATE". The syntax
is "field: term". Field names must be in all capital letters.
* Fuzzy Searching by placing ~ after a search term
* Proximity Searching: if you want to search for two words that are
within a distance of 10 words of each other, then query with "term_one
term_two"~10
* Range Searching: For searching in ranges of numbers or dates. Can
be used with filters, like the PUBLISHDATE filter. An example of
filtering based on a publication date range would be,
PUBLISHDATE:[20220101 to 20231201]


Features Coming Soon
--------------------

* PDF and Djvu file metadata indexed
* Image file metadata indexed
* Plain text file full contents indexed
* Backlinks and searching of link text
* Page Metadata Lookup
* Full Markdown, Tinylog, and Twtxt parsing to get links, titles, and
heading information.


History
-------

AuraGem was a search engine that I started about 2 years ago under
its original name, Ponix Search. It was originally designed to
experiment with how I could make search results better.
The official announcement of the Search Engine happened on 2021-07-01:

2021-07-01 Search Engine & Ponix Capsule Now Open Source (MIT)
  gemini://auragem.ddns.net/devlog/20210701.gmi
2021-12-05 AuraGem Search Begins Crawling Again
  gopher://auragem.ddns.net:70/0/g/search/devlog/20211205.gmi

Note that some of the information in the above posts has recently
been updated to match the current URL and IP address of the crawler
and gemini capsule.

One of the first priorities with AuraSearch was to have extraction of
file metadata for as many files as possible. Audio files were one of
the first to get this feature. PDFs and Djvu files were supposed to be
next, and support was added for them on 2022-07-19, but the feature
was buggy and never worked, unfortunately. As you can see in the post
below, I chose to go with Keyword Extraction (which was later removed
and replaced with simple mentions and tags extraction) instead of Full
Text Searching on page contents. Part of this was to save space, and
part of it was to respect copyright. However, I am rethinking this
approach now that the Stats page can determine how large the text-only
portion of geminispace is (no more than 5GB total).

2022-07-19 AuraGem Search Engine Update
  gemini://auragem.ddns.net/devlog/20220719.gmi
Stats Page
  gopher://auragem.ddns.net:70/1/g/search/stats/

In the above article, you can see that I start to play with the
notion of different types of searches.
I think this idea remains important today:
> Another problem that the above process would not catch are names and
> proper nouns. These are often very important words that people would
> want to search for (e.g. Mathematics, C++, Celine Dion, FTS). I do not
> have an easy method for this atm.

The next update on 2022-07-21 added Full Text Searching of link and
file metadata, which drastically improved the speed of searches. Yes,
this came with stemming because my database's FTS uses Lucene++.

2022-07-21 AuraGem Search Update
  gemini://auragem.ddns.net/devlog/20220720_search.gmi

Not long after, I wrote an article about FTS, ranking systems, and
some of the problems that Search Engines have to handle:

2022-07-22 Search Engine Ranking Systems Are Being Left Unquestioned
  gemini://auragem.ddns.net/devlog/20220722.gmi

The most important portion of this article, however, is recognizing
how people do searches:
> This also introduces the argument that the ranking systems are really
> only important for underspecified queries (broad queries), so the
> emphasis on the problems with ranking algorithms is unwarranted. This
> argument hardly makes sense when the majority of searches that people
> make are broad. I would also argue that broad searches are most used
> for *discovering* pages, not for getting to a specific page.
> However, ranking based on popularity prioritizes what it thinks people would
> want, which is more suited for specific searches using broad queries,
> at the expense of discovery of broad topics. Broad discovery using
> broad topic queries and specific searches using proper-noun queries or
> very specific queries are both much better ways of dealing with
> searches without relying on popularity.

When making a search engine, one must balance the search results
between discovery (broadness) and exact matches (exactness). Relevancy
applies to both of these, but is more important for discovery. I
continue to think that link analysis assumes that people want exact
matches of pages while using broad queries. For example, if someone
types in "search engine", a PageRank system would put the most popular
search engine at the top along with popular articles about search
engines, assuming that the person wanted that specific search engine,
when it's more likely they wanted a collection of search engines.
Rather, my approach is to return broad relevant discovery-based
results with broad queries, and exact pages with exact queries.

Exact queries include words from titles, domain names, capsule names,
and service names: mainly proper nouns, or a specific combination of
words that matches the page information. Broad queries, however, use
category names and common nouns.

When I type "Station", I want an exact match for Station itself.
However, when I type "social network", I want search results that give
a very broad set of capsules that are social networks. I believe that
this is how most people would use search engines, especially if they
do not rely much on filtering, and this is the exact methodology that
I use for my article analyzing gemini's search engines:

2022-08-07 Gemini Search Results Study, Part 1
  gemini://auragem.ddns.net/devlog/20220807.gmi
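
The exact-versus-broad distinction above can be illustrated with a
small sketch. This is not AuraSearch's code, only one possible reading
of the idea: a query that matches a page title is treated as exact and
returns just that page, while any other query is treated as broad and
returns every page filed under that category. The Page type, the
categories field, the search function, and the sample pages are all
hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    categories: list[str]  # hypothetical topic labels, not a real index field


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(text.lower().split())


def search(query: str, pages: list[Page]) -> tuple[str, list[Page]]:
    """Return ("exact", pages) if the query names a page, else ("broad", pages)."""
    q = normalize(query)
    # Exact query: the whole query equals a page title (a proper noun).
    exact = [p for p in pages if normalize(p.title) == q]
    if exact:
        return "exact", exact
    # Broad query: return every page whose category equals the query.
    broad = [p for p in pages if q in (normalize(c) for c in p.categories)]
    return "broad", broad


# Made-up sample data for illustration only.
pages = [
    Page("Station", ["social network"]),
    Page("Example Microblog", ["social network"]),
    Page("Example Search", ["search engine"]),
]

for query in ("Station", "social network"):
    kind, hits = search(query, pages)
    print(query, "->", kind, [p.title for p in hits])
```

A real index would need stemming, partial title matches, and ranking
within the broad result set; the sketch only shows that the two kinds
of query can be answered by two different strategies without any
popularity signal.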