slides: a sliding window range iterator#4027
Conversation
|
Another way to say it is that it iterates over all sub-strings with a certain length. |
|
@yebblies you might be right that the documentation and name could be more intuitive. I looked at different standard libs and how they name this function. tl;dr: Scala: sliding
Ruby: each_cons
Clojure: partition
|
|
The use of "pair" here, both in the function name and in the documentation, is very confusing, because "pair" means two of something, but you're actually talking about k-mers, for any k≥2. (On that note, does the code handle the k=1 case nicely?) IMO a better name would be something along the lines of a sliding window of size k, or something along those lines. |
I renamed it to
Yes - check the tests ;-) |
std/range/package.d
Outdated
| import std.array: array; | ||
| import std.algorithm: equal, map; | ||
| import std.conv: to; | ||
| assert(iota(3).slides(3).array.to!string == "[[0, 1, 2]]"); |
There was a problem hiding this comment.
there is no reason to convert this to string just compare with [[0,1,2]] or use the algorithm function equal
There was a problem hiding this comment.
I didn't know that its as easy as using equal!equal([[0,1,2]])) - very nice :)
There was a problem hiding this comment.
D is like vim. You learn something new every day.
|
FYI: This is also part of ndslice. |
|
This also tackles parts of issue 6621 |
std/range/package.d
Outdated
| } | ||
| } | ||
|
|
||
| private: |
There was a problem hiding this comment.
Two nitpicks:
- Using Python's notation for private symbols with an underscore is useless with an explicit
privatefeature - Struct members should be at the top
There was a problem hiding this comment.
Using Python's notation for private symbols with an underscore is useless with an explicit private feature
I know, but this style keeps popping up in phobos and the method that slides is based on (chunks) used the same notation.
Struct members should be at the top
Again kept the style from chunks, but I do agree that it should be at top - updated :)
|
Anything else that is blocking this PR? - I do believe that this is a quite common method and useful for phobos. |
std/range/package.d
Outdated
| of size $(D windowSize) of a $(D source) range. | ||
| $(D Source) must be a forward range. $(D windowSize) must be | ||
| greater than zero. For $(D windowSize = 2) it is similar to | ||
| $D(zip(arr, arr.save.dropOne)). |
There was a problem hiding this comment.
zip(source, source.save.dropOne)?
edit:
s/$D(/$(D/ (see broken output). Note that we've introduced a new syntax code to replace $(D code) and there shouldn't be any reason to use the D macro directly in new additions.
|
A bit late to the party, but I've needed this more than once so it would be great to have in phobos. Just for fun, a compile-time windows size gives a succint definition and allows multiple foreach arguments: http://dpaste.dzfl.pl/fced4f16a987 I haven't figured out a clever way to use that same technique for a runtime |
ff9e238 to
c2b1e68
Compare
I fully agree with this, however when I looked at |
|
Those are quite old. I'm not sure if they were designed that way because of buggy/incomplete compilers or simply because practice regarding ranges hadn't yet fully evolved. Maybe a mix of both (and maybe there will be someone who comes out of the woodwork in support of the old practice…). Regardless, changing them would require deprecating the public range types and reworking their documentation, and YMMV whether the latter is enough to justify the former. At any rate I don't know of any plan for them. |
ping @andralex - what is your opinion on this addition? |
|
Please add additional variant, which can work with simple input ranges. This function should be included along with circular buffer range, thought. In addition, two kinds of sliding windows should be supported: one with fixed width and one with max length (less length for borders). |
|
I think we should not be adding more inner types, because they add significant template bloat. I experienced this to a huge degree when writing my iopipe library. See this forum post: https://forum.dlang.org/post/n96k3g$ka5$1@digitalmars.com Note: if it is decided to keep the struct inside the function, there is no need to add the template parameter to the inner struct. It's already templated based on the function. |
|
Okay I managed to find a bit of time & went through
Isn't this the typical range style to have
This was inspired by the way
Alrighty - I removed the buffered sliding window, but left the bidirectional support as it's very short code (once figured out correctly). |
JackStouffer
left a comment
There was a problem hiding this comment.
Looks good. One more pass through
| enum needsEndTracker = true; | ||
| } | ||
|
|
||
| private bool _empty; |
|
|
||
| static if (hasSlicing!Source) | ||
| { | ||
| private enum hasSliceToEnd = hasSlicing!Source && is(typeof(Source.init[0 .. $]) == Source); |
| { | ||
| // if possible use the faster opDollar overload | ||
| static if (hasSliceToEnd) | ||
| _source = _source[$ .. $]; |
There was a problem hiding this comment.
Often opDollar is just an alias to the length function. Why do you state that the $ branch is faster?
| /// Forward range primitives. Always present. | ||
| @property auto front() | ||
| { | ||
| assert(!empty, "Attempting to access front on an empty range"); |
There was a problem hiding this comment.
nitpick:
Attempting to access front on an empty slides
Same for other error messages
| { | ||
| /** | ||
| Indexing and slicing operations. Provided only if | ||
| $(D hasSlicing!Source) is $(D true). |
|
|
||
| static if (!isInfinite!Source) | ||
| { | ||
| typeof(this) opSlice(size_t lower, size_t upper) |
|
Ping @wilzbach this needs just a bit more work to be ready, let's get this merged |
|
As @wilzbach seems to be busy I'll pull this and submit a follow up PR to address the nitpicks |
slides: a sliding window range iterator merged-on-behalf-of: unknown
Clean-up for PR #4027 merged-on-behalf-of: Andrei Alexandrescu <andralex@users.noreply.github.com>
* Improve many docs in std.array * Fix missing '|' in std.format grammar. * Add CT-checked format string overloads to std.format * Add overloads for formattedWrite, formattedRead, format, sformat. * Throw FormatError when formatValue is called with floating points. The latter allows `snprintf` to be avoided in CTFE when checking format strings so floating point arguments can be checked. * Tweaks Add constraints. Fix test for ctfpMessage equality. * Add changelog entry * Improve some docs in std.traits * Added some type qualifiers to std.parallelism * Use dfmt on std/concurrency.d * Issue 13568: Add CT-checked format string overloads to std.stdio * Add overloads for readf, writef, writefln. * Separate fmt argument from args for writef[ln] to improve docs. * std.format: Improve FormatException messages * Simplified overloads of std.range.cycle. * std.container.Array - documentation cleanup - add missing details, e.g. about exception being thrown - fix wrong descriptions - fix wrong complexity specifications * changed front declaration to an alias * Document compile-time checking for format strings * split std.algorithm tests to avoid OOM - as seen on win-farm-1 tester * Improve many docs in std.array * Clarify deallocation done by typecons.RefCounted. I had trouble understanding how RefCounted!T was supposed to help me manage resources. These document changes should clarify the process and make it easier for others to understand. This is follow-up after this post on the D learn forum: https://forum.dlang.org/post/qctmkqpqnreysxcjmrgm@forum.dlang.org and after commits cd86cc2 and 33217eb "Clarify deallocation done by std.typecons.Unique." * Fix issue 17270 * std.container.Array - optimize concatenation * fix * assignment and improved unit test * Nicer error messages for std.typecons.experimental.Final. * added static assert for * type * added static asserts for arrays and pointers * Fix issue 17251 - Appender.put doesn't accept const input range elements. The two overloads taking an element and a const range were conflicting because canPutConstRange is overlapping the definition of canPutItem. * Remove trailing whitespace. * Remove @disable, which is now redundant. * Add disclaimer to std.random about unsuitability for cryptographical purposes. * It now still accepts inputRanges when they're infinite. * Add link to mir-random * update information about ndslice * Fixed typos in RefCounted documentation. * More edits to typecons.RefCounted * Mention the extra checks in the std.string.assumeUTF docs * Documented undocumented hmac helpers * Fix Issue 17288 - formattedWrite with width/precision and no value Previously an RangeError at runtime or assert(0) at CT would fire from formatNth with message e.g. "n = 1". * Fix Issue 17286 - A function for comparing two digests securely * Use collectExceptionMsg, fix coverage for missing width/precision * Add tests for incompatible format spec Note: Remaining formatValue overloads may defer to these value types, e.g. pointer -> int, bool -> int, char -> int. * std.container.Array - do not realloc if enough capacity * fix issue 15763 - Document behavior of relative difference. Also fix issue where symmetry was assumed when lhs was a value and rhs was not. * Merge pull request dlang#5296 from ntrel/ct-stdio-format Issue 13568: Add CT-checked format string overloads to std.stdio merged-on-behalf-of: H. S. Teoh <quickfur@users.noreply.github.com> * clear changelog * Added changelog entry for secureCompare * Fix Issue 17283 - std.experimental.typecons uses private module members * Fix issue 15534 - Rework constructor docs, removing reference to string parameter that doesn't exist, and using more appropriate ddoc style. * Add unit test * Make allocators' resolveInternalPointer take a const pointer as source * Allocators should take a ubyte[] memory block instead of a void[] one Allocators that rely on taking a block of unused memory to manage and allocate from it should take a ubyte[] instead of a void[] in order to prevent users from accidentally passing valid data and cause memory corruption/leaks. * Fix style problems in std.experimental.allocator.gc_allocator * Remove now-redundant import of onRangeError. * Changed unit test * update pipeline ci repo url - finally moved repo under dlang * fix issue 17314 * fix issue 17314 * Only grow if full * Fix style * fix link The link is broken because the module name. Hope this will fix it. * Update random.d * Fix issue 17327 - std.getopt: Repeated boolean command option fails. * Fix issue 17327. Review comments: drop continue statement. * netbsd small changes: fix datetime.d * format with comma formatspec to add separator into numbers * Link to ConvException docs in std.bigint ctor doc * apply changes for netbsd: stdio - set name to not null value for reopen * issue 15645 - Prevent unsafe usage of Tuple.slice * std.getopt: Add unit tests for delegates as callbacks. * std.getopt unit tests: No trailing whitespace. * std.getopt unit tests: No trailing whitespace. * Add missing sig constraints to std.numeric.gcd. First of all, it makes no sense to take arbitrary types (e.g., gcd("a","abc") should not even match the template). Second of all, the current implementation only works with built-in types, so the function should restrict itself to only accepting built-in types so that it is overloadable for other types (e.g., BigInt). * Implement generic gcd for non-builtin types. * Documentation. * std.getopt: Make private functions safe. * Annotate unittests per Phobos coding style. * std.getopt: make setConfig safe, pure, nothrow, nogc * SharedFreeList.reallocate must be shared and take a ref void[] * Add ISharedAllocator and have the _processAllocator use it * allow precise scanning of DATA/TLS segment: buffers for preallocated class instances should be typed void[], not ubyte[] and must be properly aligned * workaround ld rejecting explicit alignment * Move std/datetime.d into std/datetime/package.d. * Create the new (empty) files in std/datetime for the split. * Move several items from std/datetime/package.d to std/datetime/common.d. * Move Interval to std.datetime.interval. * Move PosInfInterval to std.datetime.interval. * Move NegInfInterval to std.datetime.interval. * Move interval functions to std.datetime.interval. * Move IntervalRange to std.datetime.interval. * Move PosInfIntervalRange to std.datetime.interval. * Move NegInfInterval to std.datetime.interval. * Move TimeZone to std.datetime.timezone. * Move LocalTime to std.datetime.timezone. * Move UTC to std.datetime.timezone. * Move SimpleTimeZone to std.datetime.timezone. * Move PosixTimeZone to std.datetime.timezone. * Move WindowsTimeZone to std.datetime.timezone. * Move timezone-related free functions to std.datetime.timezone. * Move TimeOfDay to std.datetime.timeofday. * Move Date to std.datetime.date. * Move DateTime to std.datetime.datetime. * [BOOKTABLES]: Add BOOKTABLE to std.encoding * std.range: Add missing methods to the booktable, disable the quickindex and improve the docs * [BOOKTABLES]: Add BOOKTABLE to std.uni * issue# 17372: icmp chokes on a lambda that can throw. This removes some inappropriate attributes from a fullCasedCmp. It's templated, and their applicability depends on the template argument, so they should be inferred. And the unit tests for icmp which test the attributes (including @safe) pass without the attributes on fullCasedCmp, so I have no idea why any of them (particularly @trusted) were ever there. * Split style target for CircleCi * Move SysTime to std.datetime.systime. * Move Clock to std.datetime.systime. * Move stray SysTime-related functions to std.datetime.systime. * Remove some unused private functions. * Fix links in std.datetime. * Fixes to make circleci happy. * Specify the exact folder location for the style Makefile target of the D source files * Add changelog entry for splitting std.datetime. * Move TimeOfDay to std.datetime.date. * Move DateTime to std.datetime.date. * Move stray functionality in std.datetime.common to std.datetime.date. * Remove uses of common, datetime, and timeofday. * Remove std.datetime.common,datetime,timeofday. * Fix links in std.datetime. * Update changelog entry for splitting std.datetime. * Fix Ddoc links in std.datetime * Add the MonoTime equivalents of std.datetime.StopWatch/benchmark. std.datetime.package has StopWatch, benchmark, comparingBenchmark, and measureTime, all of which use TickDuration (which would be deprecated, but it can't be deprecated as long as those functions in std.datetime are deprecated). This commit introduces std.datetime.stopwatch to replace those functions in std.datetime. In order to avoid symbol conflicts, std.datetime.stopwatch will not be publicly import in std.datetime.package until the old symbols have been removed. std.datetime.experimental.stopwatch contains StopWatch and benchmark which have essentially the same APIs as the ones in std.datetime.package, but they use MonoTime and Duration. comparingBenchmark has not been ported to MonoTime and Duration, because it is simply a wrapper around benchmark. measureTime has not been ported to MonoTime and Duration, because it is equivalent to using StopWatch with a scope(exit) statement. The old functionality will be deprecated the major release after the new symbols have been introduced. * Update the changelog entry for splitting std.datetime. Now, it includes information on std.datetime.stopwatch. * Update changelog entry to use MREF instead of LINK2. * Warn about impending deprecation of functions in std.datetime.package. * Logger sharedLog comment on thread-safety fix Issue 16232 - std.experimental.logger.core.sharedLog isn't thread-safe * Move deprecations along. * remove .deps file generation - for development, people can use the much faster std/algorithm.test targets - for batch unittest recompiling isn't of much interest as all the testers just start from a clean slate anyhow - furthermore the .deps generation is broken anyhow, see Issue 7016 * posix.mak: Discard DDoc output of DDoc warning test Send documentation output to /dev/null when checking for warnings. * Fix std.datetime autotester failure for FreeBSD 10.3/11. The tests fail depending on your timezone due to a known FreeBSD bug, so we need to disable them until the bug in FreeBSD gets fixed. * Removed level of indirection from internal std.xml.ElementParser code * Allow parallel execution of public unittests * Check for an existent tools repository on more files * Make behavior of environment.opIndexAssign on Posix to match the behavior on Windows. Setting null as environment value leads to removing this variable according to SetEnvironmentVariable documentation while Posix leaves unspecified what setenev should do if value is null. * Add @safe to two common functions in std.xml * Debugger target for posix It is desirable to be able to attach a debugger to the execution of tests of phobos. This patch add this feature. By calling make -f posix.mak DEBUGGER=gdb std/XXXXX.debug phobos and the module XXXXX will be compiled with debug symbols and after its compilation the debugger gdb starts the compiled executable. DEBUGGER=gdb can be omitted as it is the default debugger. * Fixed some typos in std.string docs * [Static if] replace overload constraints with static if (mutation.d) * Deprecate obsolete pattern matching functions in std.string * Fix some bad/slangy English in std.getopt. I don't think that it makes sense to use slang like "dunno" in a standard library. * Fix issue 17394 - mkdirRecurse isn't @safe * Issue 16053: Fix it so that SysTime's from*String supports more than 7 digits. ISO 8601 says that it's up to the application to decide how many digits to put in the fractional seconds if they're present. SysTime.to*String puts up to 7 (stripping trailing zeroes), because that's hecto-nanosecond precision, and SysTime holds the time in hecto-nanoseconds. Currently, from*String only accepts up to 7 digits in the fractional seconds, which _does_ follow the spec in that (per the spec) the number of digits is up to the applications. However, while we never emit more than 7 digits, other applications do, so only accepting 7 digits makes us incompatible with them, whereas accepting them would make us more compatible with other programs, and it would actually be more efficient, since we'd have fewer checks in the code. So, these changes make is so that SysTime.from*String accepts more than 7 digits in the fractional seconds, but the additional digits are truncated (since SysTime doesn't support more than 7 digits of precision). * Add changelog entry for deprecation of pattern functions * Fix issue 16246 - cannot call iota with 3 [u]bytes or 3 [u]shorts * Fix Issue 16326 - filter is not lazy enough & has weird save behavior * Improve behavior on overflow * std.datetime.stopwatch: Fix random test failure on Win32 * Fix Issue 15720 - iota(long.max, long.min, step) does not work properly * Small clean-up for PR dlang#4027 * Rename slides to slide * Rename slideWithLessElements to withFewerElements * Fixed spelling mistakes in std.range * std.datetime.stopwatch: Fix another random test failure on Win32 * Defer addition of template until we need it for ranges. * Add asOriginalType function for enums
|
@ all: I just revived |
As you can see - I was a bit productive during this weekend ;-)
I propose here to add a
pairwiserange iterator that iterates over all k-mers of a range, e.g.Why is this useful?
For me it's kind of common to work with k-mers and texts and thus it's nice to have a handy iterator for this. See this simple example for how easy counting will get:
and of course works like this for any k:
Current alternatives
It's possible to do this with
zip, but it looks a bit ugly and is only practical forn==2.Trivia
pairsis a fine name to, but that might conflict with other functions e.g.byPairschunks, but I couldn't figure out a proper way to copy/inherit this functionality and most functions are actually changededit: changed name to
slides