Closed
10 changes: 4 additions & 6 deletions src/fsharp/CheckDeclarations.fs

@@ -381,15 +381,13 @@ let CheckNamespaceModuleOrTypeName (g: TcGlobals) (id: Ident) =
errorR(Error(FSComp.SR.tcInvalidNamespaceModuleTypeUnionName(), id.idRange))

 let CheckDuplicates (idf: _ -> Ident) k elems =
-    elems |> List.iteri (fun i uc1 ->
-        elems |> List.iteri (fun j uc2 ->
-            let id1 = (idf uc1)
-            let id2 = (idf uc2)
+    let ids = elems |> List.mapi (fun i uc -> i, idf uc)
+    for (i, id1) in ids do
Contributor @En3Tho commented on Oct 23, 2020:
Sorry, but I don't see the difference between iterating the list via List.iteri and iterating it via a double for loop. Maybe I'm missing something; what's the gain? Also, you create a new tuple list and then use those tuples, so this code now allocates more objects, while the previous version only allocated an FSharpFunc.

Contributor:

Yeah, I didn't see that we would end up with tuples when I suggested this. @MecuSorin, maybe my suggestion was bad.

Contributor @En3Tho commented on Oct 23, 2020:

@forki I think the idea of the optimization was to split this function into two recursive functions: one keeps track of the current start (the tail) and then calls the other, which tries to find duplicates within that tail. That way you don't need the j > i check on every iteration and can skip the unneeded ones.
But it still needs some kind of benchmark to be sure, and it's not O(n) anyway. The only way I see to make this O(n) is to use a HashSet/Map, but that needs additional memory.
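For illustration, the two shapes described above might be sketched as follows. This is a hypothetical sketch, not code from the PR; it assumes the surrounding compiler definitions (`Ident`, `errorR`, `Duplicate`) that appear in the diff, and the helper names are invented:

```fsharp
// Sketch of the suggested split: an outer loop over successive tails, and an
// inner scan that looks for duplicates of the head within its own tail only,
// so no `j > i` check is needed.
let checkDuplicatesRec (idf: 'a -> Ident) k elems =
    let rec scanTail (id1: Ident) tail =
        match tail with
        | [] -> ()
        | uc2 :: rest ->
            let id2 = idf uc2
            if id1.idText = id2.idText then
                errorR (Duplicate(k, id1.idText, id1.idRange))
            scanTail id1 rest
    let rec loop remaining =
        match remaining with
        | [] -> ()
        | uc1 :: rest ->
            scanTail (idf uc1) rest
            loop rest
    loop elems
    elems

// O(n) alternative using a HashSet, at the cost of extra memory.
// Note: unlike the original, this reports the *later* occurrence of a duplicate.
let checkDuplicatesHashSet (idf: 'a -> Ident) k elems =
    let seen = System.Collections.Generic.HashSet<string>()
    for uc in elems do
        let id = idf uc
        if not (seen.Add id.idText) then
            errorR (Duplicate(k, id.idText, id.idRange))
    elems
```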

Author:

@En3Tho The optimization is about how many times the function idf is invoked.

Before, idf was invoked 2 * n times for each element in the sequence, which means 2 * n^2 invocations in total. By allocating a new list that maps each element through idf, we invoke that function just n times. The actual duplicate check has the same complexity as before.
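The call-count difference can be seen with a small standalone sketch (the data and counter are illustrative, standing in for the compiler's real elements and Ident values):

```fsharp
// Standalone illustration of how often idf runs in each version.
let elems = [ "A"; "B"; "C"; "D" ]           // n = 4
let mutable calls = 0
let idf (x: string) = calls <- calls + 1; x  // count every invocation

// Old shape: idf is called twice inside the doubly nested iteration,
// giving 2 * n^2 calls in total.
calls <- 0
elems |> List.iteri (fun _ uc1 ->
    elems |> List.iteri (fun _ uc2 ->
        let _id1 = idf uc1
        let _id2 = idf uc2
        ()))
printfn "nested iteri: %d calls" calls       // 32 = 2 * 4 * 4

// New shape: idf is called once per element during the mapi, so n calls;
// the nested loops then reuse the precomputed values.
calls <- 0
let ids = elems |> List.mapi (fun i uc -> i, idf uc)
for (i, _id1) in ids do
    for (j, _id2) in ids do
        if j > i then ()
printfn "mapi + for:  %d calls" calls        // 4
```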

Contributor:

To be fair, at all call sites idf is only reading the id property, not computing anything. So after thinking about it: there is probably not much to gain here.

Contributor:

I think there's a lack of evidence on both sides here:

  1. Proof that this speeds things up, given that we're still effectively doing a nested for loop
  2. Proof that the replacement with tuples actually allocates, and thus proof that this makes anything worse

Given the lack of evidence on either side, I don't think we can proceed with either accepting or closing this.

Author:

I agree; without a proper measurement it's hard to make a decision. Maybe someone with deeper experience of the inner workings will shed some light on this. I don't have the time right now to write a proper benchmark.
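For reference, a minimal BenchmarkDotNet harness for this comparison could look like the sketch below. This is an assumption-laden sketch, not part of the PR: it uses plain strings as a stand-in for the compiler's Ident values, counts duplicates instead of calling errorR, and all type and member names are invented:

```fsharp
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running

[<MemoryDiagnoser>]
type CheckDuplicatesBench() =
    // Stand-in data: strings instead of Ident; i % 90 guarantees a few duplicates.
    let elems = List.init 100 (fun i -> sprintf "id%d" (i % 90))
    let idf (s: string) = s

    [<Benchmark(Baseline = true)>]
    member _.NestedIteri() =
        // Old shape: idf invoked inside the doubly nested iteration.
        let mutable dups = 0
        elems |> List.iteri (fun i uc1 ->
            elems |> List.iteri (fun j uc2 ->
                if j > i && idf uc1 = idf uc2 then dups <- dups + 1))
        dups

    [<Benchmark>]
    member _.MapiThenFor() =
        // New shape: idf invoked once per element, then nested for loops.
        let mutable dups = 0
        let ids = elems |> List.mapi (fun i uc -> i, idf uc)
        for (i, id1) in ids do
            for (j, id2) in ids do
                if j > i && id1 = id2 then dups <- dups + 1
        dups

[<EntryPoint>]
let main _ =
    BenchmarkRunner.Run<CheckDuplicatesBench>() |> ignore
    0
```

The MemoryDiagnoser attribute would also surface the allocation question raised earlier (tuple list vs. FSharpFunc closures).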

Contributor:

@MecuSorin, I shall close this PR; if you want to reopen it when you have time to provide some benchmarks, that would be good.

Thanks for looking at this, and I look forward to seeing the perf analysis

Kevin


+        for (j, id2) in ids do
             if j > i && id1.idText = id2.idText then
-                errorR (Duplicate(k, id1.idText, id1.idRange))))
+                errorR (Duplicate(k, id1.idText, id1.idRange))
     elems


module TcRecdUnionAndEnumDeclarations =

let CombineReprAccess parent vis =