Skip to content

Hadoop "local mode" only works with <=100K records #6

@Alex-Ikanow

Description

@Alex-Ikanow

The issue appears to be that MongoInputFormat doesn't work on large datasets (too many mappers generated?), whereas InfiniteMongoInputFormat does (but backs off to MongoInputFormat if there are >8*12.5K documents).

The short term fix will be to force plugins to use InfiniteMongoInputFormat if in local mode, even if larger than the current limit (it's not like performance is going to be amazing in local mode anyway)

The workaround (from August OSS release) is to set "$splits" and "$docsPerSplit" (defaulting to 8 and 12500, see above) in the query object so that when multiplied they are larger than the number of documents to be processed.

This issue is recorded internally as INF-2018

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions