| Source | License | Content |
| --- | --- | --- |
| Tatoeba | CC BY 2.0 FR (some CC0) | Community sentences; includes Tanaka Corpus public domain data. |
| Japanese Wikipedia + Wikinews dumps | CC BY-SA 4.0 + GFDL (Wikinews CC BY 2.5) | Real-world encyclopedic and news-style prose. |
| Aozora Bunko | Public domain works with attribution rules | Literary sentences and classic prose. |
| JMdict / EDRDG | CC BY-SA 4.0 (EDRDG license terms) | Dictionary definitions, readings, and parts of speech (share-alike applies). |
| Japanese WordNet | WordNet-style open license | Synonyms and Japanese definitions for question generation. |
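
One way to keep these license terms attached to the data is a small source manifest. The following is a minimal sketch, not a prescribed implementation: the `Source` class, the `SOURCES` list, and the `attribution_lines` helper are hypothetical names introduced for illustration, and the share-alike check is a simple string heuristic on the license names listed above, not a legal determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One data source with the license string as listed above (hypothetical type)."""
    name: str
    license: str
    content: str

    @property
    def share_alike(self) -> bool:
        # Heuristic only: flags licenses whose short name contains "BY-SA".
        return "BY-SA" in self.license


# License strings are copied from the list above; nothing else is assumed.
SOURCES = [
    Source("Tatoeba", "CC BY 2.0 FR (some CC0)", "Community sentences"),
    Source("Japanese Wikipedia + Wikinews dumps",
           "CC BY-SA 4.0 + GFDL (Wikinews CC BY 2.5)",
           "Encyclopedic and news-style prose"),
    Source("Aozora Bunko", "Public domain works with attribution rules",
           "Literary sentences and classic prose"),
    Source("JMdict / EDRDG", "CC BY-SA 4.0 (EDRDG license terms)",
           "Dictionary definitions, readings, parts of speech"),
    Source("Japanese WordNet", "WordNet-style open license",
           "Synonyms and Japanese definitions"),
]


def attribution_lines(sources):
    """Return one 'name: license' line per source, e.g. for a credits page."""
    return [f"{s.name}: {s.license}" for s in sources]


if __name__ == "__main__":
    for line in attribution_lines(SOURCES):
        print(line)
    print("Share-alike sources:", [s.name for s in SOURCES if s.share_alike])
```

Keeping the license string next to each source makes it straightforward to generate a credits page and to spot which derived datasets may inherit share-alike obligations.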