Merged
10 changes: 10 additions & 0 deletions exercises/1901010077/README.md
Original file line number Diff line number Diff line change
@@ -1 +1,11 @@
1 ѧ��һ��ܵĹؼ������Ƿ�����ڻ���ֻ���ڻ������������㡣

2 �����һ��ܣ���Ȼ�Ǽ��ܾ���Ҫÿ����ϰ������ѵ��Ӫ�������û������������ÿ�춼����ϰһ�δ��룬��ᵽ�˴ӱ�׾�������Ĺ��̡�

3 ����ѧ�������ա���ˢ���ı飬ÿһ�鶼���µ��ջ��������ƣ�Ϊʲô��һ�ο���ʱ��û�з��֣���Ȼ�����ܶ�飬�ǿ��������Ѳ��ԡ�

4 �ڿ�Ц����ʦ��ר����ͨ���Ƹ�����֮·��ʱ��֪�������ݷ���������ʱ�ͼᶨ��Ҫ�����ݷ����ĵ�·�����Լ�����Ŀ����ǣ�Ҫô����Ҫôѧ�����ݷ�����

5 ����ÿ�춼����ϰЦ����ʦ����Ʒ��֪�����������ź�Զ�ľ��룬����ڻ����ǹؼ������׹��������ù�ȥ���Լ�̫ɵ�ƣ��˷���̫�౦���ʱ�䣬���ǣ�������ã�Ψ��������ϰƵ�ʣ�ע�ؽ�����δ����þõ㣬��������ȷ�ķ�������ȷ�����顣

6 ϣ��ѵ��Ӫ�ܹ�Խ��Խ�á�
11 changes: 11 additions & 0 deletions exercises/1901010077/d09/mymodule/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
import stats_word
import json
with open('tang300.json', 'r', encoding='UTF-8') as f:  # without encoding='UTF-8' this raised UnicodeDecodeError
    t = f.read()
# the with block closes f automatically on exit

try:
    print('Top 100 most frequent words:')
    stats_word.stats_text_cn(t, 100)
except ValueError as w:
    print(w)
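Review note on the `open` call above: a minimal sketch, not part of this PR, of why the explicit `encoding` argument matters. The sample string, the helper name `decode_with`, and the use of the `ascii` codec as a stand-in for a non-UTF-8 platform default are all assumptions made for illustration.

```python
# Illustrative only: decoding UTF-8 bytes with a codec that cannot handle them.
sample = '床前明月光'            # sample text chosen for this sketch
raw = sample.encode('utf-8')    # the bytes as they would sit in a UTF-8 file


def decode_with(codec):
    """Return the decoded text, or the name of the error that was raised."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as err:
        return type(err).__name__


print(decode_with('utf-8'))  # decodes cleanly
print(decode_with('ascii'))  # fails: these bytes are outside the ASCII range
```

Passing `encoding='UTF-8'` to `open` selects the first case regardless of the platform default.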
65 changes: 65 additions & 0 deletions exercises/1901010077/d09/mymodule/stats_word.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
t = '''
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
美丽胜过丑陋。
显式优于隐式。
简单比复杂更好。
复杂比复杂更好。
优于嵌套。
稀疏优于密集。
可读性很重要。
特殊情况不足以打破规则。
虽然实用性胜过纯洁。
错误不应该默默地传递。
除非明确沉默。
面对困惑,拒绝猜测的诱惑。
应该有一个 - 最好只有一个 - 明显的方法来做到这一点。
虽然这种方式起初可能并不明显,除非你是荷兰人。
现在比永远好。
虽然现在永远不会比*正确好。
如果实施很难解释,这是一个坏主意。
如果实现很容易解释,那可能是个好主意。
命名空间是一个很棒的主意 - 让我们做更多的事情吧!
'''
import re  # regular expressions, used to clean the text
import collections  # Counter provides the frequency counting
def stats_text_en(t,count):
    if type(t) != str:
        raise ValueError('text is not a string')
    t = re.sub("[^A-Za-z]", " ", t)
    t = t.lower()
    t = t.split()
    t = collections.Counter(t).most_common(count)
    print('English word frequency: \n',t)
    return t  # also return the result so stats_text gets real values

def stats_text_cn(t,count):
    if type(t) != str:
        raise ValueError('text is not a string')
    t = re.sub("[A-Za-z.。,:'\'\ {},'!!“”「」??、:\"\-* \n]", "", t)
    t = t.replace('\\','')
    t=collections.Counter(t).most_common(count)  # Counter over a str counts single characters
    print('Chinese character frequency:\n',t)
    return t
def stats_text(t,count):
    if type(t) != str:
        raise ValueError('text is not a string')
    return(stats_text_en(t,count),stats_text_cn(t,count))
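Review note on `stats_text_cn` above: passing a string straight to `collections.Counter` counts individual characters, not words. A small sketch under that reading; the function name `char_freq` and the sample text are hypothetical and not part of this PR.

```python
import collections


def char_freq(text, count):
    """Return the `count` most common characters of `text` as (char, n) pairs."""
    if not isinstance(text, str):
        raise ValueError('text is not a string')
    # Iterating over a str yields one character at a time, so this is a
    # per-character tally.
    return collections.Counter(text).most_common(count)


print(char_freq('山山水山水月', 2))  # → [('山', 3), ('水', 2)]
```

Returning the list, rather than only printing it, lets a caller such as `stats_text` combine the results.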
2,235 changes: 2,235 additions & 0 deletions exercises/1901010077/d09/mymodule/tang300.json

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions exercises/1901010077/d10/mymodule/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
import stats_word
import json
path = r'E:\python410\d10\mymodule\tang300.json'
with open(path, 'r', encoding='UTF-8') as f:
    t = f.read()

try:
    print('Top 20 most frequent words:\n', stats_word.stats_text_cn(t, 20))
except ValueError as w:
    print(w)
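Review note on the hard-coded `path` above: an absolute `E:\...` path only works on the author's machine. One possible alternative, sketched here as an assumption rather than as what this PR does, is to resolve the data file relative to the script's own location. The helper name `data_path` is hypothetical.

```python
from pathlib import Path


def data_path(script_file, name):
    """Return the path of `name` in the same directory as `script_file`."""
    return Path(script_file).resolve().parent / name


# In main.py this would typically be called as data_path(__file__, 'tang300.json').
example = data_path('mymodule/main.py', 'tang300.json')
print(example.name)         # → tang300.json
print(example.parent.name)  # → mymodule
```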
71 changes: 71 additions & 0 deletions exercises/1901010077/d10/mymodule/stats_word.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
t = '''
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
美丽胜过丑陋。
显式优于隐式。
简单比复杂更好。
复杂比复杂更好。
优于嵌套。
稀疏优于密集。
可读性很重要。
特殊情况不足以打破规则。
虽然实用性胜过纯洁。
错误不应该默默地传递。
除非明确沉默。
面对困惑,拒绝猜测的诱惑。
应该有一个 - 最好只有一个 - 明显的方法来做到这一点。
虽然这种方式起初可能并不明显,除非你是荷兰人。
现在比永远好。
虽然现在永远不会比*正确好。
如果实施很难解释,这是一个坏主意。
如果实现很容易解释,那可能是个好主意。
命名空间是一个很棒的主意 - 让我们做更多的事情吧!
'''
import re  # regular expressions, used to clean the text
import collections  # Counter provides the frequency counting
import jieba
def stats_text_en(t,count):
    if type(t) == str:
        t = re.sub("[^A-Za-z]", " ", t)
        t = t.lower()
        t = t.split()
        d = collections.Counter(t).most_common(count)
        return d  # was missing: without it the function returned None
    else:
        raise ValueError('text is not a string')

def stats_text_cn(t,count):
    if type(t) == str:
        t = re.sub("[^\u4e00-\u9fa5]", "", t)  # keep only characters in this CJK range
        t1 = jieba.cut(t)
        t2 = []
        for i in t1:
            if len(i) >= 2:  # skip single-character tokens
                t2.append(i)
        d = collections.Counter(t2).most_common(count)
        return d
    else:
        raise ValueError('text is not a string')

def stats_text(t,count):
    if type(t) == str:
        return(stats_text_en(t,count),stats_text_cn(t,count))
    else:
        raise ValueError('text is not a string')
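Review note on the d10 `stats_text_cn`: the filter-then-count step can be exercised without a segmenter. This is a minimal sketch, not the PR's code: `top_words` and `min_len` are hypothetical names, and the token list is hand-written for illustration rather than real `jieba.cut` output.

```python
import collections


def top_words(tokens, count, min_len=2):
    """Count tokens of at least `min_len` characters and return the top `count`."""
    kept = [tok for tok in tokens if len(tok) >= min_len]
    return collections.Counter(kept).most_common(count)


# Hand-segmented stand-in for what a word segmenter might yield.
tokens = ['明月', '的', '明月', '故乡', '光', '故乡', '明月']
print(top_words(tokens, 2))  # → [('明月', 3), ('故乡', 2)]
```

Keeping the counting separate from the segmentation makes it testable without the third-party dependency.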