In the last two episodes of fun with python, we went from a simple Python project with some standard tooling to an API executable in Docker. For today's episode, I decided to refactor it a bit, clean up the smoke tests, add type checking, and extend it with extra attributes, as well as start to play a bit with pandas.
The project is nowhere near complete and I still have a long list of things to improve, like making the LessonFactory and LessonsInterface feel better integrated. I would also love to clean up some spots with unnecessary code and extend the pandas work towards Keras. The smoke tests also feel like something I could move to Python to avoid the massive copy-paste-based bash script.
There is never enough time, though, and after spending my weekly 2h self-learning timebox on what I have now, it is time to wrap up and share the learnings.
The project itself is here.
The first big discovery was the use of pydantic in FastAPI. FastAPI itself offers quite a few standard validations on integer and string values. I was a bit sad that the validator pattern is not really available out-of-the-box for FastAPI data objects; a few good options are listed in this thread. In the end, I decided that a regex offers enough validation for my parameters.
# Request model excerpt -- the enclosing class name is illustrative;
# MIN_URL_LEN, MAX_URL_LEN and MAX_STR_LEN are constants defined elsewhere in the project
from typing import Optional

from pydantic import BaseModel, Field

class LessonRequest(BaseModel):
    lesson_number: int  # also part of the request, judging by the smoke tests below
    data_url: Optional[str] = Field(
        description="URL from which data will be pulled",
        example="http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv",
        min_length=MIN_URL_LEN,
        max_length=MAX_URL_LEN,
        regex="((http|https)://)(www.)?[a-zA-Z0-9@:%._\\+~#?&//=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%._\\+~#?&//=]*)",
    )
    action: Optional[str] = Field(
        description="Actions possible to be executed on an uploaded data set",
        example="describe",
        max_length=MAX_STR_LEN,
        regex="(describe|run)",
    )
With this came the realization that the creator of FastAPI prefers the 422 HTTP status code over 400. I then made a few amendments to my custom code for validating possible actions. For more, read here. I also really wanted to add proper error handling to the responses. It is not perfect, but it now provides much better visibility into what might have failed.
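One common FastAPI pattern for this kind of error visibility (a sketch, not necessarily how the project does it) is a custom exception handler that turns an application error into a structured 422 response:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# hypothetical application error type; the project's own exceptions may differ
class LessonError(Exception):
    pass

@app.exception_handler(LessonError)
async def lesson_error_handler(request: Request, exc: LessonError):
    # mirror FastAPI's own validation responses: 422 with a "detail" payload
    return JSONResponse(status_code=422, content={"detail": str(exc)})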
The next thing I really wanted was type checking. Traditionally, types have been handled by the Python interpreter in a flexible but implicit way. Recent versions of Python let you add explicit type hints that various tools can use to help you develop your code more efficiently. Having the option, however, does not mean it is enforced, and type errors are a common cause of production incidents. This is why I decided to use mypy: once it is in the pipeline, type checking is actually enforced. That said, not every library you include ships with type hints, hence the need for skip rules in mypy.ini (and the occasional inline ignore).
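To make this concrete, here is a tiny example (not from the project): the call below runs without error at runtime, because Python happily repeats the string, yet mypy flags it before it ever ships.

def double(x: int) -> int:
    return x * 2

double("abc")  # returns "abcabc" at runtime, but mypy reports an incompatible argument type

And when a dependency ships without type hints, a section like this in mypy.ini (library name illustrative) silences the noise:

[mypy-pandas.*]
ignore_missing_imports = True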
In terms of refactoring, you might notice I have replaced the 'simpler' approach – looking up a method in a dictionary – with a lesson factory. This is an area I'd like to refactor further in the future to hide the interfaces better, but it felt like the better choice for now.
Here is how it goes now:
from fastapi import status, HTTPException
from src.service.lessons.lesson_one import LessonOneInterface
from src.service.lessons.lesson_two import LessonTwoInterface


class LessonFactory:
    def get_lesson(self, lesson_number: int):
        if lesson_number == 1:
            return LessonOneInterface()
        elif lesson_number == 2:
            return LessonTwoInterface()
        else:
            self._default(lesson_number)

    # If the user enters an invalid option, this method raises instead of returning
    def _default(self, lesson_number: int):
        # following FastAPI's choice of 422 over 400 --> https://github.com/tiangolo/fastapi/issues/643
        raise HTTPException(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            detail=f"ERROR: This lesson does not exist yet: {lesson_number}.",
        )


factory = LessonFactory()


def execute_lesson(lesson_number: int, action: str, url: str):
    interface = factory.get_lesson(lesson_number)
    return interface.execute(action, url)
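For completeness, a hypothetical endpoint wiring this together might look like the sketch below (route path taken from the smoke tests; LessonRequest and execute_lesson come from the snippets above):

from fastapi import FastAPI

app = FastAPI()

@app.post("/lesson")
def lesson(request: LessonRequest):  # LessonRequest as sketched earlier
    return execute_lesson(request.lesson_number, request.action, request.data_url)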
Then I needed to change the smoke tests from 'just execute and show the result' to 'actually check the result and assert expectations'. This resulted in a bit of a bash copy-paste fun-fest.
printf "\n\nTest 2 - lesson 1 can be retrieved?\n"
status=$(curl -X POST -H "Content-Type: application/json" -d '{"lesson_number": 1, "action": "run"}' -s -o /dev/null -w ''%{http_code}'' localhost:8000/lesson)
if [[ ! $status == "200" ]]
then
((fail++))
printf "NO - lesson 1 test failed.\n"
curl -X POST -H "Content-Type: application/json" \
-d '{"lesson_number": 1}' \
localhost:8000/lesson | json_pp -json_opt pretty,canonical
else
printf "YES - lesson 1 test successful.\n"
fi
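Moving these to Python, as mentioned above, could look roughly like this (a pytest + requests sketch, assuming the API runs locally on port 8000 like in the bash version):

import requests

BASE_URL = "http://localhost:8000"

def test_lesson_one_runs() -> None:
    # same request as the bash smoke test, but with a proper assertion
    response = requests.post(f"{BASE_URL}/lesson", json={"lesson_number": 1, "action": "run"})
    assert response.status_code == 200, response.text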
Then I was finally ready to start adding new functionality and lessons. At some point this year I'd like to use Keras and have an API that can provide some useful insights. What I did so far (within the timebox) was just to find an easy way to load data from a CSV file using pandas, and to print out some properties of the data using describe.
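The core of that is pleasantly short. A minimal sketch, using the same wine-quality CSV as in the examples above (note it is semicolon-separated):

import pandas as pd

URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"

# load the remote CSV and print count, mean, std, min, quartiles and max per numeric column
df = pd.read_csv(URL, sep=";")
print(df.describe())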
Episode 3 concludes my findings on playing with Python for now. I always have a very long backlog of things to write about, and I am more than happy to hear how things could have been done better!